A few weeks ago, I was watching Lost, and I thought I heard the TARDIS show up. (At least, I thought it would’ve been awesome if it had. XD) At the end of one scene, someone said something particularly dramatic, and then there was a “whoooooOOOSH” and the scene changed. That sound came back several more times during the episode, and despite all the “It’s the TARDIS! The TARDIS is everywhere!” fun I was having, I admitted in my heart of hearts that this was just Lost’s way of making the dramatic twists sink in.
Pretty soon, I noticed the same kind of thing happening in all kinds of shows, including Chuck and (I think) CSI. There seems to be a tendency toward a kind of rising hum or twang. Yet, that’s not the only option; Boston Legal’s score does the same thing, but it’s composed of jazzy vocals and stuff.
So, you’ve got rising action, a twist, an obligatory sound effect, and a quick scene change. I guess it’s a formula.
So why do I bring it up? I’ll get to that in a second. (Er, make that several seconds. -Future Me)
I’ve been looking over audio analysis techniques, wrapping my mind around how exactly I might go about implementing (or finding someone else’s implementation of) an FFT or DWT, and getting to the point where I can understand this abstract. While reading it, I noticed something: the authors used something called MARSYAS. What’s that?
Well, Marsyas is an open-source C++ project that apparently is exactly as ambitious as MVTron would like to be in exactly the same ways. Although its main focus is music and other audio, apparently there’s MarsyasX, a branch or something, which is a reimagining of Marsyas to be more video-inclusive. Considering Marsyas’s seeming focus on feature extraction, similarity detection, and… well, all kinds of stuff, it seems like MVTron would be just another application in the sea over there. That takes a lot of (self-inflicted) pressure off of me as a lone programmer. :-p
It almost looks at this point like MVTron will end up being a project entirely submerged in Marsyas, maybe even to the point that it’s written in C++… but I guess I shouldn’t be so hasty. I’ve only just heard of Marsyas, and besides, there’s an entry on their ideas page calling for “Porting the Marsyas dataflow architecture to Java.” ^_-
I ported the scene-detecting part of my AviSynth script to Groovy (using the AviSynth wrapper I was talking about yesterday), and it was still much slower than I’d have liked. It paused every once in a while, probably to run a garbage collection pass or to find more space to allocate frames in. Hoping to avoid this problem, I spent a few days rewriting the library as a three-layer system: a bottom layer whose job is simply to work, a middle layer whose job is memory safety, and a top layer whose job is ease of use.
It had previously been a two-layered system that was a low-level library first and a high-level library second. The middle layer is what was new here, but its arrival pretty much forced a complete refactoring of the two layers that surrounded it.
In any case, I think this was an important step to take, but it didn’t help the speed at all. So I tried a few optimizations of the ported scene-detection script itself. First, I got rid of the kludge I was using to represent the boolean either-this-is-the-start-of-a-scene-or-it-isn’t stream. Rather than representing true and false with white frames and black frames, as I was forced to do in AviSynth, I represented them with, well, Groovy’s true and false. Instead of having the function return a Clip, I had it return a Closure (which itself takes an int and returns a boolean). This did the trick: I was able to drop six of the filters I was using in the script, and the speed improved markedly.
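To make that change of representation concrete, here’s a minimal sketch in Java (standing in for the Groovy version). Everything specific in it is hypothetical: the class name `SceneStarts`, frames modeled as plain arrays of 8-bit luma values, the mean-absolute-difference metric, and the threshold are illustrative stand-ins, not the actual filters from my script. The point is only the shape: the detector returns a predicate from frame number to boolean, playing the role of the Closure, instead of a clip of white and black frames.

```java
import java.util.List;
import java.util.function.IntPredicate;

/**
 * Hypothetical sketch: represent "is frame n the start of a scene?" as a
 * predicate over frame indices rather than as a clip of white/black frames.
 * Frames are modeled as arrays of 8-bit luma values (an assumption).
 */
public final class SceneStarts {
    private SceneStarts() {}

    /** Mean absolute luma difference between two equally sized frames. */
    public static double meanAbsDiff(int[] a, int[] b) {
        long sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return (double) sum / a.length;
    }

    /**
     * Returns a predicate that is true when frame n differs sharply from
     * frame n - 1. Frame 0 is treated as a scene start by convention.
     */
    public static IntPredicate sceneStarts(List<int[]> frames, double threshold) {
        // Short-circuit keeps frames.get(-1) from ever being evaluated.
        return n -> n == 0 || meanAbsDiff(frames.get(n - 1), frames.get(n)) > threshold;
    }

    public static void main(String[] args) {
        List<int[]> frames = List.of(
                new int[] {10, 10, 10, 10},
                new int[] {12, 11, 10, 9},
                new int[] {200, 210, 205, 199},
                new int[] {201, 209, 206, 200});
        IntPredicate isStart = sceneStarts(frames, 30.0);
        for (int n = 0; n < frames.size(); n++) {
            System.out.println(n + ": " + isStart.test(n));
        }
    }
}
```

Because the predicate is evaluated lazily, only the frames actually queried get compared, and there are no intermediate "boolean frames" to allocate at all.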
Once I put in some caching for the intermediate frames, the speed improved by about another 30%. Finally, I thought I’d push past the limits of Groovy optimization by reimplementing it in Java. There was practically no improvement. Oh, well.
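Caching intermediate frames can be sketched as a memoizing wrapper around a frame-producing function. In this hypothetical `FrameCache`, a `LinkedHashMap` in access order evicts the least recently used frame once a fixed capacity is exceeded. The LRU policy, the capacity, and the use of `IntFunction` are assumptions for illustration, not a description of how my library actually caches frames.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntFunction;

/**
 * Hypothetical sketch: memoize a frame-producing function so recently
 * computed frames are reused instead of recomputed. LRU eviction and the
 * fixed capacity are illustrative assumptions.
 */
public final class FrameCache<T> implements IntFunction<T> {
    private final IntFunction<T> source;
    private final Map<Integer, T> cache;

    public FrameCache(IntFunction<T> source, int capacity) {
        this.source = source;
        // accessOrder = true: iteration order runs from least to most recently used.
        this.cache = new LinkedHashMap<Integer, T>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, T> eldest) {
                // Evict the least recently used frame once we exceed capacity.
                return size() > capacity;
            }
        };
    }

    @Override
    public T apply(int n) {
        T frame = cache.get(n); // a hit also marks n as most recently used
        if (frame == null) {
            // Miss (or a null frame, which is never cached): compute and store.
            frame = source.apply(n);
            cache.put(n, frame);
        }
        return frame;
    }
}
```

The explicit `get`/`put` pair is used instead of `computeIfAbsent` so that a source function that itself reads earlier frames through the same cache doesn’t modify the map mid-computation, which `computeIfAbsent` may reject with a `ConcurrentModificationException`.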
In the end, it tends to take about twice as long to process the scenes as it takes to actually play the movie. That’s still somewhat dismal, but I think it’s good enough for now. Maybe once I have a complete working prototype, the speed improvements will follow.