Videogrep is a Python script that searches through the dialog in videos and then cuts together a new video from the matching moments. In short, it’s a command-line “supercut” generator. The code is here on GitHub.
The script searches a video’s associated subtitle file (which must be in standard .srt format and sit in the same folder as the video), identifies the timestamps of the matching dialog, and then uses the wonderful moviepy library to generate the final cut.
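To make the lookup step concrete, here is a minimal, standard-library-only sketch of parsing an .srt file and finding the subtitles whose text matches a query. It assumes well-formed .srt blocks (an index line, a `start --> end` timing line, then one or more lines of text) and is not videogrep’s own code; `Subtitle`, `parse_srt`, and `search` are hypothetical names used for illustration.

```python
import re
from dataclasses import dataclass

# Matches SRT timestamps such as "00:01:02,345" (a "." separator is also accepted)
TIMESTAMP = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


@dataclass
class Subtitle:
    start: float  # seconds
    end: float    # seconds
    text: str


def to_seconds(stamp: str) -> float:
    """Convert an SRT timestamp ("HH:MM:SS,mmm") to seconds."""
    h, m, s, ms = map(int, TIMESTAMP.match(stamp.strip()).groups())
    return h * 3600 + m * 60 + s + ms / 1000


def parse_srt(srt_text: str) -> list[Subtitle]:
    """Parse the contents of an .srt file into Subtitle records."""
    subtitles = []
    # Blocks are separated by blank lines: index, "start --> end", then text lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 2 or "-->" not in lines[1]:
            continue  # skip malformed blocks
        start, end = lines[1].split("-->")
        text = " ".join(line.strip() for line in lines[2:])
        subtitles.append(Subtitle(to_seconds(start), to_seconds(end), text))
    return subtitles


def search(subtitles: list[Subtitle], query: str) -> list[Subtitle]:
    """Return the subtitles whose text matches the regular expression `query`."""
    pattern = re.compile(query, re.IGNORECASE)
    return [sub for sub in subtitles if pattern.search(sub.text)]
```

Each match carries its own start and end time, which is all the cutting step needs. Searching for `r"\btime\b"` rather than plain `"time"` avoids matching words like “sometimes”.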
Here’s one of the results: every instance of a character saying the word “time” in the movie In Time (a film whose dialog appears to consist mostly of clock-related puns).
The script also works with multiple video files in the same directory. As an experiment, I mass-downloaded press briefings from the White House YouTube channel, then ran a word-level n-gram analysis on the subtitle tracks to find commonly used phrases. I discovered that the phrase “what I can tell you” occurs pretty frequently.
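The post doesn’t say exactly how that n-gram analysis was done. One simple way, assuming each briefing’s subtitle text has already been joined into a single string, is to count consecutive word sequences with `collections.Counter`. The helpers `word_ngrams` and `common_phrases` below are hypothetical.

```python
import re
from collections import Counter


def word_ngrams(text: str, n: int) -> list[tuple[str, ...]]:
    """Lowercase the text, split it into words, and return every run of n consecutive words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def common_phrases(transcripts: list[str], n: int, top: int = 10) -> list[tuple[str, int]]:
    """Count n-grams across all transcripts and return the `top` most frequent ones."""
    counts = Counter()
    for transcript in transcripts:
        counts.update(word_ngrams(transcript, n))
    return [(" ".join(gram), count) for gram, count in counts.most_common(top)]
```

Running it with `n=5` over the briefing transcripts would surface five-word phrases such as “what i can tell you” whenever they recur across videos.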
So, here is Jay Carney, the former Press Secretary, telling us what he can tell us. Note the necktie transitions.
You may have noticed some strange cuts in the video above. The accuracy of the edits depends entirely on the accuracy of the subtitle tracks: if a subtitle’s timestamps are off, the cut will be off too.
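Because every cut boundary comes straight from a subtitle timestamp, one possible mitigation is to pad each segment by a fraction of a second and merge segments that then overlap. This is an assumption for illustration, not necessarily what videogrep does. `pad_and_merge` and `make_supercut` are hypothetical helpers, and the cutting function assumes the moviepy 1.x API (`moviepy.editor`, `VideoFileClip.subclip`, `concatenate_videoclips`); moviepy 2.x renamed some of these.

```python
def pad_and_merge(segments, padding=0.0):
    """Widen each (start, end) segment by `padding` seconds and merge overlapping ones."""
    merged = []
    for start, end in sorted(segments):  # chronological order
        start, end = max(0.0, start - padding), end + padding
        if merged and start <= merged[-1][1]:
            # Overlaps the previous segment: extend it instead of adding a new cut.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def make_supercut(video_path, segments, output_path, padding=0.0):
    """Cut the padded segments out of one video and join them (moviepy 1.x API assumed)."""
    from moviepy.editor import VideoFileClip, concatenate_videoclips

    video = VideoFileClip(video_path)
    clips = [video.subclip(start, min(end, video.duration))
             for start, end in pad_and_merge(segments, padding)]
    concatenate_videoclips(clips).write_videofile(output_path)
    video.close()
```

Merging keeps padded neighbors from replaying the same footage twice; the trade-off is that larger padding also pulls in more of the surrounding dialog.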
You can also use videogrep to find instances of people using specific grammatical structures (I do this with the pattern library, which tags each word with its part of speech). For example, by running the following command:
python videogrep.py --input terrible_ted_talks/ --search '^VBG DT JJ NN' --search_type pos
I end up with a video of TED speakers saying “[gerund] [determiner] [adjective] [noun]”.
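The matching step can be sketched without the tagger itself. The version below assumes each subtitle line has already been tagged (by pattern or any other tagger) into `(word, tag)` pairs using Penn Treebank tags such as `VBG`, `DT`, `JJ`, and `NN`. It only finds contiguous, exact tag sequences anywhere in the line and does not reproduce pattern’s query syntax, such as the leading `^` in the command above. `find_tag_sequence` is a hypothetical helper.

```python
def find_tag_sequence(tagged, pattern):
    """Return the word spans whose part-of-speech tags exactly match `pattern`.

    `tagged` is a list of (word, tag) pairs; `pattern` is a space-separated
    tag sequence such as "VBG DT JJ NN".
    """
    tags = pattern.split()
    n = len(tags)
    matches = []
    # Slide a window of n tokens across the line and compare its tags.
    for i in range(len(tagged) - n + 1):
        window = tagged[i:i + n]
        if [tag for _, tag in window] == tags:
            matches.append(" ".join(word for word, _ in window))
    return matches
```

Exact tag comparison is strict: “building a better world” is tagged `VBG DT JJR NN` and so does not match `VBG DT JJ NN`.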
As a final experiment, I wrote a complementary script that finds dialog-free sections of videos. Here, for example, is “Total Silence”: all the one- to two-second silences in the movie Total Recall.
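The complementary script isn’t described in detail here. One plausible approach, assuming “dialog-free” means the stretch between the end of one subtitle and the start of the next, is to keep only the gaps whose length falls in a chosen range. The `(start, end)` timings could come from the parsed .srt blocks. `find_silences` is a hypothetical helper, and a gap in the subtitles does not guarantee actual silence on the soundtrack.

```python
def find_silences(timings, min_gap=1.0, max_gap=2.0):
    """Return (start, end) gaps between consecutive subtitles lasting min_gap..max_gap seconds."""
    silences = []
    ordered = sorted(timings)
    # Pair each subtitle with the one that follows it.
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        gap = next_start - prev_end
        if min_gap <= gap <= max_gap:
            silences.append((prev_end, next_start))
    return silences
```

Overlapping subtitles produce a negative gap and are skipped automatically.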
Feel free to mess around with the script on GitHub, and let me know if you have any suggestions for source material to run through it.