Videogrep is a Python script that searches through dialog in videos and then cuts together a new video based on what it finds. Basically, it’s a command-line “supercut” generator. The code is here on GitHub.
The script searches through a video’s associated subtitle file (which needs to be in the same folder as the video, in standard .srt format), identifies timestamps for the dialog, and then uses the wonderful moviepy library to generate the new final cut.
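Here is a minimal Python 3 sketch of the search step, using only the standard library. It parses an .srt file into timed lines and keeps the lines that match a regular expression. This is not videogrep's actual code: `parse_srt`, `search_srt`, and the `start`/`end`/`content` dict layout are hypothetical. The start and end times it returns are what you would then pass to moviepy to cut each clip.

```python
import re
from typing import Dict, List

# Matches an SRT timing line such as "00:01:02,500 --> 00:01:04,000".
TIMING = re.compile(
    r"(\d+):(\d+):(\d+)[,.](\d+)\s*-->\s*(\d+):(\d+):(\d+)[,.](\d+)"
)


def to_seconds(h: str, m: str, s: str, ms: str) -> float:
    """Convert SRT timestamp fields to seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000.0


def parse_srt(text: str) -> List[Dict]:
    """Parse SRT text into a list of {'start', 'end', 'content'} dicts."""
    entries = []
    # SRT blocks are separated by blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                g = match.groups()
                entries.append({
                    "start": to_seconds(*g[:4]),
                    "end": to_seconds(*g[4:]),
                    # Everything after the timing line is the dialog text.
                    "content": " ".join(lines[i + 1:]),
                })
                break
    return entries


def search_srt(entries: List[Dict], pattern: str) -> List[Dict]:
    """Return the entries whose dialog matches a regex (case-insensitive)."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [e for e in entries if regex.search(e["content"])]


SAMPLE = """1
00:00:01,000 --> 00:00:02,500
What time is it?

2
00:00:03,000 --> 00:00:04,000
I don't know.
"""

for hit in search_srt(parse_srt(SAMPLE), r"\btime\b"):
    print(hit["start"], hit["end"], hit["content"])  # 1.0 2.5 What time is it?
```

The `\b` word boundaries stop a search for “time” from also matching words like “sometimes”.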
Here’s one of the results: every instance of a character saying the word “time” in the movie In Time (a film whose dialog appears to consist mostly of clock-related puns).
The script also works with multiple video files in the same directory. As an experiment, I mass-downloaded press briefings from the White House YouTube channel, then ran a word-level n-gram analysis on the subtitle tracks to find commonly used phrases. I discovered that the phrase “what I can tell you” occurs pretty frequently.
So, here is Jay Carney, the former Press Secretary, telling us what he can tell us. Note the necktie transitions.
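The post doesn't say how the n-gram analysis was run. A commenter later points to a tools/ngrams.py script in the repo. The hypothetical Python 3 sketch below uses `collections.Counter` to illustrate word-level n-gram counting in general. It is not a reconstruction of that script, and the function names and sample subtitle lines are invented.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple


def words(text: str) -> List[str]:
    """Lowercase text and split it into word tokens, keeping apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens: List[str], n: int) -> Iterable[Tuple[str, ...]]:
    """Yield every run of n consecutive tokens."""
    return zip(*(tokens[i:] for i in range(n)))


def common_phrases(lines: Iterable[str], n: int, top: int = 10) -> List[Tuple[str, int]]:
    """Count word-level n-grams across all subtitle lines; return the most frequent."""
    # Treat the whole transcript as one token stream so that phrases split
    # across two subtitle entries are still counted.
    tokens = [w for line in lines for w in words(line)]
    counts = Counter(" ".join(gram) for gram in ngrams(tokens, n))
    return counts.most_common(top)


# Hypothetical subtitle lines, for illustration only.
LINES = [
    "What I can tell you is that we're working on it.",
    "Again, what I can tell you right now",
    "is what I can tell you.",
]
print(common_phrases(LINES, 5, top=1))  # [('what i can tell you', 3)]
```

Joining the lines lets a phrase that spans two subtitle entries still count. The trade-off is that a few n-grams will straddle a change of speaker. Setting `n=1` gives plain word frequencies.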
You may have noticed some strange cuts in the above video. The accuracy of the edits is completely reliant on the accuracy of the subtitle tracks.
You can also use videogrep to find instances of people employing specific grammatical structures (I do this with the Pattern library). For example, by running the following command:
python videogrep.py --input terrible_ted_talks/ --search '^VBG DT JJ NN' --search_type pos
I end up with a video of TED speakers saying “[gerund] [determiner] [adjective] [noun].”
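The post does the tagging and matching with the Pattern library, whose query syntax isn't reproduced here. The hypothetical Python 3 `match_tags` function below shows only the basic idea. It compares a sequence of Penn Treebank tags (VBG gerund, DT determiner, JJ adjective, NN singular noun) against a line that has already been tagged. Two assumptions apply: the tags are supplied by hand rather than by a real tagger, and a leading `^` is read as “start of the subtitle line”.

```python
from typing import List, Tuple

Tagged = List[Tuple[str, str]]  # (word, Penn Treebank tag) pairs


def match_tags(tagged: Tagged, query: str) -> List[str]:
    """Return the word spans whose tags equal the query's tag sequence.

    A leading '^' anchors the match to the start of the line; this is an
    assumption made for the sketch, not a documented Pattern rule.
    """
    anchored = query.startswith("^")
    wanted = query.lstrip("^").split()
    tags = [tag for _, tag in tagged]
    # Only position 0 is tried when anchored; otherwise every start position.
    starts = [0] if anchored else range(len(tagged) - len(wanted) + 1)
    hits = []
    for i in starts:
        if tags[i:i + len(wanted)] == wanted:
            hits.append(" ".join(word for word, _ in tagged[i:i + len(wanted)]))
    return hits


# A hand-tagged, hypothetical subtitle line.
LINE = [("Creating", "VBG"), ("a", "DT"), ("new", "JJ"), ("economy", "NN"),
        ("is", "VBZ"), ("hard", "JJ")]
print(match_tags(LINE, "^VBG DT JJ NN"))  # ['Creating a new economy']
```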
As a final experiment, I wrote a complementary script that finds dialog-free sections of videos. Here, for example, is “Total Silence”: all the one- to two-second silences in the movie Total Recall.
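The silence-finding script isn't shown in the post. One plausible approach is sketched here in Python 3 as an assumption. It treats the gap between the end of one subtitle and the start of the next as dialog-free, then keeps the gaps whose length falls in the wanted range (one to two seconds for “Total Silence”). `find_silences` and the `(start, end)` tuple format are hypothetical. Note that a gap in the subtitles marks missing dialog, not necessarily silence on the soundtrack.

```python
from typing import List, Tuple

Span = Tuple[float, float]  # (start, end) of a subtitle, in seconds


def find_silences(subs: List[Span], min_len: float = 1.0,
                  max_len: float = 2.0) -> List[Span]:
    """Return the dialog-free gaps between subtitles lasting min_len..max_len seconds."""
    gaps: List[Span] = []
    last_end = None
    for start, end in sorted(subs):
        if last_end is not None and min_len <= start - last_end <= max_len:
            gaps.append((last_end, start))
        # Track the latest end seen so overlapping subtitles don't create false gaps.
        last_end = end if last_end is None else max(last_end, end)
    return gaps


subs = [(0.0, 2.0), (3.5, 5.0), (5.2, 7.0), (10.0, 11.0)]
print(find_silences(subs))  # [(2.0, 3.5)]
```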
Feel free to mess around with the script on GitHub, and let me know if you have any suggestions for source material to run through it.
As the editor of Slacktory, a YouTube channel that’s run over sixty supercuts, I thank you with all my chopped and screwed heart. And I have a suggestion: The Toast editor Mallory Ortberg recently tweeted a request for “a supercut of every time someone says ‘…Dead?’ on a Law & Order show.” It sounds doable, depending on how captioners transcribe elliptical pauses.
You’re very welcome. Also, totally doable.
This was sent to me by a good friend, and of course nick was the first response.
That was the most awkward cut of Total Recall I have ever seen. Awesome work!
This is amazing! No reason why it won’t work with non-English (UTF-8) text, right?
Yeah, the basic search should work in any language.
Looks like a very cool tool. How did you do the N-gram analysis? I googled and only came up with Google’s N-Gram viewer.
I included a tool that does this in the github repo: https://github.com/antiboredom/videogrep/blob/master/tools/ngrams.py
Please include an example usage for the n-grams script; the source isn’t entirely self-explanatory. The first argument is the input file, but as someone who is inexperienced with n-gram creation, I haven’t a clue what the variables ‘total’ and ‘threshold’, pulled in from the command-line arguments, represent.
Thanks
This is awesome!!!!
Really mind-blowing. I love it. Hope it works for other languages, right?
Would you share more about your experience using the Pattern library from CLiPS? How did you find it? Are there other packages/libraries in this space? This is the first time I’m hearing about it. I have a particular corpus that is riddled with acronyms and partial sentences. I wonder how it would do…
great work. congratz!
Hi,
This sounds stupid, but could you insert the text you want to cut and set it so that it cuts the piece of film from the previous full stop to the end of the sentence containing your “phrase”?
I don’t have any idea what I’d use this for, but I like it.
Regards,
Haaggis.
Wow, very good idea!
Here, all the “Making the world a better place” in HBO’s Silicon Valley https://vimeo.com/98720197
Awesome!! I had actually been toying with the idea of making a videosed myself (like, hey, changing the name of the star of your favorite movies to yours!), but it would need some pretty decent speech recognition support, which has put me off pursuing the idea.
I’m very into making software to fuck with movies; I did this very simple one (it still needs A LOT OF WORK) that re-edits movies randomly:
https://github.com/takosuke/RandomEditor
This is so cool. I’ve been playing with it here, but have started running into an error that is beyond my ability to solve. So I’ve pasted the traceback here, in case it makes more sense to you ;)
Traceback (most recent call last):
  File "videogrep.py", line 200, in <module>
    videogrep(args.inputfile, args.outputfile, args.search, args.searchtype, args.maxclips, args.padding, args.test, args.randomize)
  File "videogrep.py", line 182, in videogrep
    create_supercut(composition, outputfile, padding)
  File "videogrep.py", line 87, in create_supercut
    video = concatenate(clips)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/moviepy/video/compositing/concatenate.py", line 55, in concatenate
    w = max([r[0] for r in sizes])
ValueError: max() arg is an empty sequence
….?
Getting the same problem.
Hi Ben – I made sure that my .srt files and my .mp4 files had the same name (for some reason, names were wonky). Getting a new error now though:
Writing audio in supercut.mp4.tmp20TEMP_MPY_to_videofile_SOUND.ogg
|----------| 0/800 0% [elapsed: 00:00 left: ?, ? iters/sec]
Traceback (most recent call last):
  File "videogrep.py", line 200, in <module>
    videogrep(args.inputfile, args.outputfile, args.search, args.searchtype, args.maxclips, args.padding, args.test, args.randomize)
  File "videogrep.py", line 180, in videogrep
    create_supercut_in_batches(composition, outputfile, padding)
  File "videogrep.py", line 115, in create_supercut_in_batches
    video = concatenate(clips)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/moviepy/video/compositing/concatenate.py", line 55, in concatenate
    w = max([r[0] for r in sizes])
ValueError: max() arg is an empty sequence
…’a little knowledge is a dangerous thing’ applies to me and my skillz, I’m sure…
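In the tracebacks above, `max()` inside moviepy's `concatenate` fails because it was handed an empty list of clips. That happens when the search produced nothing to cut, for example because no subtitle file was found for a video. Renaming the files fixed the first error for this commenter, so a quick pre-flight check can help. The helper below is a hypothetical Python 3 function, not part of videogrep. It assumes each subtitle file shares its video's base name (e.g. `talk.mp4` and `talk.srt`), and its list of video extensions is also an assumption.

```python
import os
from typing import Dict, List, Tuple

# Assumed set of video extensions; adjust for your files.
VIDEO_EXTS = {".mp4", ".mov", ".avi", ".mkv", ".webm"}


def pair_subtitles(folder: str) -> Tuple[Dict[str, str], List[str]]:
    """Pair each video in `folder` with a same-named .srt file.

    Returns (pairs, missing): pairs maps video -> subtitle file name, and
    missing lists videos that have no subtitle file with the same base name.
    """
    names = os.listdir(folder)
    srts = {os.path.splitext(n)[0]: n for n in names if n.lower().endswith(".srt")}
    pairs: Dict[str, str] = {}
    missing: List[str] = []
    for name in sorted(names):
        stem, ext = os.path.splitext(name)
        if ext.lower() not in VIDEO_EXTS:
            continue
        if stem in srts:
            pairs[name] = srts[stem]
        else:
            missing.append(name)
    return pairs, missing
```

For example, run `pairs, missing = pair_subtitles("./video")` before videogrep and check `missing`. Any file listed there can't contribute clips.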
Hey guys – sorry it’s not working. I’m not sure what the issue is but I will look into it.
Cheers, Sam!
Nice job, Sam! I’m eager to use this too.
FWIW, I’m also getting the empty sequence error. moviepy.audio.io.readers.skip_chunk raises an IOError 9 Bad File Descriptor at ‘self.proc.stdout.flush()’. The exception comes from create_supercut, but is hidden by the bare try/except block in create_supercut_in_batches.
This was running videogrep.py with a bunch of .mp4 files in the input directory.
Removing that call, and also the one at moviepy.video.io.ffmpeg_reader line 79, let the process continue until I got this one:
Writing video into TEMP_MPY_to_videofile
|----------| 0/19841 0% [elapsed: 00:00 left: ?, ? iters/sec]
Traceback (most recent call last):
  File "videogrep.py", line 262, in <module>
    videogrep(args.inputfile, args.outputfile, args.search, args.searchtype, args.maxclips, args.padding, args.test, args.randomize, args.sync)
  File "videogrep.py", line 240, in videogrep
    create_supercut_in_batches(composition, outputfile, padding)
  File "videogrep.py", line 117, in create_supercut_in_batches
    video.to_videofile(outputfile)
  File "/usr/local/lib/python2.7/site-packages/moviepy/video/VideoClip.py", line 281, in to_videofile
    verbose=verbose)
  File "/usr/local/lib/python2.7/site-packages/moviepy/video/io/ffmpeg_writer.py", line 143, in ffmpeg_write_video
    writer.write_frame(frame.astype("uint8"))
  File "/usr/local/lib/python2.7/site-packages/moviepy/video/io/ffmpeg_writer.py", line 108, in write_frame
    self.proc.stdin.write(img_array.tostring())
IOError: [Errno 32] Broken pipe
ffmpeg_writer was sending errors to /dev/null. Enabling the logfile showed this ffmpeg error in my case:
ffmpeg version 2.2.3 Copyright (c) 2000-2014 the FFmpeg developers
built on Jun 20 2014 20:09:36 with Apple clang version 4.1 (tags/Apple/clang-421.11.65) (based on LLVM 3.1svn)
configuration: --prefix=/usr/local/Cellar/ffmpeg/2.2.3 --enable-shared --enable-pthreads --enable-gpl --enable-version3 --enable-nonfree --enable-hardcoded-tables --enable-avresample --enable-vda --cc=clang --host-cflags= --host-ldflags= --enable-libx264 --enable-libfaac --enable-libmp3lame --enable-libxvid --enable-libvorbis --enable-libvpx
libavutil 52. 66.100 / 52. 66.100
libavcodec 55. 52.102 / 55. 52.102
libavformat 55. 33.100 / 55. 33.100
libavdevice 55. 10.100 / 55. 10.100
libavfilter 4. 2.100 / 4. 2.100
libavresample 1. 2. 0 / 1. 2. 0
libswscale 2. 5.102 / 2. 5.102
libswresample 0. 18.100 / 0. 18.100
libpostproc 52. 3.100 / 52. 3.100
Input #0, rawvideo, from 'pipe:':
Duration: N/A, start: 0.000000, bitrate: 530841 kb/s
Stream #0:0: Video: rawvideo (RGB[24] / 0x18424752), rgb24, 1280x720, 530841 kb/s, 24 tbr, 24 tbn, 24 tbc
[NULL @ 0x7fb03203d000] Unable to find a suitable output format for 'TEMP_MPY_to_videofile'
TEMP_MPY_to_videofile: Invalid argument
Conversion failed!
So my (silly) problem in this case was that I was setting just an output directory, not a filename, and ffmpeg expects a file extension to be present. Setting it to something like “--output ./out/video.mp4” made it work.
I’m using OSX 10.7.5, Python 2.7.5
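The failure in this comment came from an output path with no file extension. ffmpeg infers the container format from the extension when none is given explicitly, which is why it reported that it could not find a suitable output format. A small guard can surface this before rendering starts. The `check_output_path` function below is a hypothetical Python 3 helper, not part of videogrep or moviepy.

```python
import os


def check_output_path(path: str) -> str:
    """Fail early if ffmpeg won't be able to infer an output format from path."""
    if os.path.isdir(path):
        raise ValueError(f"{path!r} is a directory; pass a file name such as out/video.mp4")
    if not os.path.splitext(path)[1]:
        raise ValueError(f"{path!r} has no file extension (e.g. .mp4) for ffmpeg to go on")
    return path


check_output_path("out/video.mp4")  # passes and returns the path unchanged
```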
Interesting, thanks for the reply dude.
I will have a play with the naming and see if I have any similar results.
Same here :/
OSX Mavericks 10.9.3
Creating clips.
Traceback (most recent call last):
  File "videogrep.py", line 191, in <module>
    videogrep(args.inputfile, args.outputfile, args.search, args.searchtype, args.maxclips, args.padding, args.test, args.randomize)
  File "videogrep.py", line 173, in videogrep
    create_supercut(composition, outputfile, padding)
  File "videogrep.py", line 88, in create_supercut
    final_clip = concatenate( cut_clips)
  File "/Library/Python/2.7/site-packages/moviepy/video/compositing/concatenate.py", line 55, in concatenate
    w = max([r[0] for r in sizes])
ValueError: max() arg is an empty sequence
If you aren’t already, specify the file name after --input rather than just the directory name, like this: videogrep.py --input ./video/video.mp4 --search 'blah'. That should get you past that error.
It can be a folder or a file. But I was searching for a word wrapped in quotes (‘ ’); I took them off, and now I get another error:
MoviePy: building video file video.mp4
----------------------------------------
Writing audio in videoTEMP_MPY_to_videofile_SOUND.ogg
|----------| 0/994 0% [elapsed: 00:00 left: ?, ? iters/sec]
Traceback (most recent call last):
  File "videogrep.py", line 191, in <module>
    videogrep(args.inputfile, args.outputfile, args.search, args.searchtype, args.maxclips, args.padding, args.test, args.randomize)
  File "videogrep.py", line 173, in videogrep
    create_supercut(composition, outputfile, padding)
  File "videogrep.py", line 89, in create_supercut
    final_clip.to_videofile(outputfile)
  File "/Library/Python/2.7/site-packages/moviepy/video/VideoClip.py", line 275, in to_videofile
    verbose=verbose)
  File "", line 2, in to_audiofile
  File "/Library/Python/2.7/site-packages/moviepy/decorators.py", line 60, in requires_duration
    return f(clip, *a, **k)
  File "/Library/Python/2.7/site-packages/moviepy/audio/AudioClip.py", line 104, in to_audiofile
    codec=codec, bitrate=bitrate, write_logfile=write_logfile, verbose=verbose)
  File "", line 2, in ffmpeg_audiowrite
  File "/Library/Python/2.7/site-packages/moviepy/decorators.py", line 60, in requires_duration
    return f(clip, *a, **k)
  File "/Library/Python/2.7/site-packages/moviepy/audio/io/ffmpeg_audiowriter.py", line 125, in ffmpeg_audiowrite
    writer.write_frames(sndarray)
  File "/Library/Python/2.7/site-packages/moviepy/audio/io/ffmpeg_audiowriter.py", line 78, in write_frames
    self.proc.stdin.write(frames_array.tostring())
IOError: [Errno 32] Broken pipe
I understand. Thanks. See the last comment below… same issue you are having after moving past the ‘max() arg’ issue.
Sorry, I should say: changing the names allowed the script to identify relevant tracks and begin writing the supercut, which it didn’t do before. So it’s the same error message, but at least we’re moving closer, right? Sorry for such an appalling lack of knowledge on my part.
Amazing! Do most YouTube videos include the subtitle tracks then? What do you use for your original source materials? I’ve got a few ideas to play with this on….
For those using ffmpeg on a Mac and receiving an “Unknown encoder ‘libvorbis’” error:
– This results from a missing Ogg Vorbis library; you may want to reinstall ffmpeg with these options, using Homebrew:
brew reinstall ffmpeg --with-libvpx --with-libvorbis
Perfect! That’s solved that issue, cheers. (also here: http://stackoverflow.com/questions/19454509/ffmpeg-unable-to-find-encoder-libvorbis ). With ffmpeg reinstalled, I end up with a video. No audio though. Hmm. Closer!
Incredible work! So glad to find a freaky hacker!
I was getting the ‘ValueError: max() arg is an empty sequence’ error too, when I was passing just the directory name after --input, for example:
python videogrep.py --input ./video --search 'string'
When I specify the video file name like this:
python videogrep.py --input ./video/video.mp4 --search 'disappear'
It no longer throws that error… but now I am getting this error:
MoviePy: building video file supercut.mp4
----------------------------------------
Writing audio in supercutTEMP_MPY_to_videofile_SOUND.ogg
|----------| 0/158 0% [elapsed: 00:00 left: ?, ? iters/sec]
Traceback (most recent call last):
  File "videogrep.py", line 191, in <module>
    videogrep(args.inputfile, args.outputfile, args.search, args.searchtype, args.maxclips, args.padding, args.test, args.randomize)
  File "videogrep.py", line 173, in videogrep
    create_supercut(composition, outputfile, padding)
  File "videogrep.py", line 89, in create_supercut
    final_clip.to_videofile(outputfile)
  File "/Library/Python/2.7/site-packages/moviepy/video/VideoClip.py", line 275, in to_videofile
    verbose=verbose)
  File "", line 2, in to_audiofile
  File "/Library/Python/2.7/site-packages/moviepy/decorators.py", line 60, in requires_duration
    return f(clip, *a, **k)
  File "/Library/Python/2.7/site-packages/moviepy/audio/AudioClip.py", line 104, in to_audiofile
    codec=codec, bitrate=bitrate, write_logfile=write_logfile, verbose=verbose)
  File "", line 2, in ffmpeg_audiowrite
  File "/Library/Python/2.7/site-packages/moviepy/decorators.py", line 60, in requires_duration
    return f(clip, *a, **k)
  File "/Library/Python/2.7/site-packages/moviepy/audio/io/ffmpeg_audiowriter.py", line 125, in ffmpeg_audiowrite
    writer.write_frames(sndarray)
  File "/Library/Python/2.7/site-packages/moviepy/audio/io/ffmpeg_audiowriter.py", line 78, in write_frames
    self.proc.stdin.write(frames_array.tostring())
IOError: [Errno 32] Broken pipe
Anyone figured this out?
This is very cool. Could it be used to automatically cut specific phrases from a video?
Has anyone used this to *remove* phrases or words from a video?
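Nothing in the post covers removal, but the same subtitle timestamps could in principle drive it: keep everything between the matched spans instead of the spans themselves. The hypothetical `keep_segments` below (Python 3, not part of videogrep) does just that interval arithmetic. You would then cut the kept spans from the source video and join them, for example with moviepy, as the supercut does. Because the edits are only as accurate as the subtitle timings, some padding around each removed span would probably be needed.

```python
from typing import List, Tuple

Span = Tuple[float, float]  # (start, end) in seconds


def keep_segments(duration: float, remove: List[Span]) -> List[Span]:
    """Return the parts of [0, duration] not covered by any span in `remove`."""
    keep: List[Span] = []
    cursor = 0.0  # end of the material already accounted for
    for start, end in sorted(remove):
        # Clamp each span to the video's length.
        start = min(max(start, 0.0), duration)
        end = min(max(end, 0.0), duration)
        if start > cursor:
            keep.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < duration:
        keep.append((cursor, duration))
    return keep


# Drop two overlapping matches and one that runs past the end of a 10 s clip.
print(keep_segments(10.0, [(2.0, 3.0), (2.5, 4.0), (8.0, 12.0)]))
# [(0.0, 2.0), (4.0, 8.0)]
```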
A little thingie made with this great thingie :):
https://www.youtube.com/watch?v=xRMBv8aZLfQ
It would be interesting to write a script that sorts all the words by frequency, then run this on adverts, since they are pretty repetitive already.
I wonder how difficult it would be to write a script that reverses the meaning of sentences by inserting or removing ‘not’.
Being able to change pitch would be cool; you could make all the sentences into uprisers this way.
How did you manage to get the subtitles from YouTube videos? Some websites offer to get automatically generated subtitles, but they aren’t good enough!
Also, this is just great! I am going to play around with it and see what I can do.
I haven’t played around much with this, but I’m thinking this could be used to cut video segments with specific words and then create a supercut montage with a whole new message, similar to the edited videos that show news anchors singing songs, etc.
Great tool and thanks for sharing.
How would this do at recreating the Xbox One Reveal 2013 Highlights ( https://www.youtube.com/watch?v=KbWgUO-Rqcw )?
No one wants to produce a supercut of an Apple WWDC keynote?
You could combine this with the Python scripts from http://prosodylab.org/tools/aligner/
to get word-level precision, as long as you have good audio and a good transcript.
However, I guess they only work on monologues.
Awesome project, btw!
Any chance this can become a Windows program? I’ve been having a headache with updating Python and installing videogrep and all its associated libraries. Ahh!!
Hey Sam,
Great job.
I have a question: have you tried this with videos without subtitles? I mean something like voice recognition…
or an additional step that produces text from the audio and puts the subtitles as a text file in the same folder (before your code starts)?
…thanks
Hey. Yes, that’s actually built into the latest version, using an open-source tool called Sphinx. There are instructions on how to install it here: http://antiboredom.github.io/videogrep/