Videogrep: Automatic Supercuts with Python

Videogrep is a Python script that searches through dialog in videos and then cuts together a new video based on what it finds. Basically, it’s a command-line “supercut” generator. The code is here on GitHub.

The script searches through a video’s associated subtitle file (which needs to be in the same folder as the video, in standard .srt format), identifies timestamps for the dialog, and then uses the wonderful moviepy library to generate the new final cut.
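As a rough illustration of the search step (a minimal sketch, not the actual videogrep code), the following parses .srt text into timed cues and keeps the ones whose dialog matches a regular expression. The function names `parse_srt` and `find_cues` and the sample subtitles are made up for this example; the resulting start and end times are what a video library would then use to cut clips.

```python
import re

# Matches an SRT timestamp line such as "00:01:02,500 --> 00:01:04,000".
TIMESTAMP = re.compile(
    r"(\d+):(\d+):(\d+)[,.](\d+)\s*-->\s*(\d+):(\d+):(\d+)[,.](\d+)"
)


def to_seconds(hours, minutes, seconds, millis):
    """Convert the four captured timestamp fields to seconds."""
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000.0


def parse_srt(text):
    """Return a list of (start, end, dialog) tuples from SRT-formatted text."""
    cues = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            match = TIMESTAMP.search(line)
            if match:
                fields = match.groups()
                start = to_seconds(*fields[:4])
                end = to_seconds(*fields[4:])
                # Everything after the timestamp line is the dialog.
                dialog = " ".join(lines[i + 1:]).strip()
                cues.append((start, end, dialog))
                break
    return cues


def find_cues(cues, pattern):
    """Keep only the cues whose dialog matches the regular expression."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [cue for cue in cues if regex.search(cue[2])]


# Invented sample subtitles, for illustration only.
SAMPLE = """1
00:00:01,000 --> 00:00:03,500
We don't have much time.

2
00:00:04,000 --> 00:00:06,000
Follow me.

3
00:01:10,250 --> 00:01:12,000
Time is money.
"""

matches = find_cues(parse_srt(SAMPLE), r"\btime\b")
for start, end, dialog in matches:
    print(start, end, dialog)
```

Because the cut points come straight from the subtitle timestamps, each clip covers the whole subtitle line rather than just the matching word.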

Here’s one of the results: every instance of a character saying the word “time” in the movie In Time (a film whose dialog appears to consist mostly of clock-related puns).

The script also works with multiple video files in the same directory. As an experiment, I mass downloaded press briefings from the White House YouTube channel, then ran a word-level n-gram analysis on the subtitle tracks to find some commonly used phrases. I discovered that the phrase “what I can tell you” occurs pretty frequently.
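For readers unfamiliar with n-grams, a word-level n-gram count can be sketched roughly like this. This is a simplified stand-in, not the tool from the repo; the function names and the sample lines are invented for the example.

```python
import re
from collections import Counter


def ngrams(words, n):
    """Yield every run of n consecutive words as a tuple."""
    for i in range(len(words) - n + 1):
        yield tuple(words[i:i + n])


def common_phrases(lines, n, top):
    """Count word-level n-grams across subtitle lines; return the most common."""
    counts = Counter()
    for line in lines:
        # Lowercase and keep only letters and apostrophes, so punctuation
        # does not split otherwise identical phrases.
        words = re.findall(r"[a-z']+", line.lower())
        counts.update(ngrams(words, n))
    return [(" ".join(gram), count) for gram, count in counts.most_common(top)]


# Invented sample lines, for illustration only.
lines = [
    "What I can tell you is that we are looking into it.",
    "Well, what I can tell you is we are reviewing it.",
    "I don't have an update, but what I can tell you is this.",
]
phrases = common_phrases(lines, 5, 3)
print(phrases)
```

Counting per line means phrases that span two subtitle lines are missed; joining the lines of a file before counting would catch those at the cost of some false matches across sentence boundaries.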

So, here is Jay Carney, the former Press Secretary, telling us what he can tell us. Note the necktie transitions.

You may have noticed some strange cuts in the above video. The accuracy of the edits is completely reliant on the accuracy of the subtitle tracks.

You can also use videogrep to find instances of people employing specific grammatical structures (I do this with the pattern library). For example, by running the following command

python videogrep.py --input terrible_ted_talks/ --search '^VBG DT JJ NN' --search_type pos


I end up with a video of TED speakers saying “[gerund] [determiner] [adjective] [noun]”
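To make the idea of a part-of-speech search concrete, here is a small sketch that checks a sentence's tag sequence against a pattern like the one above. It deliberately does not use the pattern library: the sentences are hand-tagged with Penn Treebank tags, the function name is hypothetical, and treating the leading "^" as a start-of-sentence anchor is an assumption about the pattern syntax.

```python
def matches_pos_pattern(tagged, pattern):
    """Check a tagged sentence against a space-separated tag pattern.

    `tagged` is a list of (word, tag) pairs. A leading "^" anchors the
    pattern to the start of the sentence (an assumption made for this
    sketch); otherwise the tag sequence may appear anywhere.
    """
    anchored = pattern.startswith("^")
    wanted = pattern.lstrip("^").split()
    tags = [tag for _word, tag in tagged]
    # When anchored, only position 0 is tried; otherwise every position
    # that leaves room for the whole pattern.
    last_start = 0 if anchored else len(tags) - len(wanted)
    for start in range(last_start + 1):
        if tags[start:start + len(wanted)] == wanted:
            return True
    return False


# Hand-tagged sample sentences, for illustration only.
gerund_first = [
    ("Creating", "VBG"), ("a", "DT"), ("sustainable", "JJ"), ("future", "NN"),
]
gerund_later = [
    ("We", "PRP"), ("are", "VBP"), ("creating", "VBG"),
    ("a", "DT"), ("sustainable", "JJ"), ("future", "NN"),
]

print(matches_pos_pattern(gerund_first, "^VBG DT JJ NN"))
print(matches_pos_pattern(gerund_later, "^VBG DT JJ NN"))
```

In practice the tags would come from a tagger run over each subtitle line, and the matching lines would be cut exactly as in a plain text search.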

As a final experiment, I wrote a complementary script that finds dialog-free sections of videos. Here, for example, is “Total Silence”: all the one to two second silences in the movie Total Recall.
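One plausible way to find such sections, sketched here under the assumption that "dialog-free" means the gap between one subtitle ending and the next beginning, is to walk through the cue times in order. The function name and sample times are invented, and the actual script may work differently.

```python
def find_silences(cues, min_gap=1.0, max_gap=2.0):
    """Return (start, end) spans between consecutive cues.

    `cues` is a list of (start, end) times in seconds. Only gaps lasting
    between min_gap and max_gap seconds (inclusive) are kept.
    """
    ordered = sorted(cues)
    silences = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        gap = next_start - prev_end
        if min_gap <= gap <= max_gap:
            silences.append((prev_end, next_start))
    return silences


# Invented cue times, for illustration only.
cues = [(1.0, 3.5), (5.0, 7.0), (7.25, 9.0), (12.5, 14.0)]
silences = find_silences(cues)
print(silences)
```

A gap between subtitles only means nobody is captioned as speaking; music, sound effects, or uncaptioned speech can still be present in the audio.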

Feel free to mess around with the script on github, and let me know if you have any suggestions for source material to run through it.

55 thoughts on “Videogrep: Automatic Supercuts with Python”

Nick Douglas says:

June 19, 2014 at 10:38 pm

As the editor of Slacktory, a YouTube channel that’s run over sixty supercuts, I thank you with all my chopped and screwed heart. And I have a suggestion: The Toast editor Mallory Ortberg recently tweeted a request for “a supercut of every time someone says ‘…Dead?’ on a Law & Order show.” It sounds doable, depending on how captioners transcribe elliptical pauses.

- sam says:
  
  June 19, 2014 at 10:41 pm
  
  You’re very welcome. Also, totally doable.
  
- Whit says:
  
  June 22, 2014 at 5:37 am
  
  This was sent to me by a good friend, and of course Nick was the first response.
  
Alex says:

June 20, 2014 at 12:06 am

That was the most awkward cut of Total Recall I have ever seen. Awesome work!

John Pasden says:

June 20, 2014 at 2:58 am

This is amazing! No reason why it won’t work with non-English (UTF-8) text, right?

- sam says:
  
  June 20, 2014 at 3:14 am
  
  Yeah the basic search should work in any language.
  
Roger says:

June 20, 2014 at 3:49 am

Looks like a very cool tool. How did you do the N-gram analysis? I googled and only came up with Google’s N-Gram viewer.

- sam says:
  
  June 20, 2014 at 4:34 am
  
  I included a tool that does this in the github repo: https://github.com/antiboredom/videogrep/blob/master/tools/ngrams.py
  
  - seg says:
    
    June 21, 2014 at 7:41 am
    
    Please include an example usage for the n-grams script; the source isn’t very verbose. The first argument is the input file, but as someone who is inexperienced with n-gram creation, I haven’t a clue what the ‘total’ and ‘threshold’ variables pulled in from the command arguments represent.
    
    Thanks
    
TingTing says:

June 20, 2014 at 5:08 am

This is awesome!!!!

Fizer says:

June 20, 2014 at 6:56 am

Really mind-blowing. I love it. Hope it works for other languages. Right?

alexander sicular says:

June 20, 2014 at 7:24 am

Would you share more about your experience using the pattern library, clips? How did you find it? Are there other packages/libs in this space? First time hearing about it. I have a particular corpus that is riddled with acronyms and partial sentences. Wonder how it would do…

mehmet says:

June 20, 2014 at 7:51 am

great work. congratz!

Haaggis says:

June 20, 2014 at 10:54 am

Hi,

This sounds stupid, but could you insert the text you want to cut and set it so that it cuts the piece of film from the previous full stop to the end of the sentence containing your “phrase”?

I don’t have any idea what I’d use this for, but I like it.

Regards,

Haaggis.

cthulberg says:

June 20, 2014 at 11:01 am

Wow, very good idea!
Here, all the “Making the world a better place” in HBO’s Silicon Valley https://vimeo.com/98720197

takosuke says:

June 20, 2014 at 12:58 pm

Awesome!! I had actually been toying with the idea of making a videosed myself (for like, hey, changing the name of the star of your favorite movies with yours!), but it would need some pretty decent speech recognition support, which has put me off pursuing the idea.
I’m very into making software to fuck with movies, I did this very simple (still needs A LOT OF WORK) one that re-edits movies randomly
https://github.com/takosuke/RandomEditor

Shawn says:

June 20, 2014 at 2:16 pm

This is so cool. I’ve been playing with it here, but have started running into an error that is beyond me to solve. So I’ve pasted the traceback here, in case it makes more sense to you ;)

Traceback (most recent call last):
  File "videogrep.py", line 200, in <module>
    videogrep(args.inputfile, args.outputfile, args.search, args.searchtype, args.maxclips, args.padding, args.test, args.randomize)
  File "videogrep.py", line 182, in videogrep
    create_supercut(composition, outputfile, padding)
  File "videogrep.py", line 87, in create_supercut
    video = concatenate(clips)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/moviepy/video/compositing/concatenate.py", line 55, in concatenate
    w = max([r[0] for r in sizes])
ValueError: max() arg is an empty sequence

….?

- Ben says:
  
  June 20, 2014 at 4:05 pm
  
  Getting the same problem.
  
  - Shawn says:
    
    June 20, 2014 at 4:55 pm
    
    Hi Ben – I made sure that my .srt files and my .mp4 files had the same name (for some reason, names were wonky). Getting a new error now though:
    
    Writing audio in supercut.mp4.tmp20TEMP_MPY_to_videofile_SOUND.ogg
    |----------| 0/800 0% [elapsed: 00:00 left: ?, ? iters/sec]Traceback (most recent call last):
      File "videogrep.py", line 200, in <module>
        videogrep(args.inputfile, args.outputfile, args.search, args.searchtype, args.maxclips, args.padding, args.test, args.randomize)
      File "videogrep.py", line 180, in videogrep
        create_supercut_in_batches(composition, outputfile, padding)
      File "videogrep.py", line 115, in create_supercut_in_batches
        video = concatenate(clips)
      File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/moviepy/video/compositing/concatenate.py", line 55, in concatenate
        w = max([r[0] for r in sizes])
    ValueError: max() arg is an empty sequence
    
    …’a little knowledge is a dangerous thing’ applies to me and my skillz, I’m sure…
    
    - sam says:
      
      June 20, 2014 at 5:57 pm
      
      Hey guys – sorry it’s not working. I’m not sure what the issue is but I will look into it.
      
      - Ben says:
        
        June 21, 2014 at 12:59 pm
        
        Cheers, Sam!
      - fonso says:
        
        June 23, 2014 at 3:10 pm
        
        Nice job, Sam! I’m eager to use this too.
        
        FWIW, I’m also getting the empty sequence error. moviepy.audio.io.readers.skip_chunk raises an IOError 9 Bad File Descriptor at ‘self.proc.stdout.flush()’. The exception comes from create_supercut, but is hidden by the bare try/except block in create_supercut_in_batches.
        
        This was trying videogrep.py with a bunch of mp4’s in the input directory.
        
        Removing that call and also the one at moviepy.video.io.ffmpeg_reader line 79 lets the process continue, until I got this one:
        
        Writing video into TEMP_MPY_to_videofile
        |----------| 0/19841 0% [elapsed: 00:00 left: ?, ? iters/sec]Traceback (most recent call last):
          File "videogrep.py", line 262, in <module>
            videogrep(args.inputfile, args.outputfile, args.search, args.searchtype, args.maxclips, args.padding, args.test, args.randomize, args.sync)
          File "videogrep.py", line 240, in videogrep
            create_supercut_in_batches(composition, outputfile, padding)
          File "videogrep.py", line 117, in create_supercut_in_batches
            video.to_videofile(outputfile)
          File "/usr/local/lib/python2.7/site-packages/moviepy/video/VideoClip.py", line 281, in to_videofile
            verbose=verbose)
          File "/usr/local/lib/python2.7/site-packages/moviepy/video/io/ffmpeg_writer.py", line 143, in ffmpeg_write_video
            writer.write_frame(frame.astype("uint8"))
          File "/usr/local/lib/python2.7/site-packages/moviepy/video/io/ffmpeg_writer.py", line 108, in write_frame
            self.proc.stdin.write(img_array.tostring())
        IOError: [Errno 32] Broken pipe
        
        ffmpeg_writer was sending errors to /dev/null. Enabling the logfile showed this ffmpeg error in my case:
        
        ffmpeg version 2.2.3 Copyright (c) 2000-2014 the FFmpeg developers
        built on Jun 20 2014 20:09:36 with Apple clang version 4.1 (tags/Apple/clang-421.11.65) (based on LLVM 3.1svn)
        configuration: --prefix=/usr/local/Cellar/ffmpeg/2.2.3 --enable-shared --enable-pthreads --enable-gpl --enable-version3 --enable-nonfree --enable-hardcoded-tables --enable-avresample --enable-vda --cc=clang --host-cflags= --host-ldflags= --enable-libx264 --enable-libfaac --enable-libmp3lame --enable-libxvid --enable-libvorbis --enable-libvpx
        libavutil 52. 66.100 / 52. 66.100
        libavcodec 55. 52.102 / 55. 52.102
        libavformat 55. 33.100 / 55. 33.100
        libavdevice 55. 10.100 / 55. 10.100
        libavfilter 4. 2.100 / 4. 2.100
        libavresample 1. 2. 0 / 1. 2. 0
        libswscale 2. 5.102 / 2. 5.102
        libswresample 0. 18.100 / 0. 18.100
        libpostproc 52. 3.100 / 52. 3.100
        Input #0, rawvideo, from 'pipe:':
        Duration: N/A, start: 0.000000, bitrate: 530841 kb/s
        Stream #0:0: Video: rawvideo (RGB[24] / 0x18424752), rgb24, 1280x720, 530841 kb/s, 24 tbr, 24 tbn, 24 tbc
        [NULL @ 0x7fb03203d000] Unable to find a suitable output format for 'TEMP_MPY_to_videofile'
        TEMP_MPY_to_videofile: Invalid argument
        Conversion failed!
        
        So my (silly) problem in this case was that I was setting just an output directory, not a filename, and ffmpeg expected a file extension to be present. Setting it to something like "--output ./out/video.mp4" made it work.
        
        I’m using OSX 10.7.5, Python 2.7.5
    - Ben says:
      
      June 21, 2014 at 1:00 pm
      
      Interesting, thanks for the reply dude.
      
      I will have a play with the naming and see if I have any similar results.
      
- Saint says:
  
  June 21, 2014 at 10:16 pm
  
  Same here :/
  
  OSX Mavericks 10.9.3
  
  Creating clips.
  Traceback (most recent call last):
    File "videogrep.py", line 191, in <module>
      videogrep(args.inputfile, args.outputfile, args.search, args.searchtype, args.maxclips, args.padding, args.test, args.randomize)
    File "videogrep.py", line 173, in videogrep
      create_supercut(composition, outputfile, padding)
    File "videogrep.py", line 88, in create_supercut
      final_clip = concatenate( cut_clips)
    File "/Library/Python/2.7/site-packages/moviepy/video/compositing/concatenate.py", line 55, in concatenate
      w = max([r[0] for r in sizes])
  ValueError: max() arg is an empty sequence
  
  - shawn says:
    
    June 22, 2014 at 4:27 pm
    
    If you aren’t, specify the file name after --input, rather than just the directory name… like this: videogrep.py --input ./video/video.mp4 --search 'blah' .. that should get you past that error.
    
    - Saint says:
      
      June 22, 2014 at 4:48 pm
      
      It can be a folder or a file. But I was looking for a word between ‘ ‘… I took them off and now I get another error:
      
      MoviePy: building video file video.mp4
      —————————————-
      Writing audio in videoTEMP_MPY_to_videofile_SOUND.ogg
      |----------| 0/994 0% [elapsed: 00:00 left: ?, ? iters/sec]Traceback (most recent call last):
        File "videogrep.py", line 191, in <module>
          videogrep(args.inputfile, args.outputfile, args.search, args.searchtype, args.maxclips, args.padding, args.test, args.randomize)
        File "videogrep.py", line 173, in videogrep
          create_supercut(composition, outputfile, padding)
        File "videogrep.py", line 89, in create_supercut
          final_clip.to_videofile(outputfile)
        File "/Library/Python/2.7/site-packages/moviepy/video/VideoClip.py", line 275, in to_videofile
          verbose=verbose)
        File "", line 2, in to_audiofile
        File "/Library/Python/2.7/site-packages/moviepy/decorators.py", line 60, in requires_duration
          return f(clip, *a, **k)
        File "/Library/Python/2.7/site-packages/moviepy/audio/AudioClip.py", line 104, in to_audiofile
          codec=codec, bitrate=bitrate, write_logfile=write_logfile, verbose=verbose)
        File "", line 2, in ffmpeg_audiowrite
        File "/Library/Python/2.7/site-packages/moviepy/decorators.py", line 60, in requires_duration
          return f(clip, *a, **k)
        File "/Library/Python/2.7/site-packages/moviepy/audio/io/ffmpeg_audiowriter.py", line 125, in ffmpeg_audiowrite
          writer.write_frames(sndarray)
        File "/Library/Python/2.7/site-packages/moviepy/audio/io/ffmpeg_audiowriter.py", line 78, in write_frames
          self.proc.stdin.write(frames_array.tostring())
      IOError: [Errno 32] Broken pipe
      
      - shawn says:
        
        June 22, 2014 at 4:59 pm
        
        I understand. Thanks. See the last comment below… same issue you are having after moving past the ‘max() arg’ issue.
shawn says:

June 20, 2014 at 4:58 pm

Sorry, I should say: changing the names allowed the script to identify relevant tracks, and to begin writing the supercut, which it didn’t do before. So the same error message, but at least we’re moving closer, right? Sorry for such an appalling lack of knowledge on my part.

Pete says:

June 21, 2014 at 2:02 am

Amazing! Do most YouTube videos include the subtitle tracks then? What do you use for your original source materials? I’ve got a few ideas to play with this on….

Ben says:

June 21, 2014 at 1:09 pm

For those using ffmpeg on a Mac, and receiving an "Unknown encoder 'libvorbis'" error:

- This results from a missing Ogg Vorbis library; you may want to reinstall ffmpeg with these options, using Homebrew:

"brew reinstall ffmpeg --with-libvpx --with-libvorbis"

- Shawn Graham says:
  
  June 21, 2014 at 7:51 pm
  
  Perfect! That’s solved that issue, cheers. (also here: http://stackoverflow.com/questions/19454509/ffmpeg-unable-to-find-encoder-libvorbis ). With ffmpeg reinstalled, I end up with a video. No audio though. Hmm. Closer!
  
bbfc says:

June 21, 2014 at 5:12 pm

Incredible work! So glad to find a freaky hacker!

Me says:

June 22, 2014 at 4:25 pm

I was getting the 'ValueError: max() arg is an empty sequence' error too, when I was passing just the directory name after --input, example:

python videogrep.py --input ./video --search 'string'

When I specify the video file name like this:

python videogrep.py --input ./video/video.mp4 --search 'disappear'

It no longer throws that error… but now I am getting this error:

MoviePy: building video file supercut.mp4
—————————————-
Writing audio in supercutTEMP_MPY_to_videofile_SOUND.ogg
|----------| 0/158 0% [elapsed: 00:00 left: ?, ? iters/sec]Traceback (most recent call last):
  File "videogrep.py", line 191, in <module>
    videogrep(args.inputfile, args.outputfile, args.search, args.searchtype, args.maxclips, args.padding, args.test, args.randomize)
  File "videogrep.py", line 173, in videogrep
    create_supercut(composition, outputfile, padding)
  File "videogrep.py", line 89, in create_supercut
    final_clip.to_videofile(outputfile)
  File "/Library/Python/2.7/site-packages/moviepy/video/VideoClip.py", line 275, in to_videofile
    verbose=verbose)
  File "", line 2, in to_audiofile
  File "/Library/Python/2.7/site-packages/moviepy/decorators.py", line 60, in requires_duration
    return f(clip, *a, **k)
  File "/Library/Python/2.7/site-packages/moviepy/audio/AudioClip.py", line 104, in to_audiofile
    codec=codec, bitrate=bitrate, write_logfile=write_logfile, verbose=verbose)
  File "", line 2, in ffmpeg_audiowrite
  File "/Library/Python/2.7/site-packages/moviepy/decorators.py", line 60, in requires_duration
    return f(clip, *a, **k)
  File "/Library/Python/2.7/site-packages/moviepy/audio/io/ffmpeg_audiowriter.py", line 125, in ffmpeg_audiowrite
    writer.write_frames(sndarray)
  File "/Library/Python/2.7/site-packages/moviepy/audio/io/ffmpeg_audiowriter.py", line 78, in write_frames
    self.proc.stdin.write(frames_array.tostring())
IOError: [Errno 32] Broken pipe

Anyone figured this out?

Mark Crane says:

June 23, 2014 at 12:19 pm

This is very cool. Could it be used to automatically cut specific phrases from a video?

craniac says:

June 23, 2014 at 12:29 pm

Has anyone used this to *remove* phrases or words from a video?

fonso says:

June 23, 2014 at 4:10 pm

A little thingie made with this great thingie :):
https://www.youtube.com/watch?v=xRMBv8aZLfQ

Stu says:

June 24, 2014 at 12:28 am

It would be interesting to write a script that sorted all the words by frequency, then run this on adverts since they are pretty repetitive already.

Stu says:

June 24, 2014 at 12:32 am

I wonder how difficult it would be to write a script that reverses the meaning of sentences by inserting or removing ‘not’.

Being able to change pitch would be cool; you could make all the sentences into uprisers in this way.

Rapha says:

June 24, 2014 at 4:28 am

How did you manage to get the subtitles from Youtube videos? Some websites offer to get automatically generated subtitles, but they aren’t good enough!

Also, this is just great! I am going to play around with it and see what I can do

Ricardo says:

June 25, 2014 at 3:09 pm

I haven’t played around much with this but I’m thinking this could be used to cut video segments with specific words and then create a supercut montage with a whole new message. Similar to the editing videos that show news anchors singing songs, etc.
Great tool and thanks for sharing.

MarioFannio says:

July 11, 2014 at 12:31 pm

How would this do at recreating the Xbox One Reveal 2013 Highlights ( https://www.youtube.com/watch?v=KbWgUO-Rqcw )?

yegle says:

July 25, 2014 at 8:07 am

No one wants to produce a supercut of Apple WWDC keynote?

johannesgj says:

August 25, 2014 at 9:56 pm

You could combine this with the Python scripts from http://prosodylab.org/tools/aligner/
These scripts give you word-level precision if you have good audio and a good transcript.
However, they only work on monologues, I guess.

awesome project btw!

anthonyteacher says:

February 3, 2015 at 9:01 pm

Any chance this can become a Windows program? I’ve been having a headache with updating Python and installing videogrep and all its associated libraries. Ahh!!

Nims says:

October 27, 2015 at 3:19 pm

Hey Sam,
great job!
I have a question: have you tried this with videos without subtitles? I mean something like voice recognition,
or an additional step that produces text from the voice and puts the subtitles as a text file in the same folder (before your code starts)?
Thanks

- sam says:
  
  October 27, 2015 at 5:20 pm
  
  Hey. Yes that’s actually built in to the latest version, using an open source tool called sphinx. There are instructions on how to install it here: http://antiboredom.github.io/videogrep/
  

Sam Lavigne
