Videogrep: Automatic Supercuts with Python

Videogrep is a Python script that searches through dialog in videos and then cuts together a new video based on what it finds. Basically, it’s a command-line “supercut” generator. The code is here on GitHub.

The script searches through a video’s associated subtitle file (which needs to be in the same folder as the video, in standard .srt format), identifies timestamps for the dialog, and then uses the wonderful moviepy library to generate the new final cut.
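As a rough illustration of the subtitle-search step described above, here is a minimal sketch that parses .srt text and returns the timestamps of matching dialog. It is not videogrep's actual code: the function names (`parse_srt`, `search_cues`, `to_seconds`) and the sample subtitles are made up for this example, and it uses only the standard library.

```python
import re

# Matches an .srt timing line such as "00:01:02,500 --> 00:01:04,000".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def to_seconds(hours, minutes, seconds, millis):
    """Convert the four fields of an .srt timestamp to seconds."""
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000.0


def parse_srt(text):
    """Return a list of (start, end, dialog) tuples from .srt text."""
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                fields = match.groups()
                start = to_seconds(*fields[:4])
                end = to_seconds(*fields[4:])
                # Everything after the timing line is dialog.
                dialog = " ".join(part.strip() for part in lines[i + 1:])
                cues.append((start, end, dialog))
                break
    return cues


def search_cues(cues, pattern):
    """Return the cues whose dialog matches a case-insensitive regex."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [cue for cue in cues if regex.search(cue[2])]


# Hypothetical subtitle text, for illustration only.
SAMPLE = """1
00:00:01,000 --> 00:00:03,500
I don't have time.

2
00:00:04,000 --> 00:00:06,000
Nobody does.

3
00:00:07,250 --> 00:00:09,000
Time is
money.
"""

hits = search_cues(parse_srt(SAMPLE), r"\btime\b")
for start, end, dialog in hits:
    print("%.2f-%.2f: %s" % (start, end, dialog))
```

Each (start, end) pair is what a video library would then need in order to cut the corresponding clip out of the source file.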

Here’s one of the results: every instance of a character saying the word “time” in the movie In Time (a film whose dialog appears to consist mostly of clock-related puns).

 

The script also works with multiple video files in the same directory. As an experiment, I mass downloaded press briefings from the White House YouTube channel, then ran a word-level n-gram analysis on the subtitle tracks to find some commonly used phrases. I discovered that the phrase “what I can tell you” occurs pretty frequently.
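A word-level n-gram count of this kind can be sketched in a few lines with the standard library. This is only an illustration of the general technique, not the analysis actually run on the briefings; the helper names (`word_ngrams`, `top_phrases`) and the sample lines are invented.

```python
import re
from collections import Counter


def word_ngrams(text, n):
    """Yield successive n-word tuples from text, lowercased."""
    words = re.findall(r"[a-z']+", text.lower())
    for i in range(len(words) - n + 1):
        yield tuple(words[i:i + n])


def top_phrases(dialog_lines, n, limit=5):
    """Count n-grams within each line and return the most common ones."""
    counts = Counter()
    for line in dialog_lines:
        # Counting per line keeps phrases from spanning two subtitles.
        counts.update(word_ngrams(line, n))
    return [(" ".join(gram), count) for gram, count in counts.most_common(limit)]


# Made-up dialog lines standing in for real subtitle text.
lines = [
    "What I can tell you is that we are reviewing it.",
    "Well, what I can tell you is this.",
    "I can tell you that the President is aware.",
]
print(top_phrases(lines, 5, limit=2))
```

On a real corpus the input would be the dialog text extracted from every subtitle file, and it is worth trying several values of n, since the interesting phrases tend to be three to six words long.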

So, here is Jay Carney, the former Press Secretary, telling us what he can tell us. Note the necktie transitions.

 

You may have noticed some strange cuts in the above video. The accuracy of the edits is completely reliant on the accuracy of the subtitle tracks.

 

You can also use videogrep to find instances of people employing specific grammatical structures (I do this with the pattern library). For example, by running the following command

I end up with a video of TED speakers saying “[gerund] [determiner] [adjective] [noun]”
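To give an idea of what matching a grammatical structure involves, here is a minimal sketch that scans words already labeled with part-of-speech tags for a fixed tag sequence. It does not use the pattern library or reproduce videogrep's search; the function name `find_tag_sequences` and the hand-tagged sentence are assumptions made for the example, and a real run would get the tags from a tagger.

```python
def find_tag_sequences(tagged_words, tag_pattern):
    """Return every run of words whose tags equal tag_pattern, in order."""
    size = len(tag_pattern)
    matches = []
    for i in range(len(tagged_words) - size + 1):
        window = tagged_words[i:i + size]
        if [tag for _, tag in window] == list(tag_pattern):
            matches.append(" ".join(word for word, _ in window))
    return matches


# Hand-tagged example sentence using Penn Treebank tags; a real run would
# get these (word, tag) pairs from a part-of-speech tagger instead.
sentence = [
    ("We", "PRP"), ("are", "VBP"), ("building", "VBG"), ("a", "DT"),
    ("better", "JJR"), ("world", "NN"), ("by", "IN"), ("creating", "VBG"),
    ("a", "DT"), ("new", "JJ"), ("language", "NN"),
]

# VBG = gerund, DT = determiner, JJ = adjective, NN = singular noun.
print(find_tag_sequences(sentence, ["VBG", "DT", "JJ", "NN"]))
```

Note that an exact tag match is strict: "building a better world" is skipped here because "better" is tagged as a comparative (JJR) rather than a plain adjective.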

 

As a final experiment, I wrote a complementary script that finds dialog-free sections of videos. Here, for example, is “Total Silence”: all the one to two second silences in the movie Total Recall.
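One plausible way to find such sections is to look for gaps between consecutive subtitle timestamps. The sketch below assumes that approach; it is not the actual script, and the name `find_silences` and the sample timings are hypothetical.

```python
def find_silences(cues, min_gap=1.0, max_gap=2.0):
    """Return (start, end) gaps between subtitles that fall within the bounds.

    cues is a list of (start, end) times in seconds, one per subtitle.
    """
    silences = []
    latest_end = None
    for start, end in sorted(cues):
        if latest_end is not None:
            gap = start - latest_end
            if min_gap <= gap <= max_gap:
                silences.append((latest_end, start))
        # Track the furthest end seen so overlapping subtitles are handled.
        latest_end = end if latest_end is None else max(latest_end, end)
    return silences


# Hypothetical subtitle timings in seconds.
cues = [(1.0, 3.5), (4.0, 6.0), (7.5, 9.0), (15.0, 16.0), (17.0, 18.5)]
print(find_silences(cues))
```

A gap in the subtitles only means that nobody is speaking according to the subtitle track, so the result is as accurate as the timestamps are, and music or sound effects may still be present.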

 

Feel free to mess around with the script on GitHub, and let me know if you have any suggestions for source material to run through it.

42 thoughts on “Videogrep: Automatic Supercuts with Python”

  1. As the editor of Slacktory, a YouTube channel that’s run over sixty supercuts, I thank you with all my chopped and screwed heart. And I have a suggestion: The Toast editor Mallory Ortberg recently tweeted a request for “a supercut of every time someone says ‘…Dead?’ on a Law & Order show.” It sounds doable, depending on how captioners transcribe elliptical pauses.

  2. Looks like a very cool tool. How did you do the N-gram analysis? I googled and only came up with Google’s N-Gram viewer.

  3. Would you share more about your experience using the pattern library, clips? How did you find it? Are there other packages/libs in this space? First time hearing about it. I have a particular corpus that is riddled with acronyms and partial sentences. Wonder how it would do…

  4. Hi,

    This sounds stupid, but could you insert the text you want to cut and set it so that it cuts the piece of film from the previous full stop to the end of the sentence containing your “phrase”?

    I don’t have any idea what I’d use this for, but I like it.

    Regards,

    Haaggis.

  5. Awesome!! I had actually been toying with the idea of making a videosed myself (for like, hey, changing the name of the star of your favorite movies with yours!), but it would need some pretty decent speech recognition support, which has put me off pursuing the idea.
    I’m very into making software to fuck with movies, I did this very simple (still needs A LOT OF WORK) one that re-edits movies randomly
    https://github.com/takosuke/RandomEditor

  6. This is so cool. I’ve been playing with it here, but have started running into an error that is beyond me to solve. So I’ve pasted the traceback here, in case it makes more sense to you ;)

    Traceback (most recent call last):
      File "videogrep.py", line 200, in <module>
        videogrep(args.inputfile, args.outputfile, args.search, args.searchtype, args.maxclips, args.padding, args.test, args.randomize)
      File "videogrep.py", line 182, in videogrep
        create_supercut(composition, outputfile, padding)
      File "videogrep.py", line 87, in create_supercut
        video = concatenate(clips)
      File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/moviepy/video/compositing/concatenate.py", line 55, in concatenate
        w = max([r[0] for r in sizes])
    ValueError: max() arg is an empty sequence

    ….?

      • Hi Ben – I made sure that my .srt files and my .mp4 files had the same name (for some reason, names were wonky). Getting a new error now though:

        Writing audio in supercut.mp4.tmp20TEMP_MPY_to_videofile_SOUND.ogg
        |----------| 0/800 0% [elapsed: 00:00 left: ?, ? iters/sec]
        Traceback (most recent call last):
          File "videogrep.py", line 200, in <module>
            videogrep(args.inputfile, args.outputfile, args.search, args.searchtype, args.maxclips, args.padding, args.test, args.randomize)
          File "videogrep.py", line 180, in videogrep
            create_supercut_in_batches(composition, outputfile, padding)
          File "videogrep.py", line 115, in create_supercut_in_batches
            video = concatenate(clips)
          File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/moviepy/video/compositing/concatenate.py", line 55, in concatenate
            w = max([r[0] for r in sizes])
        ValueError: max() arg is an empty sequence

        …’a little knowledge is a dangerous thing’ applies to me and my skillz, I’m sure…

          • Nice job, Sam! I’m eager to use this too.

            FWIW, I’m also getting the empty sequence error. moviepy.audio.io.readers.skip_chunk raises an IOError 9 Bad File Descriptor at ‘self.proc.stdout.flush()’. The exception comes from create_supercut, but is hidden by the bare try/except block in create_supercut_in_batches.

            This was trying videogrep.py with a bunch of mp4s in the input directory.

            Removing that call and also the one at moviepy.video.io.ffmpeg_reader line 79 lets the process continue, until I got this one:

            Writing video into TEMP_MPY_to_videofile
            |----------| 0/19841 0% [elapsed: 00:00 left: ?, ? iters/sec]
            Traceback (most recent call last):
              File "videogrep.py", line 262, in <module>
                videogrep(args.inputfile, args.outputfile, args.search, args.searchtype, args.maxclips, args.padding, args.test, args.randomize, args.sync)
              File "videogrep.py", line 240, in videogrep
                create_supercut_in_batches(composition, outputfile, padding)
              File "videogrep.py", line 117, in create_supercut_in_batches
                video.to_videofile(outputfile)
              File "/usr/local/lib/python2.7/site-packages/moviepy/video/VideoClip.py", line 281, in to_videofile
                verbose=verbose)
              File "/usr/local/lib/python2.7/site-packages/moviepy/video/io/ffmpeg_writer.py", line 143, in ffmpeg_write_video
                writer.write_frame(frame.astype("uint8"))
              File "/usr/local/lib/python2.7/site-packages/moviepy/video/io/ffmpeg_writer.py", line 108, in write_frame
                self.proc.stdin.write(img_array.tostring())
            IOError: [Errno 32] Broken pipe

            ffmpeg_writer was sending errors to /dev/null. Enabling the logfile showed this ffmpeg error in my case:

            ffmpeg version 2.2.3 Copyright (c) 2000-2014 the FFmpeg developers
            built on Jun 20 2014 20:09:36 with Apple clang version 4.1 (tags/Apple/clang-421.11.65) (based on LLVM 3.1svn)
            configuration: --prefix=/usr/local/Cellar/ffmpeg/2.2.3 --enable-shared --enable-pthreads --enable-gpl --enable-version3 --enable-nonfree --enable-hardcoded-tables --enable-avresample --enable-vda --cc=clang --host-cflags= --host-ldflags= --enable-libx264 --enable-libfaac --enable-libmp3lame --enable-libxvid --enable-libvorbis --enable-libvpx
            libavutil 52. 66.100 / 52. 66.100
            libavcodec 55. 52.102 / 55. 52.102
            libavformat 55. 33.100 / 55. 33.100
            libavdevice 55. 10.100 / 55. 10.100
            libavfilter 4. 2.100 / 4. 2.100
            libavresample 1. 2. 0 / 1. 2. 0
            libswscale 2. 5.102 / 2. 5.102
            libswresample 0. 18.100 / 0. 18.100
            libpostproc 52. 3.100 / 52. 3.100
            Input #0, rawvideo, from 'pipe:':
            Duration: N/A, start: 0.000000, bitrate: 530841 kb/s
            Stream #0:0: Video: rawvideo (RGB[24] / 0x18424752), rgb24, 1280x720, 530841 kb/s, 24 tbr, 24 tbn, 24 tbc
            [NULL @ 0x7fb03203d000] Unable to find a suitable output format for 'TEMP_MPY_to_videofile'
            TEMP_MPY_to_videofile: Invalid argument
            Conversion failed!

            So my (silly) problem in this case was that I was setting just an output directory, not a filename, as ffmpeg expected a file extension to be present. Setting it to something like "--output ./out/video.mp4" made it work.

            I’m using OSX 10.7.5, Python 2.7.5
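Both failures diagnosed in this thread (an empty clip list reaching concatenate, and an output path with no file extension reaching ffmpeg) could in principle be caught before any encoding starts. The following is a hypothetical sketch of such a check, not code from videogrep or moviepy; the function name `check_supercut_inputs` and its error messages are made up.

```python
import os


def check_supercut_inputs(clips, outputfile):
    """Raise a descriptive error before any clips are handed to the encoder."""
    if not clips:
        # An empty list is what leads to "max() arg is an empty sequence".
        raise ValueError(
            "No clips matched the search; check that each video has an .srt "
            "file with the same base name and that the search term occurs."
        )
    extension = os.path.splitext(outputfile)[1]
    if not extension:
        # Without an extension ffmpeg cannot pick an output format.
        raise ValueError(
            "Output path %r has no file extension; pass a file name such as "
            "'./out/video.mp4' rather than a directory." % outputfile
        )
    return extension


print(check_supercut_inputs(["clip-1", "clip-2"], "./out/video.mp4"))
```

Calling a check like this before building the composition would replace the two tracebacks above with a message that points at the likely cause.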

        • Interesting, thanks for the reply dude.

          I will have a play with the naming and see if I have any similar results.

    • Same here :/

      OSX Mavericks 10.9.3

      Creating clips.
      Traceback (most recent call last):
        File "videogrep.py", line 191, in <module>
          videogrep(args.inputfile, args.outputfile, args.search, args.searchtype, args.maxclips, args.padding, args.test, args.randomize)
        File "videogrep.py", line 173, in videogrep
          create_supercut(composition, outputfile, padding)
        File "videogrep.py", line 88, in create_supercut
          final_clip = concatenate( cut_clips)
        File "/Library/Python/2.7/site-packages/moviepy/video/compositing/concatenate.py", line 55, in concatenate
          w = max([r[0] for r in sizes])
      ValueError: max() arg is an empty sequence

      • If you aren’t, specify the file name after --input, rather than just the directory name… like this: videogrep.py --input ./video/video.mp4 --search 'blah' .. that should get you past that error.

        • It can be a folder or a file. But I was looking for a word between ‘ ‘… I took them off and now I get another error:

          MoviePy: building video file video.mp4
          ----------------------------------------
          Writing audio in videoTEMP_MPY_to_videofile_SOUND.ogg
          |----------| 0/994 0% [elapsed: 00:00 left: ?, ? iters/sec]
          Traceback (most recent call last):
            File "videogrep.py", line 191, in <module>
              videogrep(args.inputfile, args.outputfile, args.search, args.searchtype, args.maxclips, args.padding, args.test, args.randomize)
            File "videogrep.py", line 173, in videogrep
              create_supercut(composition, outputfile, padding)
            File "videogrep.py", line 89, in create_supercut
              final_clip.to_videofile(outputfile)
            File "/Library/Python/2.7/site-packages/moviepy/video/VideoClip.py", line 275, in to_videofile
              verbose=verbose)
            File "<string>", line 2, in to_audiofile
            File "/Library/Python/2.7/site-packages/moviepy/decorators.py", line 60, in requires_duration
              return f(clip, *a, **k)
            File "/Library/Python/2.7/site-packages/moviepy/audio/AudioClip.py", line 104, in to_audiofile
              codec=codec, bitrate=bitrate, write_logfile=write_logfile, verbose=verbose)
            File "<string>", line 2, in ffmpeg_audiowrite
            File "/Library/Python/2.7/site-packages/moviepy/decorators.py", line 60, in requires_duration
              return f(clip, *a, **k)
            File "/Library/Python/2.7/site-packages/moviepy/audio/io/ffmpeg_audiowriter.py", line 125, in ffmpeg_audiowrite
              writer.write_frames(sndarray)
            File "/Library/Python/2.7/site-packages/moviepy/audio/io/ffmpeg_audiowriter.py", line 78, in write_frames
              self.proc.stdin.write(frames_array.tostring())
          IOError: [Errno 32] Broken pipe

          • I understand. Thanks. See the last comment below… same issue you are having after moving past the ‘max() arg’ issue.

  7. sorry. I should say, changing the names allowed the script to identify relevant tracks, and to begin writing the supercut, which it didn’t do before. So the same error message, but at least we’re moving closer, right? Sorry for such appalling lack-of-knowledge on my part.

  8. Amazing! Do most YouTube videos include the subtitle tracks then? What do you use for your original source materials? I’ve got a few ideas to play with this on….

  9. For those using ffmpeg on a Mac, and receiving a “Unknown encoder ‘libvorbis’” error:

    - This results from a missing Ogg Vorbis library; you may want to reinstall ffmpeg with these options, using Homebrew:

    "brew reinstall ffmpeg --with-libvpx --with-libvorbis"

  10. I was getting the ‘ValueError: max() arg is an empty sequence’ error too, when I was passing just the directory name after --input, example:

    python videogrep.py --input ./video --search 'string'

    When I specify the video file name like this:

    python videogrep.py --input ./video/video.mp4 --search 'disappear'

    It no longer throws that error… but now I am getting this error:

    MoviePy: building video file supercut.mp4
    ----------------------------------------
    Writing audio in supercutTEMP_MPY_to_videofile_SOUND.ogg
    |----------| 0/158 0% [elapsed: 00:00 left: ?, ? iters/sec]
    Traceback (most recent call last):
      File "videogrep.py", line 191, in <module>
        videogrep(args.inputfile, args.outputfile, args.search, args.searchtype, args.maxclips, args.padding, args.test, args.randomize)
      File "videogrep.py", line 173, in videogrep
        create_supercut(composition, outputfile, padding)
      File "videogrep.py", line 89, in create_supercut
        final_clip.to_videofile(outputfile)
      File "/Library/Python/2.7/site-packages/moviepy/video/VideoClip.py", line 275, in to_videofile
        verbose=verbose)
      File "<string>", line 2, in to_audiofile
      File "/Library/Python/2.7/site-packages/moviepy/decorators.py", line 60, in requires_duration
        return f(clip, *a, **k)
      File "/Library/Python/2.7/site-packages/moviepy/audio/AudioClip.py", line 104, in to_audiofile
        codec=codec, bitrate=bitrate, write_logfile=write_logfile, verbose=verbose)
      File "<string>", line 2, in ffmpeg_audiowrite
      File "/Library/Python/2.7/site-packages/moviepy/decorators.py", line 60, in requires_duration
        return f(clip, *a, **k)
      File "/Library/Python/2.7/site-packages/moviepy/audio/io/ffmpeg_audiowriter.py", line 125, in ffmpeg_audiowrite
        writer.write_frames(sndarray)
      File "/Library/Python/2.7/site-packages/moviepy/audio/io/ffmpeg_audiowriter.py", line 78, in write_frames
        self.proc.stdin.write(frames_array.tostring())
    IOError: [Errno 32] Broken pipe

    Anyone figured this out?

  11. Nerdcore › Automatic Supercuts with Python

  12. It would be interesting to write a script that sorted all the words by frequency, then run this on adverts since they are pretty repetitive already.

  13. I wonder how difficult it would be to write a script that reverses the meaning of sentences by inserting or removing ‘not’.

    Being able to change pitch would be cool, you could make all the sentences into uprisers in this way.

  14. How did you manage to get the subtitles from Youtube videos? Some websites offer to get automatically generated subtitles, but they aren’t good enough!

    Also, this is just great! I am going to play around with it and see what I can do

  15. I haven’t played around much with this but I’m thinking this could be used to cut video segments with specific words and then create a supercut montage with a whole new message. Similar to the editing videos that show news anchors singing songs, etc.
    Great tool and thanks for sharing.
