Audiogrep: Automatic Audio “Supercuts”

Audiogrep is a Python script that transcribes audio files and then creates audio “supercuts” based on search phrases. It uses CMU Pocketsphinx for speech-to-text, and pydub to splice audio segments together.
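To make the search step concrete, here is a minimal sketch of how a timed transcript could be matched against a search phrase. It is not audiogrep's actual code: the `Segment` class, the `find_segments` helper and the sample transcript are all made up for illustration, and the timings are assumed to be in seconds.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcribed phrase with its position in the audio (hypothetical structure)."""
    text: str
    start: float  # seconds
    end: float    # seconds


def find_segments(segments, pattern):
    """Return (start, end) pairs for every segment whose text matches the pattern."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [(s.start, s.end) for s in segments if regex.search(s.text)]


# Invented sample transcript, standing in for speech-to-text output.
transcript = [
    Segment("we collect a lot of data", 0.0, 2.1),
    Segment("and then we go to lunch", 2.1, 4.0),
    Segment("the data is never enough", 4.0, 6.3),
]

cuts = find_segments(transcript, r"\bdata\b")
print(cuts)
```

Each pair could then be converted to milliseconds and used to slice a pydub `AudioSegment` (for example `audio[start_ms:end_ms]`), with the slices joined using `+` to form the supercut.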

This is a sister project to my videogrep script, which does a similar thing but with video (and makes use of subtitle tracks rather than speech-to-text).

So far I’ve mostly been experimenting with audio books. Here, for example, are all the phrases in How Google Works by Eric Schmidt and Jonathan Rosenberg that contain the word “data”.
[soundcloud url=”https://api.soundcloud.com/tracks/192358628″ params=”color=ff5500&auto_play=false&hide_related=false&show_comments=true&show_user=true&show_reposts=false” width=”100%” height=”166″ iframe=”true” /]

And here are all the references to “private wealth” in Capital in the Twenty-first Century by Thomas Piketty:
[soundcloud url=”https://api.soundcloud.com/tracks/192358627″ params=”color=ff5500&auto_play=false&hide_related=false&show_comments=true&show_user=true&show_reposts=false” width=”100%” height=”166″ iframe=”true” /]

You can also extract just individual words, rather than phrases.
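As a rough illustration of word-level extraction, the sketch below assumes the transcript is available as `(word, start, end)` tuples with times in seconds. The `find_words` helper, the `padding` parameter and the sample data are hypothetical and not taken from audiogrep itself.

```python
def find_words(words, targets, padding=0.05):
    """Return (start, end) cuts for each occurrence of the target words.

    `padding` widens every cut slightly so words are not clipped;
    the start time is clamped so it never goes below zero.
    """
    wanted = {t.lower() for t in targets}
    cuts = []
    for word, start, end in words:
        if word.lower() in wanted:
            cuts.append((max(0.0, start - padding), end + padding))
    return cuts


# Invented word timings, standing in for speech-to-text output.
timed_words = [
    ("money", 0.00, 0.40),
    ("makes", 0.40, 0.70),
    ("people", 0.70, 1.10),
    ("happy", 1.10, 1.50),
]

word_cuts = find_words(timed_words, ["money", "people"], padding=0.0)
print(word_cuts)
```

A little padding tends to sound more natural than cutting exactly on the word boundaries, since speech-to-text timings are only approximate.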

For example, here are all instances of “money” and “people” from the book The Automatic Millionaire: A Powerful One-Step Plan to Live and Finish Rich by David Bach:
[soundcloud url=”https://api.soundcloud.com/tracks/192352602″ params=”color=ff5500&auto_play=false&hide_related=false&show_comments=true&show_user=true&show_reposts=false” width=”100%” height=”166″ iframe=”true” /]

“Control”, “psychological”, “behavior” and “situations” from the nightmarishly titled Get Anyone to Do Anything: Never Feel Powerless Again — With Psychological Secrets to Control and Influence Every Situation by David J. Lieberman
[soundcloud url=”https://api.soundcloud.com/tracks/192352607″ params=”color=ff5500&auto_play=false&hide_related=false&show_comments=true&show_user=true&show_reposts=false” width=”100%” height=”166″ iframe=”true” /]

And here’s “relax” and “large” from Breast Enlargement Hypnosis, a truly remarkable audio experience by Victoria Gallagher.
[soundcloud url=”https://api.soundcloud.com/tracks/192352619″ params=”color=ff5500&auto_play=false&hide_related=false&show_comments=true&show_user=true&show_reposts=false” width=”100%” height=”166″ iframe=”true” /]

Another experiment from the same amazing source:
[soundcloud url=”https://api.soundcloud.com/tracks/192353162″ params=”color=ff5500&auto_play=false&hide_related=false&show_comments=true&show_user=true&show_reposts=false” width=”100%” height=”166″ iframe=”true” /]

It’s also possible to use the script to create “Frankenstein” sentences. Here’s Bill Clinton telling us to stop voting, sourced from his book My Life:
[soundcloud url=”https://api.soundcloud.com/tracks/192352609″ params=”color=ff5500&auto_play=false&hide_related=false&show_comments=true&show_user=true&show_reposts=false” width=”100%” height=”166″ iframe=”true” /]
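One way to think about assembling such a sentence: index every transcribed word by its timings, then pick one clip per word of the target sentence, in order. The sketch below follows that idea under stated assumptions. The function names, the `(word, start, end)` tuple format and the sample timings are invented for illustration and are not audiogrep's own implementation.

```python
import random


def build_index(words):
    """Map each lowercased word to the list of (start, end) clips where it occurs."""
    index = {}
    for word, start, end in words:
        index.setdefault(word.lower(), []).append((start, end))
    return index


def frankenstein(sentence, index, rng=None):
    """Pick one clip per word of `sentence`, in order.

    Raises KeyError if a word never occurs in the source audio.
    """
    rng = rng or random.Random()
    cuts = []
    for word in sentence.lower().split():
        if word not in index:
            raise KeyError(f"no clip found for {word!r}")
        cuts.append(rng.choice(index[word]))
    return cuts


# Invented word timings in seconds, standing in for a real transcript.
source_words = [
    ("please", 3.2, 3.6),
    ("stop", 10.0, 10.4),
    ("voting", 21.5, 22.1),
    ("stop", 40.0, 40.3),
]

index = build_index(source_words)
sentence_cuts = frankenstein("stop voting", index, random.Random(0))
print(sentence_cuts)
```

Choosing randomly among the available clips means repeated runs can give different-sounding results when a word occurs more than once.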

And, by integrating moviepy, you can generate video slideshows like these or this:
[youtube https://www.youtube.com/watch?v=C6tpAGD00DM w=640]

The code is available on GitHub. Next up, I’ll be integrating some of this functionality into videogrep for more refined searches.

6 thoughts on “Audiogrep: Automatic Audio “Supercuts””

  1. great thank you:)

    i was wondering if it is possible to adjust the pocketsphinx as it seems to work slightly better with American speakers than others?

    the audio you used was from studio-recorded contexts i guess? how accurate were the transcriptions in your examples?

    ta
    mura

    • Hi – yeah there are a lot of parameters that you can set in pocketsphinx. I haven’t really explored it all that much, but if you’re interested, take a look at their docs or just run pocketsphinx_continuous to see all the options.

  2. okay thanks

    is it possible to use pocketsphinx_continuous from within audiogrep?

    sorry to ask the question again but how accurate are your transcriptions?
    mine seem to come out quite bad!

    ta
    mura

    • Yeah – audiogrep is actually just using pocketsphinx so you can change the parameters inside the source. The quality of the transcriptions varies. They aren’t super accurate but they seem to work for the most part in my use cases.

  3. Awesome :) I wonder if it’s possible to use pocketsphinx in audiogrep as an alternative source to the subtitles (or somehow combine it with them to get more precise timings?).

  4. Nerdcore › Automatic Audio-Supercuts with Python
