Made a utility to timestamp audio automatically

SoundReader

Takes in an audio file and spits out timestamps of burps.

:warning: Notice: support is provided for the computer-literate:

  • If you have questions such as “How do I install Python?”, “Which ONNX runtime should I install?”, or “Why isn’t ffmpeg launching?”, kindly write them down on a piece of paper, throw it in a garbage can, and find the answer online; these topics are well documented.
  • If you have questions regarding errors or exceptions, please post them here with your system info and the exact command you ran, and I’ll gladly look into it!
  • If you write down “It’s not working,” you won’t be seeing a response (at least from me).
  • Don’t ask for scan requests in this thread; put them in Female Requests/Male Requests.

If the barrier to entry is too high, consider using videoscan.net, a ready-made, robust system that has done almost all of the work for you:

The goal is to get feedback and to develop an automated way to tune the model using data from outside AudioSet.

Requirements

Python and pip

Ensure you have Python (3.10+) and pip installed and executable from your shell.

Dependencies

onnx: Used to verify the model files for correctness

onnxruntime: Used as the platform for running inference.
Notice: Install the correct onnxruntime for your platform using this table: ONNX Runtime | Home. Ensure Python is selected as the API.

tqdm: Used to display loading bar and time estimations

colored: Used to color video titles for better discernibility

argparse: Used for command-line argument parsing (part of the Python standard library, so no separate install is needed)

ffmpeg: Used for transcoding (installed separately; it is not a pip package)

Install the required dependencies using pip:

pip install colored numpy onnx onnxruntime tqdm

Downloads

Program: https://tctbff.duckdns.org/programs/sound_reader/sound_reader.py
Model: https://tctbff.duckdns.org/programs/sound_reader/bdetectionmodel_05_01_23.onnx

Usage

python sound_reader.py --model bdetectionmodel_05_01_23.onnx audio.opus

Will print out the timestamps with a confidence >= 90. This can be adjusted by tweaking --threshold.

More tunables can be found with: python sound_reader.py --help
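
To illustrate what --threshold does, here is a minimal sketch (not the actual sound_reader code) of filtering per-segment confidence scores into reported timestamps, assuming confidences are percentages as the default of 90 suggests:

```python
def report(timestamps, threshold=90.0):
    """Keep only (seconds, confidence-in-percent) pairs at or above the threshold.

    The real tool derives these pairs from the ONNX model's output;
    this just shows the thresholding step.
    """
    return [(t, c) for t, c in timestamps if c >= threshold]

scores = [(12.0, 95.2), (40.5, 61.0), (73.5, 90.0)]
print(report(scores))        # keeps the 12.0s and 73.5s detections
print(report(scores, 60.0))  # a lower threshold keeps all three
```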

How can you help?

Send suggestions, feedback, and code!
Share false positives or false negatives with me (in the form of 2-second audio segments; ffmpeg is your friend here), along with a label of what the audio segment is supposed to be (burping, coughing, talking, etc.)
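
If you are unsure how to cut those 2-second segments, here is a sketch that builds the ffmpeg command for you (the filenames and timestamp are placeholders; uncomment the subprocess call to actually run it):

```python
import subprocess  # only needed if you actually execute the command

def clip_cmd(src, start_seconds, out, duration=2):
    """Build an ffmpeg command extracting a `duration`-second clip at `start_seconds`."""
    return ["ffmpeg", "-ss", str(start_seconds), "-i", src,
            "-t", str(duration), out]

cmd = clip_cmd("stream.opus", 754, "clip_burp.opus")
print(" ".join(cmd))
# subprocess.run(cmd, check=True)  # uncomment to execute for real
```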

Closing

If you think I missed some not-so-obvious details, post them here!
Also, feel free to share any changes or suggestions you’ve made.

Colab Notebook

If you would like to try it out without installing anything, I’ve made a colab notebook with some testing code: Google Colab


Thanks for the work. Can you do a video showcasing how it works? You don’t really have to explain anything, just show the tool in use.


Fair enough, give me a few minutes. I’ll do it on Windows, since I don’t have a Mac, and I’m assuming this is the usual stuff for those on Linux.


It worked, thanks. But the archive yt-dlp creates, does it stay on your HD? If so, how do I delete already-scanned videos?

You’ll have to remove them manually (using rm, del, or your file manager). What I usually do is download all the videos from a channel at once and scan them using a wildcard, i.e.: python sound_reader.py --model bdetectionmodel_05_01_23.onnx *.opus
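
If you’d rather script the cleanup than type rm/del by hand, a small sketch (the folder and pattern are placeholders):

```python
from pathlib import Path

def cleanup(folder, pattern="*.opus"):
    """Delete already-scanned audio files matching `pattern`; return their names."""
    removed = []
    for f in Path(folder).glob(pattern):
        f.unlink()
        removed.append(f.name)
    return sorted(removed)
```

Run cleanup(".") after a scan to clear out the downloaded .opus files, leaving everything else in the folder untouched.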


OK. Is it normal that Twitch VODs take more time to process?

Yes, but it depends on your computer. If you’re running a low-spec computer like a laptop without a GPU (I was running my demo on an old laptop), it will usually be slow, and the difference is large. If you have a GPU, downloading will usually take more time than actually getting the timestamps.


What do you mean? Is there a command to scan all videos from a channel?

Yes, that’s this command. It passes every file ending in .opus into sound_reader. Note that this depends on which shell you’re using: zsh (the default on macOS) and bash (the usual default on Linux) support wildcards like *. On Windows, you can try PowerShell and see if it works there, but to my knowledge Command Prompt doesn’t have this simple feature.
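
If you’re stuck on Command Prompt, you can sidestep its missing globbing by letting Python expand the pattern itself. A sketch (sound_reader.py and the model filename come from the post; this wrapper is hypothetical):

```python
import glob
import sys

def build_cmd(pattern="*.opus", model="bdetectionmodel_05_01_23.onnx"):
    """Expand the wildcard in Python and build the sound_reader invocation."""
    files = sorted(glob.glob(pattern))
    if not files:
        raise SystemExit(f"no files match {pattern!r}")
    return [sys.executable, "sound_reader.py", "--model", model, *files]

# e.g. subprocess.run(build_cmd(), check=True) once some .opus files exist
```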

It works on a single video, but on a whole channel I get this: RuntimeError: Failed to load audio: ffmpeg version 2023-05-31-git-baa9fccf8d-essentials_build-www.gyan.dev Copyright (c) 2000-2023 the FFmpeg developers


I’ll need more details than that, but it sounds like you tried the wildcard trick and either your shell doesn’t support it or your wildcard didn’t match anything.


Never mind, got it.

Could this theoretically work for farts too, or just burps?

Yes, AudioSet has a class for fart sounds and the model was trained on it as well. You can pass the --focus_idx 60 flag (60 is the class ID for farts) to make sound_reader print timestamps for farts.
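
Concretely (the flag and class ID come from the post above; the helper itself is just for illustration):

```python
def detect_cmd(audio, class_idx, model="bdetectionmodel_05_01_23.onnx"):
    """Build a sound_reader command focused on one AudioSet class ID."""
    return ["python", "sound_reader.py", "--model", model,
            "--focus_idx", str(class_idx), audio]

print(" ".join(detect_cmd("stream.opus", 60)))  # 60 = the AudioSet fart class
```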


thank you!


It uses ffmpeg behind the scenes to read the audio, so anything ffmpeg supports, this supports. And you can bet ffmpeg supports some of the most obscure formats out there.


So in the tutorial I used DirectML, since that’s supposed to be the ‘just works’ ML API on Windows. It appears that it is not. Since you’re using an NVIDIA GPU, uninstall onnxruntime-directml with pip and then install onnxruntime-gpu instead. This will use TensorRT or CUDA depending on your drivers (it picks TensorRT when possible, since that uses the tensor cores on your GPU). Let me know if it works after that, since the error looks like a DirectML issue.

Edit: Also, make sure you delete the existing model and redownload it. The model I serve is a base, unoptimized version. Once you perform inferences using sound_reader, onnxruntime optimizes the model and saves those changes. They include device-specific instructions and will not work on any other device.
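
To confirm which execution providers your onnxruntime install actually exposes after the swap, you can ask it directly (get_available_providers is a real onnxruntime API; the import guard just keeps this runnable when onnxruntime isn’t installed):

```python
def available_providers():
    """Return onnxruntime's provider list, or None if onnxruntime isn't installed."""
    try:
        import onnxruntime as ort
    except ImportError:
        return None
    return ort.get_available_providers()

print(available_providers())
# After installing onnxruntime-gpu you'd expect 'CUDAExecutionProvider'
# (and possibly 'TensorrtExecutionProvider') ahead of 'CPUExecutionProvider'.
```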


I’m using a 1070 and it worked with the default install. Could I see a performance gain doing the GPU install you listed?

You probably won’t see a huge gain, but CUDA/cuBLAS will generally perform better than DirectML.

Whoops, forgot to mention that you’ll need CUDA for that. You can download it here: CUDA Toolkit Archive | NVIDIA Developer. Get CUDA 11.6 and select all the correct info (onnxruntime doesn’t support 12 at the moment), then get cuDNN 8.5.0 for CUDA 11.x here: cuDNN Archive | NVIDIA Developer.
