SoundReader
Takes in an audio file and spits out timestamps of burps.
Notice: support is provided for the computer literate:
- If you have questions such as “How do I install python?”, “Which ONNX runtime should I install?”, or “Why isn’t ffmpeg launching?”, kindly write them down on a piece of paper, throw it in a garbage can, and find the answer online; these topics are well documented.
- If you have questions regarding errors or exceptions, please write them down here with your system info and the exact command you ran. I’ll gladly look into them!
- If you write down “It’s not working,” you won’t be seeing a response (at least from me).
- Don’t post scan requests in this thread; put them in Female Requests/Male Requests.
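For the error reports mentioned above, something like the following could gather the system info in one go. This is only a sketch: the package list is copied from the dependency list in this post, and `collect_system_info` and `format_report` are hypothetical helper names, not part of sound_reader.py.

```python
import platform
import sys
from importlib import metadata

# Package names taken from the dependency list in this post.
PACKAGES = ["numpy", "onnx", "onnxruntime", "tqdm", "colored"]


def collect_system_info(packages=PACKAGES):
    """Return a dict of OS, Python, and package versions for a bug report."""
    info = {
        "os": platform.platform(),
        "python": sys.version.split()[0],
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"
    return info


def format_report(info, command):
    """Render the exact command plus the collected info as plain text."""
    lines = [f"command: {command}"]
    lines += [f"{key}: {value}" for key, value in info.items()]
    return "\n".join(lines)


if __name__ == "__main__":
    # Replace the command string with exactly what you typed.
    example = "python sound_reader.py --model bdetectionmodel_05_01_23.onnx audio.opus"
    print(format_report(collect_system_info(), example))
```

Paste the output together with the full error text. Note that a GPU build of onnxruntime may be installed under a different distribution name, in which case it would show up here as "not installed".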
If the barrier to entry is too high, consider using videoscan.net, a ready-made, robust system that has done almost all of the work for you.
The goal is to get feedback and develop an automated way to tune the model using data from outside of AudioSet.
Requirements
Python and pip
Ensure you have python (3.10+) and pip installed and executable from your shell.
Dependencies
onnx: Used to verify the model files for correctness
onnxruntime: Used as a platform to perform inference.
Notice: Install the correct onnxruntime for your platform using this table: ONNX Runtime | Home. Ensure Python is selected as the API.
tqdm: Used to display loading bar and time estimations
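colored: Used to color video titles for better discernibility
argparse: Used for command argument parsing (already part of the Python standard library)
ffmpeg: Used for transcoding (a separate program that must be on your PATH; the pip command below does not install it)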
Install the required dependencies using pip:
pip install argparse colored numpy onnx onnxruntime tqdm
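To confirm the install worked before running anything, a quick check along these lines may help. It is a minimal sketch using only the standard library; the module names mirror the pip command above, and `check_environment` is a hypothetical helper, not something shipped with sound_reader.py.

```python
import importlib.util
import shutil

# Module names from the pip command above; argparse ships with Python itself.
MODULES = ["argparse", "colored", "numpy", "onnx", "onnxruntime", "tqdm"]


def check_environment(modules=MODULES, tools=("ffmpeg",)):
    """Map each module or tool name to True if it can be found, else False."""
    status = {}
    for name in modules:
        # find_spec locates a top-level module without importing it.
        status[name] = importlib.util.find_spec(name) is not None
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        status[tool] = shutil.which(tool) is not None
    return status


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```

Anything reported as MISSING needs to be installed (or, for ffmpeg, added to your PATH) before the program will run. This only checks that things can be found, not that the onnxruntime build matches your hardware.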
Downloads
Program: https://tctbff.duckdns.org/programs/sound_reader/sound_reader.py
Model: https://tctbff.duckdns.org/programs/sound_reader/bdetectionmodel_05_01_23.onnx
Usage
python sound_reader.py --model bdetectionmodel_05_01_23.onnx audio.opus
This prints the timestamps of detections with a confidence >= 90.
The cutoff can be adjusted with --threshold.
More tunables can be found with: python sound_reader.py --help
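If you want to scan a whole folder, the command above can be generated per file. The sketch below only uses the `--model` and `--threshold` options mentioned in this post; the helper names and the list of file extensions are assumptions of mine, and I am assuming one file per invocation since nothing here says the script accepts several.

```python
import sys
from pathlib import Path

# Assumed set of audio extensions; adjust to whatever you actually have.
AUDIO_SUFFIXES = {".opus", ".mp3", ".wav", ".flac"}


def build_scan_command(audio, model, script="sound_reader.py", threshold=None):
    """Build the argument list for one sound_reader.py run on one file."""
    cmd = [sys.executable, script, "--model", str(model)]
    if threshold is not None:
        # Passed through as-is; check --help for the expected scale.
        cmd += ["--threshold", str(threshold)]
    cmd.append(str(audio))
    return cmd


def commands_for_folder(folder, model, threshold=None):
    """Return one command per audio file found directly inside `folder`."""
    files = sorted(
        p for p in Path(folder).iterdir() if p.suffix.lower() in AUDIO_SUFFIXES
    )
    return [build_scan_command(p, model, threshold=threshold) for p in files]
```

Each returned list can be handed to `subprocess.run(cmd, check=True)`. Whether formats other than .opus are accepted directly is not stated here, so try a single file first.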
How can you help?
Send suggestions, feedback, and code!
Share false positives or false negatives with me (in the form of 2-second audio segments; ffmpeg is your friend here) along with a label of what the audio segment is supposed to be (burping, coughing, talking, etc.)
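For cutting those 2-second segments, here is one way it could be done with ffmpeg from Python. Treat it as a sketch: `build_clip_command` and `cut_clip` are hypothetical helpers, the file names are placeholders, and it uses only the standard `-ss`, `-i`, and `-t` ffmpeg options.

```python
import subprocess


def build_clip_command(src, start, dst, duration=2.0):
    """Build an ffmpeg command cutting `duration` seconds from `src` at `start`."""
    return [
        "ffmpeg",
        "-ss", str(start),    # seek to the start time (seconds or HH:MM:SS)
        "-i", str(src),       # input file
        "-t", str(duration),  # keep this many seconds of output
        str(dst),             # output file; format follows the extension
    ]


def cut_clip(src, start, dst, duration=2.0):
    """Run ffmpeg to write the clip; raises CalledProcessError on failure."""
    subprocess.run(build_clip_command(src, start, dst, duration), check=True)
```

For example, `cut_clip("audio.opus", 83.5, "false_positive_cough.opus")` would write a 2-second clip starting at 83.5 seconds. Putting the label in the file name is just one convenient way to attach it.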
Closing
If you think there are some not-so-obvious details I missed out on, post them here!
Also, do feel free to share changes or suggestions you’ve made here.
Colab Notebook
If you would like to try it out without installing anything, I’ve made a Colab notebook with some testing code: Google Colab