Made a utility to timestamp audio automatically

That’s really weird. If you’re certain that you removed onnxruntime-directml, can you run pip install --force-reinstall onnxruntime-gpu?

Idk if this is a common problem or just me not being familiar with coding, but even after clearing a cell or refreshing in Google Colab, the sound reader rescans old stuff.

awesome tool, thanks for posting it. got it working quickly, no hassle on my end. i'm ripping the audio out of an mp4 or ts file, then converting it to opus and feeding it into the tool. works perfectly, amazing tool

edit: sorry overlooked the false positive formatting.

so I’m not sure if the program was detecting just the croaky sound at the end, or that plus the loud emphasis of the words beforehand; kind of a coincidence. the first one was 2 timestamps and the second was 3 timestamps.

I think you might’ve forgotten to remove the old files before rerunning. Make sure to do that, since it will scan all files (including the old ones).
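For anyone hitting the same thing in a notebook, here is a minimal sketch of clearing leftover media files before a rerun. The function name `clear_old_files`, the folder you pass in, and the list of suffixes are all assumptions for illustration; they are not part of sound_reader, so adjust them to wherever your files actually live.

```python
from pathlib import Path

# Hypothetical helper, not part of sound_reader: removes leftover media
# files so that a rerun only scans the files you add afterwards.
def clear_old_files(folder, suffixes=(".mp4", ".ts", ".opus", ".wav")):
    """Delete media files in `folder` and return the names that were removed."""
    removed = []
    for path in sorted(Path(folder).iterdir()):
        # Only touch regular files with a matching extension.
        if path.is_file() and path.suffix.lower() in suffixes:
            path.unlink()
            removed.append(path.name)
    return removed
```

Calling it on your input folder before each run (and checking the returned list) should make it obvious whether stale files were still sitting there.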

@joemanxdjoe You actually don’t need to do any file ripping on your own (unless the file is not compatible with ffmpeg), sound_reader will do that automatically

ay nice, thats handy. time including extracting and converting audio was 166 seconds; just feeding the video file straight into it is 120 seconds, and a lot simpler code wise. (forgot to mention this also includes some other processing afterwards; your model is extremely quick)

how long was your video?

this was a 16 min video (406mb). before, extracting and converting the audio and then feeding it in took about 77 seconds; now it's 19 seconds to get the timestamps

nice project. quick suggestion: you might want to check out audioread instead of calling ffmpeg directly. also, if you host it in git somewhere i’d be down to contribute.

Thanks TC ! I had been dreading setting up some etl pipeline and learning enough about training to set up something similar, but with a different runtime (jvm, because of familiarity). Editing has always been a chore. This will save hours, days ? It will let me edit videos that I had basically written off and was liable to delete this year (ex kaykaysins and candybones vlogs). Wow.

I don’t really see a huge use for using another library over just ffmpeg, it gets the job done and I can’t really think of much that using another audio library would add (besides dropping the ffmpeg dependency for Mac users and using coreaudio, which could lead to its own issues).

To answer your second question, I have a local git repository, though the commits have been signed with my personal pgp keys, so I’d rather not share the raw git repository. Sending patch files is my preferred way of receiving contributions currently.

Hey, I wanted to ask if you think that it’s possible to just put the vod links or m3u8 links in there and it scans them, so you dont have to download them all?

Any file that is supported by ffmpeg is supported by sound_reader. So, yes, you can use an m3u8 url as your video argument.
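To illustrate why a URL works the same as a local file, here is a rough sketch of how a script could ask ffmpeg to decode either one to raw audio. This is not sound_reader's actual code: the function names are made up, and the mono / 16-bit PCM output format is an assumption. Only the 32000 samples/second rate comes from this thread.

```python
import subprocess

SAMPLE_RATE = 32000  # samples/second, the rate mentioned in this thread

def build_ffmpeg_command(source, sample_rate=SAMPLE_RATE):
    """Build an ffmpeg command that decodes `source` to raw PCM on stdout.

    `source` can be a local path or anything else ffmpeg can open,
    such as an m3u8 playlist URL.
    """
    return [
        "ffmpeg",
        "-nostdin",              # do not wait for keyboard input
        "-loglevel", "error",    # keep stderr quiet unless something fails
        "-i", source,            # input: file path or URL
        "-vn",                   # drop the video stream
        "-ac", "1",              # assumption: downmix to mono
        "-ar", str(sample_rate), # resample to the target rate
        "-f", "s16le",           # assumption: raw 16-bit little-endian PCM
        "pipe:1",                # write the result to stdout
    ]

def decode_audio(source):
    """Run ffmpeg and return the decoded bytes (needs ffmpeg on PATH).

    Note: this holds the whole decoded stream in memory, which is fine
    for a sketch but heavy for a multi-hour VOD.
    """
    result = subprocess.run(
        build_ffmpeg_command(source), stdout=subprocess.PIPE, check=True
    )
    return result.stdout
```

Since ffmpeg handles the download and demuxing of the playlist itself, nothing in the calling code has to change between a local mp4 and an m3u8 link.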

That’s very good to know, is it also possible to use it on entire channels?

Hey there! Could you kindly explain how to use the “--batch_size” option and what exactly it is doing? I couldn’t really understand what it will do and what value I should include in it based on the description: “the batch size is the amount of samples in one iteration. Larger sizes offer better performance, but will consume significantly more memory”

@nonameVIP

@thegasprovider
When sound_reader loads a file, it splits the audio into chunks and runs inference on one chunk at a time. The reason is that loading an entire audio segment into the neural network uses a lot of memory (due to the attention mechanism), so it makes sense to operate on only a small batch at a time. The cost is a slowdown from loading and unloading the model and audio samples from memory. The batch size argument takes the number of samples to use per chunk (the audio is 32000 samples/second). The default batch size is 15 minutes (32000 * 15 * 60 = 28,800,000 samples). You can increase the number to your liking, but it must be divisible by 32000 or you may encounter duplicate timestamps. If you are concerned about optimal speed, you can keep increasing the batch size until the program crashes (because your computer runs out of memory).
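The arithmetic above can be sketched in a few lines of Python. The helper names are made up for illustration, and `chunk_bounds` only shows the general idea of splitting by sample count; it is not the tool's actual chunking code.

```python
SAMPLE_RATE = 32000  # samples/second, as stated above

def batch_size_from_minutes(minutes):
    """Convert a chunk length in minutes to a batch size in samples."""
    return SAMPLE_RATE * 60 * minutes

def is_valid_batch_size(batch_size):
    """A batch size should be a positive whole number of seconds of audio."""
    return batch_size > 0 and batch_size % SAMPLE_RATE == 0

def chunk_bounds(total_samples, batch_size):
    """Illustrative only: (start, end) sample ranges for each chunk."""
    return [
        (start, min(start + batch_size, total_samples))
        for start in range(0, total_samples, batch_size)
    ]

print(batch_size_from_minutes(15))  # 28800000, the default quoted above
print(is_valid_batch_size(100))     # False: 100 is not a multiple of 32000
```

So for 30-minute chunks you would pass 57600000 rather than 30, and a 16-minute file at the default setting ends up as one full 15-minute chunk plus a 1-minute remainder.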

Thanks for the thorough explanation! I thought it was related to that but didn’t know the exact amount or that it should be divisible by 32000. I put 100 as a value (thinking it’d create 100 5-minute batches) and got an error, which is why I came here to ask, hehe. I’ll test it out some other time with the information you provided. Thanks again! :slight_smile:

Hey, have you mentioned somewhere what tool you use to download Twitch audio?

The Google Colab file isn’t working anymore :frowning: