Made a utility to timestamp audio automatically

The error was due to a change in the colored library API. It has been fixed, and the Colab runtime will now download the updated file.

When reporting issues, please be more specific than: it isn’t working. Provide error messages.

Got it. Thank u

How did you train this model?

Using torch and vggsound as my dataset

It works well. thank you. very few false positives. not sure about false negatives though - didn’t test.

I’ve been going through a bunch of timestamps and it seems like every false positive is the same little croaky burp sounding noise people occasionally make when talking. these are usually never 99 or 100%, but 98 and lower I’ll find them.

There are a few strange cases with it. I’ve found it most common in mukbangs, where some very obvious ‘loud’ burps will consistently score below 50% certainty. In a future model update I plan to have those fixed as well. You can drop the threshold in exchange for more timestamps; however, the model is very confident in its predictions, and scores usually end up at the low and high ends.
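
To illustrate that trade-off, here is a minimal sketch of filtering by threshold. The detection list and its `(timestamp, confidence)` layout are invented for illustration and are not sound_reader's real output; the point is only that when scores cluster near 0 and 100, lowering the threshold adds few extra timestamps.

```python
# Hypothetical detections: (timestamp in seconds, confidence in percent).
# These values are invented to show scores clustering near the extremes.
detections = [
    (12.0, 99),
    (47.5, 3),
    (93.0, 45),   # an obvious burp that was under-scored
    (130.0, 98),
    (201.5, 2),
]


def keep_above(detections, threshold):
    """Return the detections whose confidence is at least `threshold`."""
    return [d for d in detections if d[1] >= threshold]


strict = keep_above(detections, 90)
relaxed = keep_above(detections, 40)

print(len(strict), "timestamps at threshold 90")
print(len(relaxed), "timestamps at threshold 40")
```

With this made-up data, dropping the threshold from 90 to 40 recovers the one under-scored case without pulling in the near-zero scores.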

ChatGPT made a script to read in timestamps continuously and delete files if none found
Some input/terminal issues but it seems to work
Now the problem is to find some good burp-positive podcasts :slight_smile:

from multiprocessing import Pool, Manager
import urllib.parse
import os
import subprocess
import requests

# Define maximum number of simultaneous tasks
MAX_TASKS = 8

# Function to validate URL
def validate_url(url):
    try:
        result = urllib.parse.urlparse(url)
        return all([result.scheme, result.netloc])
    except ValueError:
        return False

# Function to process a single file
def process_file(url):
    # Get the file name from the URL
    filename = url.split('/')[-1].split('?')[0]

    # Download the file
    r = requests.get(url, stream=True)
    with open(filename, 'wb') as fd:
        for chunk in r.iter_content(chunk_size=1024):
            fd.write(chunk)

    # Run the audio processing script
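    # check_output returns bytes, so decode once before inspecting or printing
    output = subprocess.check_output(["python3", "sound_reader.py", "--model", "bdetectionmodel_05_01_23.onnx", filename]).decode()
    if ":" in output:
        print(f"\n_____ [{filename}] _____\n")
        print(output)
    else:
        # No timestamps were reported, so the download is not worth keeping
        print(f"No timestamp found in: {filename}")
        os.remove(filename)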


if __name__ == "__main__":
    # The pool queues pending tasks internally, so no separate queue is needed
    pool = Pool(processes=MAX_TASKS)

    # Read URLs until an empty line or EOF (Ctrl-D)
    while True:
        try:
            url = input("Enter URL: ").strip()
        except EOFError:
            break
        if not url:
            break

        if not validate_url(url):
            print("Invalid URL!")
            continue

        # Hand the URL to the pool; it starts as soon as a worker is free
        pool.apply_async(process_file, args=(url,))

    # Wait for all tasks to finish
    pool.close()
    pool.join()

This is a cool script. One recommendation I’d make is to use the sound_reader.py file as an API instead of spawning an instance of it for every task. Loading, warming up, and unloading the model account for a significant share of the scanning time. Additionally, I wouldn’t recommend running multiple instances of ONNX inference at the same time. The compute kernels generated by ONNX are more or less as good as you can get, and running several at once just adds overhead.
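
The load-once idea can be sketched roughly as follows. `FakeModel` is a hypothetical stand-in and not sound_reader's actual interface; it only simulates a model that is slow to load, so the pattern (construct once, scan files one at a time) is visible.

```python
import time


class FakeModel:
    """Stand-in for an expensive-to-load model (hypothetical, not sound_reader's API)."""

    loads = 0  # counts how many times a model was constructed

    def __init__(self):
        FakeModel.loads += 1
        time.sleep(0.05)  # pretend loading and warm-up take a while

    def scan(self, filename):
        # A real model would return timestamps; this just echoes the name.
        return f"scanned {filename}"


def scan_all(filenames):
    """Load the model once, then reuse it for every file, one at a time."""
    model = FakeModel()
    return [model.scan(name) for name in filenames]


results = scan_all(["a.mp3", "b.mp3", "c.mp3"])
print(results)
print("model loads:", FakeModel.loads)
```

Spawning a subprocess per file would pay the load cost three times here; the loop above pays it once.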

If you’re up for it, I’d look into making the transcode step asynchronous. In its current state, sound_reader transcodes each file one by one to a WAV. The transcode step only uses 1 core, so it would be a good place to do things in parallel.

I’ve taken your advice and have changed the chunking system to use soundfile instead of directly calling ffmpeg. The downside is that you may need to explicitly call ffmpeg for some audio encodings, such as AAC.

For those just now reading, ffmpeg is no longer required for using sound_reader, and soundfile is a new dependency that can be installed via pip.

Edit: I’ve returned to ffmpeg; however, I’ve made the transcode step asynchronous, so no time is wasted during inference that could be spent transcoding files. This also makes it possible to stream data directly into sound_reader from a URL.
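
For readers curious what overlapping transcoding with inference can look like, here is a minimal sketch using a thread pool. It is not how sound_reader implements it; `transcode` and `infer` are hypothetical placeholders, and a real transcode step would launch ffmpeg instead of renaming a path.

```python
from concurrent.futures import ThreadPoolExecutor


def transcode(path):
    """Placeholder transcode step (hypothetical); a real one would invoke ffmpeg."""
    return path.rsplit(".", 1)[0] + ".wav"


def infer(wav_path):
    """Placeholder inference step (hypothetical); runs one file at a time."""
    return f"results for {wav_path}"


def scan(paths, workers=4):
    """Transcode files on background threads while inference runs serially."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Submit every transcode up front so later files are converted
        # while earlier ones are still going through inference.
        futures = [pool.submit(transcode, p) for p in paths]
        # Consume in submission order; result() blocks only if that
        # particular transcode has not finished yet.
        return [infer(f.result()) for f in futures]


out = scan(["one.mp3", "two.aac", "three.ogg"])
print(out)
```

Threads are enough for this pattern because a real transcode spends its time in an external ffmpeg process, while inference stays on a single consumer as recommended earlier in the thread.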

Is there a way to order results only by ascending time? The default way it orders results partly by percentage is kind of hard to use; it ends up with multiple timestamps of the same burp out of order. It would be easier to ignore the percentage and have chunks of timestamps.

That could be done, I’m actually considering making it the default behavior as well, since the order doesn’t matter as much.

I let it run, and then let Excel sort the copied output. There’s a sort A→Z button somewhere (I actually use LibreOffice, so I don’t know where it is in the MS version). I guess you could use some script or site to sort them if you don’t have/want a spreadsheet program.
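
As a rough idea of the script route, the sketch below sorts copied output by timestamp and ignores the percentage. The line format (a timestamp followed by a confidence) is an assumption made for illustration; adjust the parsing to whatever the tool actually prints.

```python
def to_seconds(stamp):
    """Convert 'HH:MM:SS' or 'MM:SS' to a number of seconds."""
    seconds = 0
    for part in stamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


# Hypothetical copied output: a timestamp followed by a confidence.
lines = [
    "01:02:10 99%",
    "00:03:45 98%",
    "00:59:01 97%",
    "00:03:44 95%",
]

# Sort by the leading timestamp only, ignoring the percentage.
ordered = sorted(lines, key=lambda line: to_seconds(line.split()[0]))

for line in ordered:
    print(line)
```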

I’ve just updated the program so that everything is ordered chronologically rather than by the confidence (per chunk), no need to hassle with Excel anymore.

@TC many thanks for making the tool. Unfortunately, it seems the download link for sound_reader.py is not working. Could you have a look at that, please?

@TC the link for the model is also not working

Down for maintenance. Will update when it’s back online.

Everything should be back online

Working now, many thanks!

Small update:

Fixed issue with clock drift when audio and video do not have the same length (or have starting offsets that differ)

ffmpeg will now alert when errors occur, instead of silently failing

Update with breaking changes:

  • Output will now be presented in the JSON format. If you are using a script to parse the output, this will affect you
  • The --threshold flag now accepts a decimal between 0.0 and 1.0, no longer 0-100.
  • Improved support for streaming via stdin
  • Ending timestamps are offset by 1

JSON support improves integration with scripts significantly. I have a script below that I am using, which will scan through every video of a channel and download clips of all the high confidence segments automatically (no user interaction is required).

Required programs: ffmpeg jq sound_reader.py yt-dlp.

yt-dlp --flat-playlist --get-url "$@" | while IFS= read -r URL; do
    yt-dlp "$URL" -qxo - |  sound_reader.py --model ~/Projects/adetector/bdetectionmodel_05_01_23.onnx --threshold 0.93 - | jq -r '"\(.start)-\(.end)"' | xargs -I{} yt-dlp "$URL" -f b --force-keyframes-at-cuts --download-sections "*{}" -o "%(id)s_{}"
done

Replace $@ with the URL of the channel to scan (can be from any platform that yt-dlp supports, such as YouTube, Twitch, AfreecaTV, etc.).

This script is especially useful, since it requires absolutely no additional storage space, except for what is required to save the clips. All audio is passed using memory.

This script requires a POSIX-compliant shell, such as bash (Linux) or zsh (macOS).
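
For anyone without jq, the jq step in the pipeline can be approximated in Python. This is only a sketch: the `start` and `end` keys come from the jq filter above, but the one-object-per-line layout and the sample values are assumptions, and the real output may carry additional fields.

```python
import json


def sections(json_lines):
    """Turn JSON objects with `start` and `end` keys into 'start-end' strings."""
    result = []
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        result.append(f"{record['start']}-{record['end']}")
    return result


# Made-up sample; the real output may look different.
sample = [
    '{"start": 12, "end": 14}',
    '{"start": 130, "end": 133}',
]
print(sections(sample))
```

Each resulting string has the same shape as what the jq filter hands to `--download-sections` in the script above.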

Here is a usage example, which scans and downloads clips from Alinity’s channel:

yt-dlp --flat-playlist --get-url "https://www.twitch.tv/alinity/videos?filter=all&sort=time" | while IFS= read -r URL; do
    yt-dlp "$URL" -qxo - |  sound_reader.py --model ~/Projects/adetector/bdetectionmodel_05_01_23.onnx --threshold 0.93 - | jq -r '"\(.start)-\(.end)"' | xargs -I{} yt-dlp "$URL" -f b --force-keyframes-at-cuts --download-sections "*{}" -o "%(id)s_{}"
done

Additionally, the command below will quickly concatenate all the clips into a single video named alinity.mp4. (This requires the files to be encoded in a supported codec.)

find . -type f -name 'v*_*-*' | xargs -I{} ffmpeg -i {} -f mpegts -c copy - | ffmpeg -i - -c copy alinity.mp4

Here’s the resulting video, done merely by copying a link to a channel, running 2 commands, and waiting patiently.
