I would like to share my script that helps me create compilations.
Here’s how the script works:
- I write YouTube video links in the data.txt file in the required format and run the code.
- The script downloads the videos, detects timestamps, saves a compilation of each video, and then deletes the auxiliary files before moving on to the next video.
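For anyone who wants to reproduce this workflow, here is a minimal sketch of the outer loop. It is not the actual script. It assumes that data.txt holds one link per line (the real "required format" may differ), and `process_video` is a hypothetical placeholder for the download, detection, compilation and cleanup steps. Each link is removed from the file only after it has been processed, which is what makes it possible to resume after an interruption.

```python
from pathlib import Path
from typing import Callable


def run_queue(data_file: Path, process_video: Callable[[str], None]) -> int:
    """Process links from data_file one at a time and return how many were done.

    Assumption: one link per line; blank lines are ignored.
    process_video is a hypothetical callback standing in for
    download -> detection -> compilation -> cleanup.
    """
    done = 0
    while True:
        # Re-read the file every time, so links appended mid-run are picked up.
        text = data_file.read_text(encoding="utf-8")
        links = [line.strip() for line in text.splitlines() if line.strip()]
        if not links:
            return done
        link = links[0]
        process_video(link)  # if this raises, the link stays in the file
        remaining = links[1:]
        # Rewrite the file without the processed link so a restart resumes here.
        data_file.write_text("".join(f"{l}\n" for l in remaining), encoding="utf-8")
        done += 1
```

If processing fails partway through, for example because the cookies have expired, the failing link is still at the top of data.txt, so simply rerunning the cell continues from that link.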
The idea to create my own script came to me after I saw a couple of similar projects on a forum and faced some difficulties with them.
I’m aware that there is already a website that performs such tasks, but I haven’t tried it due to payment issues.
I also saw a GUI-based program, but my script turned out to be more precise and faster. Moreover, I haven’t come across any open-source programs for this purpose on the forum.
For an overview of the script, I created a Google Colab notebook (it includes an example of how it works). You can check it out via this link: Google Colab Notebook.
Please read the description carefully and follow the steps in order.
I personally run the script in Jupyter Notebook installed on my own computer, because there the analysis and compilation steps run faster than in Colab. This depends on your computer's specifications, though.
To run the script locally like I do, you need to:
- Install Python.
- Install the required libraries specified in the script.
- Install Jupyter Notebook.
- Add the necessary files (the same ones you would upload to Google Colab).
- Copy and run the code.
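Before running the notebook locally, it can help to check which libraries are still missing. The following is a minimal sketch. The module names in the list (yt_dlp, tensorflow, tensorflow_hub, numpy) are an assumption about what a YAMNet plus yt_dlp pipeline typically imports, so replace them with the imports actually used in the script.

```python
import importlib.util

# Assumed module names; replace with the imports at the top of the script.
REQUIRED = ["yt_dlp", "tensorflow", "tensorflow_hub", "numpy"]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    print("Missing:", ", ".join(missing) if missing else "nothing")
```

Anything reported as missing can be installed with pip. Note that some pip package names use hyphens where the import name uses underscores (yt-dlp, tensorflow-hub).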
For installing Jupyter Notebook, I recommend looking up instructions online; that’s how I managed to do it.
My script is based on the pre-trained YAMNet model (you can read more about it on GitHub; it's open-source) and a few additional modules. This lets the algorithm detect timestamps quite accurately (the detection sensitivity is adjustable), and it sometimes finds moments I wouldn't have noticed myself.
I use a threshold setting of 0.02, which works perfectly for me, along with 2 seconds before and 3 seconds after each detected timestamp, which also suits my needs. Of course, I further refine the resulting compilations in Sony Vegas, since software tools aren't perfect yet.
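To make the threshold and the 2 s / 3 s margins concrete, here is a sketch of how per-frame scores could be turned into clip intervals. This is not the author's code. It assumes YAMNet's framing, where there is one score per 0.48 s hop over 0.96 s windows, and it takes the scores for a single target class as a plain sequence. A frame counts as a hit when its score is at least the threshold. Each hit is then padded, and overlapping clips are merged so the compilation doesn't repeat footage.

```python
from typing import List, Sequence, Tuple

FRAME_HOP_S = 0.48  # YAMNet hop between consecutive 0.96 s frames


def detect_clips(scores: Sequence[float], duration_s: float,
                 threshold: float = 0.02, before_s: float = 2.0,
                 after_s: float = 3.0) -> List[Tuple[float, float]]:
    """Turn per-frame scores for one class into merged (start, end) clips in seconds."""
    clips: List[Tuple[float, float]] = []
    for i, score in enumerate(scores):
        if score < threshold:
            continue
        t = i * FRAME_HOP_S  # start time of the frame that fired
        start = max(0.0, t - before_s)
        end = min(duration_s, t + after_s)
        if clips and start <= clips[-1][1]:
            # Overlaps the previous clip: extend it instead of adding a duplicate.
            clips[-1] = (clips[-1][0], max(clips[-1][1], end))
        else:
            clips.append((start, end))
    return clips
```

With real YAMNet output, `scores` would be the column of the score matrix for the target class, for example `scores[:, class_index]`. Because frames arrive in time order, comparing each new clip only with the last one is enough to merge all overlaps.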
Downsides of the script:
- With a large number of videos, after some time (for me, around 1–2 hours), I have to refresh the cookies file to confirm I'm not a bot. Links that have already been processed are removed from the list, so after the refresh the algorithm resumes where it left off.
- A rare error occurs if a video link starts with a - (minus) sign. This happens approximately once every 20 hours of processing.
- The algorithm sometimes reacts to other sounds, such as the clatter of a bottle (though not always) or a loud, bassy voice (possibly because it resembles a burp). However, such false positives are not frequent enough to bother me. Besides, I don't have access to a better model than YAMNet, which is trained on a large dataset (AudioSet).
- For videos longer than 20 minutes, the algorithm struggles. However, for shorter videos, it works perfectly—for example, analyzing a 10-minute video takes less than a minute. I suspect this is due to my PC’s specs (GTX 1060, Ryzen 1500X, and 16GB RAM). For videos over 20 minutes, I split them into smaller files, although I rarely process such long videos.
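Because long videos are the weak spot, one alternative to splitting files by hand is to split the audio inside the script and score it chunk by chunk. The timestamps are then shifted back onto the full video's timeline. This is a sketch under stated assumptions: the waveform is assumed to be 16 kHz mono (YAMNet's expected input), the 5-minute chunk length is arbitrary, and the overlap keeps events at chunk boundaries from being cut in half. Whether this removes the slowdown on a given machine is untested. It can only help if the problem comes from processing the whole track at once.

```python
from typing import Iterator, Sequence, Tuple

SAMPLE_RATE = 16000  # YAMNet expects 16 kHz mono audio


def iter_chunks(waveform: Sequence[float], chunk_s: float = 300.0,
                overlap_s: float = 1.0) -> Iterator[Tuple[float, Sequence[float]]]:
    """Yield (offset_seconds, chunk) pairs that together cover the whole waveform.

    Consecutive chunks overlap by overlap_s, so an event on a boundary
    appears whole in at least one chunk.
    """
    chunk_len = int(chunk_s * SAMPLE_RATE)
    step = chunk_len - int(overlap_s * SAMPLE_RATE)
    if step <= 0:
        raise ValueError("overlap must be shorter than the chunk")
    start = 0
    while True:
        yield start / SAMPLE_RATE, waveform[start:start + chunk_len]
        if start + chunk_len >= len(waveform):
            break  # this chunk already reaches the end of the audio
        start += step
```

Each chunk would be scored separately, and the chunk's offset would be added to every timestamp detected in it. Duplicates that fall inside the overlap can be removed by merging intervals afterwards.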
Additional features of the script:
- Since downloads are handled by the yt_dlp module, the script can download videos not only from YouTube but also from similar platforms (I've tested it with one messenger, and it worked).
- If the videos are already stored on your PC, you can modify the code slightly to process them. You'll just need to extract the audio from the videos and convert it to the format the script requires (single-channel WAV, among other specifics listed in the audio loader function).
- The YAMNet model is quite versatile. If you're interested in detecting farts, for instance, you can simply change two numbers in the code. I recommend checking the model's .csv class table to see which sounds are supported.
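Instead of hard-coding class indices, you could look them up by name in the class table. This sketch assumes the CSV has the columns index, mid and display_name, which is the layout of yamnet_class_map.csv in the YAMNet repository. The display names you pass must match your copy of the file exactly, apart from letter case, so check the names there. If a name is not found, the function raises an error rather than guessing.

```python
import csv
from typing import Dict, Iterable, TextIO


def class_indices(csv_file: TextIO, names: Iterable[str]) -> Dict[str, int]:
    """Map each requested display name (case-insensitive) to its class index."""
    wanted = {name.lower(): name for name in names}
    found: Dict[str, int] = {}
    for row in csv.DictReader(csv_file):
        key = row["display_name"].lower()
        if key in wanted:
            found[wanted[key]] = int(row["index"])
    missing = set(wanted.values()) - set(found)
    if missing:
        raise KeyError(f"not in class map: {sorted(missing)}")
    return found
```

Typical use would be `with open("yamnet_class_map.csv", newline="") as f: idx = class_indices(f, [...])`, passing the display names of the sounds you want to detect. `csv.DictReader` handles quoted names that contain commas.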
Suggestions for improving the script:
- While I’m not a professional programmer, I’m satisfied with the current script. However, I’d be happy to hear suggestions from experienced developers on how to improve it.
- I think the process of video creation could be optimized by utilizing NVIDIA hardware acceleration.
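On the NVIDIA idea: if the clips are cut and encoded with the ffmpeg command-line tool, its h264_nvenc encoder moves video encoding to the GPU. The GTX 1060 has an NVENC unit. The sketch below only builds the command for a single clip. It assumes an ffmpeg build with NVENC support, and the file names are placeholders. It does not reflect how the original script currently encodes its output.

```python
from typing import List


def nvenc_cut_command(src: str, dst: str, start_s: float, end_s: float) -> List[str]:
    """Build an ffmpeg command that cuts [start_s, end_s] from src and encodes it on the GPU."""
    if end_s <= start_s:
        raise ValueError("end must be after start")
    return [
        "ffmpeg", "-y",                   # overwrite dst if it exists
        "-ss", f"{start_s:.3f}",          # seek before -i: fast input seeking
        "-i", src,
        "-t", f"{end_s - start_s:.3f}",   # clip duration
        "-c:v", "h264_nvenc",             # NVIDIA hardware H.264 encoder
        "-c:a", "aac",
        dst,
    ]
```

The command would be run with `subprocess.run(nvenc_cut_command("video.mp4", "clip_000.mp4", 2.8, 8.76), check=True)`, and the resulting clips would then be concatenated. If ffmpeg reports that h264_nvenc is unknown, the installed build was compiled without NVENC.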
I hope I’ve explained my script clearly enough. Feel free to ask any questions! (I also hope Google Colab works fine, as transferring the code there was a bit challenging for me.)