Made a utility to timestamp audio automatically

Hey, I think there’s an error with the Google Colab notebook, because it takes really long to look for timestamps when scanning a whole channel. Here’s the error:

2024-01-25 11:30:38.744651173 [E:onnxruntime:Default, provider_bridge_ort.cc:1480 TryGetProviderInfo_CUDA] /onnxruntime_src/onnxruntime/core/session/provider_bridge_ort.cc:1193 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcublasLt.so.11: cannot open shared object file: No such file or directory 2024-01-25 11:30:38.744684287 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:747 CreateExecutionProviderInstance]

Failed to create CUDAExecutionProvider. Please reference NVIDIA - CUDA | onnxruntime to ensure all dependencies are met. 2024-01-25 11:30:40.137830169 [W:onnxruntime:, inference_session.cc:1732 Initialize] Serializing optimized model with Graph Optimization level greater than ORT_ENABLE_EXTENDED and the NchwcTransformer enabled. The generated model may contain hardware specific optimizations, and should only be used in the same environment the model was optimized in.

Google Colab has begun shipping CUDA 12 instead of 11.8, which is what ONNX Runtime needs to use the GPU. Right now, Colab can only use the CPU, which is why it has gotten slower.
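
If you want to confirm which execution provider ONNX Runtime actually ends up using, a minimal check looks roughly like this (the model filename is the one used later in this thread; whether CUDA loads depends on the libraries Colab ships):

import onnxruntime as ort

# Providers this onnxruntime build can offer at all.
print(ort.get_available_providers())

# Ask for CUDA first, but fall back to CPU if the CUDA libraries are missing.
session = ort.InferenceSession(
    "bdetectionmodel_05_01_23_f16.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

# Shows what was actually loaded, e.g. ['CPUExecutionProvider'] on current Colab.
print(session.get_providers())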


Do you know why the Colab inferencing sometimes just stops before finishing? Is it because of CPU usage? I’m a complete boomer when it comes to that.


Does this work on macOS? Does anybody know?

A small update: https://tctbff.duckdns.org/programs/sound_reader/sound_reader.py


Is Colab broken just for me?


This is an issue with Google Colab, specifically.

Ooh, my bad then. But do you know what’s causing it?

Probably something in their combination of ONNX Runtime and CUDA is causing an error to be printed to stdout, which pollutes the timestamp JSON (causing the parse error).
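
If the stray messages are what breaks the JSON, one possible workaround (just a sketch, I haven’t verified it against the Colab notebook) is to raise ONNX Runtime’s log severity so those messages are suppressed:

import onnxruntime as ort

# Severity levels: 0=VERBOSE, 1=INFO, 2=WARNING, 3=ERROR, 4=FATAL.
# Setting 4 silences everything below fatal, globally.
ort.set_default_logger_severity(4)

# The same can be set per session:
opts = ort.SessionOptions()
opts.log_severity_level = 4
session = ort.InferenceSession("bdetectionmodel_05_01_23_f16.onnx", sess_options=opts)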


such a pity, it was working perfectly :frowning:


I fixed it for myself; it worked just by switching “onnxruntime-gpu” to “onnxruntime”.

Hey, I have been running the script on my computer for a few days now, and since today the inference time per iteration has increased like crazy.
It was around 8 seconds per iteration over the past few days, versus 150 seconds per iteration today. I didn’t change anything in the script (except for the block size, which I put back to its original value after it crashed the code).
I also tried downloading the onnx model again, but this doesn’t change anything.

Is anyone having the same problem, or does anyone have an idea of what the issue could be?

I have no indication of what could be going wrong from your message. The script itself does not connect to the internet unless you pass in a video URL, so neither the script nor the model changed between now and when you first downloaded it.

Provide the arguments you used, output, and your system configuration.

I am not sure this will really help, but here it is. Basically, I am running this command:

yt-dlp "$URL" -qxo - | ./sound_reader.py --threshold 0.30 --model bdetectionmodel_05_01_23_f16.onnx -

and the output shows this:

2024-03-07 21:35:06.2093894 [W:onnxruntime:, inference_session.cc:1914 onnxruntime::InferenceSession::Initialize] Serializing optimized model with Graph Optimization level greater than ORT_ENABLE_EXTENDED and the NchwcTransformer enabled. The generated model may contain hardware specific optimizations, and should only be used in the same environment the model was optimized in.
Inferencing -: 1it [02:41, 161.87s/it]

And about my config:
AMD Ryzen 7 6800H 3.20 GHz
NVIDIA GeForce RTX 3060 Laptop
AMD Radeon™ Graphics

Somehow I could not manage to get the GPU version of onnxruntime to run on my computer, so I have been using the CPU version so far.
And using the exact same command with the same URL yesterday gave me around 8 seconds per iteration of inferencing time instead of the 160 today. The only thing I have changed since the first time I used the script is updating the yt-dlp package.

Everything appears correct there, in that no unexpected warnings are being produced.

Is there also a slowdown when you try inferencing files that you have saved locally?

I want to say this is just the audio being streamed slowly from yt-dlp, since there isn’t really anything else that would make sense to me. Try downloading it first, then inferencing, and let me know if it runs at its normal pace. 8 seconds is already pretty long for your class of GPU, so it’s probably bottlenecked by yt-dlp.
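
Something like this is what I mean, assuming the script accepts a local file path in place of “-” (the URL and filenames here are placeholders):

import subprocess

url = "https://www.youtube.com/watch?v=..."  # placeholder URL

# Download and extract the audio once (forcing m4a so the filename is predictable)...
subprocess.run(
    ["yt-dlp", url, "-x", "--audio-format", "m4a", "-o", "audio.%(ext)s"],
    check=True,
)

# ...then inference the local file instead of streaming through a pipe.
subprocess.run(
    ["python", "sound_reader.py", "--threshold", "0.30",
     "--model", "bdetectionmodel_05_01_23_f16.onnx", "audio.m4a"],
    check=True,
)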

Ah, you’re right, it seems the inference time gets back to normal if I use it on local files!
I would have thought that the yt-dlp command would be executed independently and then provide the output to the sound_reader script…

I downgraded yt-dlp to the original version and the problem remains, so I guess the issue is independent of my setup and is within yt-dlp itself…

And as I said, I’m not running it on the GPU but on the CPU, because I could not manage to get onnxruntime-gpu to work.

Anyway, thanks for the help. I guess I will download the files myself beforehand until yt-dlp fixes the low speed.

You can actually have yt-dlp use several connections to download audio, with the -N flag. In some cases this isn’t that helpful (like when YouTube evicts a video from a nearby cache), but some sites limit your bandwidth per connection, so you can set up multiple connections and stream audio way faster in some cases.
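
As a rough sketch (whether -N actually helps depends on the site and the video), the same pipeline from earlier with the extra flag:

import subprocess

url = "https://www.youtube.com/watch?v=..."  # placeholder URL

# Same pipeline as before, but let yt-dlp open up to four connections (-N 4).
ytdlp = subprocess.Popen(["yt-dlp", url, "-qxo", "-", "-N", "4"], stdout=subprocess.PIPE)

# sound_reader.py reads the audio from stdin when given "-".
subprocess.run(
    ["python", "sound_reader.py", "--threshold", "0.30",
     "--model", "bdetectionmodel_05_01_23_f16.onnx", "-"],
    stdin=ytdlp.stdout,
    check=True,
)
ytdlp.wait()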

Ah, sorry, I just noticed: I would actually use the regular model instead of the F16 model in your case, as I don’t believe your CPU supports AVX-512, which is the only instruction set that can do F16 math with SIMD.
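
If you want to check, something like this works (it assumes the third-party py-cpuinfo package, installed with pip install py-cpuinfo):

import cpuinfo  # third-party: pip install py-cpuinfo

flags = cpuinfo.get_cpu_info().get("flags", [])

# The Ryzen 7 6800H does not report avx512f, so the regular (non-F16) model is the better fit.
print("avx512f" in flags)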

Hey, @TC :slight_smile:
If you have the time, could you kindly review the issue I’m having below?

I started getting this error when I tried to run the tool on a video I’d downloaded:

Traceback (most recent call last):
  File "C:\Users\phcla\Downloads\sound_reader.py", line 135, in <module>
    model = onnx.load(args.model)
  File "C:\Users\phcla\AppData\Local\Programs\Python\Python310\lib\site-packages\onnx\__init__.py", line 170, in load_model
    model = load_model_from_string(s, format=format)
  File "C:\Users\phcla\AppData\Local\Programs\Python\Python310\lib\site-packages\onnx\__init__.py", line 212, in load_model_from_string
    return _deserialize(s, ModelProto())
  File "C:\Users\phcla\AppData\Local\Programs\Python\Python310\lib\site-packages\onnx\__init__.py", line 143, in _deserialize
    decoded = typing.cast(Optional[int], proto.ParseFromString(s))
google.protobuf.message.DecodeError: Error parsing message

It seemed like something related to the model, so I redownloaded the .onnx model file you had here, and I was able to run one video. Then, when I tried to run the next one, I started getting the error below:

Traceback (most recent call last):
  File "C:\Users\phcla\Downloads\sound_reader.py", line 136, in <module>
    onnx.checker.check_model(model)
  File "C:\Users\phcla\AppData\Local\Programs\Python\Python310\lib\site-packages\onnx\checker.py", line 136, in check_model
    C.check_model(protobuf_string, full_check)
onnx.onnx_cpp2py_export.checker.ValidationError: No Op registered for MemcpyToHost with domain_version of 17

==> Context: Bad node spec for node. Name: Memcpy_token_155 OpType: MemcpyToHost

I also tried “pip install --upgrade” to see if anything was outdated, but it had no effect :frowning:

Would you happen to know what could be happening and how could I address this? Thanks in advance! :slight_smile:

EDIT: After doing some digging, I found that the Domain apparently isn’t set for the following Nodes (how these can be listed is sketched after the node list). It seems like some ONNX operations don’t require a Domain, and maybe the tool doesn’t even require these Nodes, but either way, maybe this is related to the problem.

Node Name: Memcpy_token_155, OpType: MemcpyToHost, Domain:

Node Name: Memcpy_token_154, OpType: MemcpyFromHost, Domain:

Node Name: Memcpy, OpType: MemcpyFromHost, Domain:
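
For reference, this is roughly how the node domains can be dumped with the onnx Python API (the model filename is assumed to be the one from this thread):

import onnx

model = onnx.load("bdetectionmodel_05_01_23_f16.onnx")
for node in model.graph.node:
    # An empty domain string means the default ONNX operator set.
    print(node.name, node.op_type, repr(node.domain))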

I tried creating a copy of the Model for testing and reset the Domain to a random one (com.microsoft) just to see if I got something different, and got error “Fatal error: com.microsoft:MemcpyToHost(-1) is not a registered function/op”. So, I guess maybe these Nodes do need to have an empty Domain.

I also tried deleting them and got a sorting error (“C.check_model(
onnx.onnx_cpp2py_export.checker.ValidationError: Nodes in a graph must be topologically sorted, however input ‘/Transpose_1_output_0’ of node:
name: /conv_block1/Relu_output_0_nchwc OpType: Conv
is not output of any previous nodes.”), so I guess these Nodes do need to be there, and the problem is something else.

Just felt like sharing, in case it saves you any time when/if you have the time to check this out :slight_smile:

Hmm, maybe onnx is out of sync with onnxruntime. I’ve left the checker in because I thought it may be helpful for debugging, but it seems like it’s being overly pedantic in your case. When sound_reader starts up, it performs a kernel search and optimizes the model for the active configuration. This means that a few ops which are specific to your system are added and saved to the file (so that optimization does not need to be redone later).
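
For the curious, the saving step is roughly this kind of ONNX Runtime mechanism (a sketch of the general idea, not necessarily sound_reader’s exact code):

import onnxruntime as ort

opts = ort.SessionOptions()
opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL

# Write the graph back out after ONNX Runtime has optimized it for this machine,
# so the optimization does not have to be redone on the next run. The saved file
# can then contain hardware-specific ops (like the Memcpy/nchwc nodes above).
opts.optimized_model_filepath = "bdetectionmodel_05_01_23_f16.onnx"

session = ort.InferenceSession("bdetectionmodel_05_01_23_f16.onnx", sess_options=opts)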

You can either redownload sound_reader, where the check is removed (there are also a few subtle changes I made so that timestamp timings are more accurate), or you can comment out the line with the check yourself.

Given that the initial pass worked on your machine, it’s probably some version issue where the onnx package doesn’t recognize some of the ops from onnxruntime.
