Frankly, what I hear is very similar to the results of classic spectral denoising, even with the characteristic FFT artifacts (for Linux, there's Noise Repellent [1] available for advanced spectral denoising; there's also a ton of commercial spectral processors available).
The demonstration could use more random background noises to separate it from FFT noise suppressors (as it's the primary benefit of ML-based filters), and more varied speech to separate it from RNNoise [2] which tends to suppress breath and cut the sibilants in an unnatural manner. The latency is also important - is it as low as in RNNoise? What about the CPU load?
It's so excellent how many moats are just getting obliterated.
I have absolutely been a real snarky hater of AI, as a horrible feudal, unobservable black box that has way too much power in the world. But open source has been doing an amazing job of reading the papers & reproducing them, & it's glorious to see.
Amazing examples of a peership culture in action. Raising each other up is so divine. Share the knowledge & means.
I recently replaced an image classification pipeline that leaned heavily on classical computer vision techniques (like you'd find in OpenCV) with a neural network based approach using open models. There were about 4 years of developer effort invested in that old pipeline and I got better results with 2 months of effort invested in the new NN based system.
Later this year I plan on revamping an old NLP system with even more man-years of effort invested in it. I think I can beat it with neural networks too. The main reason I haven't started already is that open language model progress is so fast that I expect significantly better building blocks in 4 months. Using these new tools feels magical, particularly when you have experienced how much effort it took to get half-as-good results with older techniques.
The ML/DL software ecosystem is dominated by Python. Python's dependency management can be especially tricky, so try to limit the dependencies you're pulling in for the final deployed artifact if your final artifact is also Python based.
CUDA can be difficult to set up correctly on your personal development machine and if you rely on it you're also limiting the development machines that other people can use. You're limiting the deployment options and the CI options. This may differ if you work at a larger company that has a team specializing in these things, but I had to work out everything from initial proof-of-concept to final deployment. Some applications absolutely need the higher performance from GPU execution but it's worth seeing if you can get away with CPU-only execution because it avoids operational complications.
I used the ONNX runtime (as this DeepFilterNet project does, indirectly, according to a top level comment by WiSaGaN) and I was able to get adequate inference speed running on plain CPU. It's a small Python service that just wraps the inference logic with a command interface and a connection to Redis. It takes protocol buffer inputs from a Redis based queue, does a little bit of control logic followed by inference, and writes the results as protocol buffers to another Redis based queue.
The only ML libraries I have on the Python side are the ones for ONNX. This saved over a gigabyte (!) of transitive dependencies compared to the first proof-of-concept I had that relied on PyTorch for runtime inference.
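For a sense of what that looks like, here is a rough sketch of the shape of such a service (not the actual code; the queue names, the InferenceRequest/InferenceResult protobuf messages and their fields, and the model path are all made up for illustration):

    import numpy as np
    import onnxruntime as ort
    import redis

    # Hypothetical protobuf messages; the real schema is whatever your pipeline defines.
    from my_pipeline_pb2 import InferenceRequest, InferenceResult

    r = redis.Redis(host="localhost", port=6379)
    session = ort.InferenceSession("classifier.onnx", providers=["CPUExecutionProvider"])

    while True:
        # Block until a job arrives on the input queue.
        _, raw = r.blpop("inference:requests")
        req = InferenceRequest()
        req.ParseFromString(raw)

        # A bit of control logic, then inference (input shape depends on the model).
        img = np.frombuffer(req.image_bytes, dtype=np.float32).reshape(1, 3, 224, 224)
        scores = session.run(None, {"input": img})[0]

        res = InferenceResult(request_id=req.request_id)
        res.scores.extend(scores.ravel().tolist())
        r.rpush("inference:results", res.SerializeToString())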
Final advice: you actually don't need to know much theory to start doing something useful. I hadn't studied neural networks since graduate school 20 years ago so my theory is hopelessly outdated. I just started hacking together a little demo for myself and it was good enough that I was encouraged to take it all the way to production.
We recently migrated to Poetry [1] for dependency management and so far it's been a breath of fresh air - it feels like what Python deps should've always been!
You can even have dependency groups, to separate main/dev dependencies for instance. It also brings env management, and plays very nicely with Docker if you use containers.
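If it helps anyone, the groups are just sections in pyproject.toml (package names and versions below are only examples):

    [tool.poetry.dependencies]
    python = "^3.11"
    onnxruntime = "^1.16"

    [tool.poetry.group.dev.dependencies]
    pytest = "^7.4"
    black = "^23.0"

Then `poetry install --without dev` skips the dev group, which is handy for keeping the deployed container small.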
Poetry doesn't really work well if you try to ssh in. It tries to set up a keyring for some reason whenever you use it, even if you're just trying to run a project. I feel like it shouldn't be that hard to get going, but after trying to hack at it for four hours I used a workaround of telling keyring to use a null backend & then was able to download deps & run the project I'd downloaded.
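For anyone who hits the same thing, the workaround I mean is pointing the keyring library at its null backend before invoking Poetry, something like:

    export PYTHON_KEYRING_BACKEND=keyring.backends.null.Keyring
    poetry install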
Admittedly it's only a single nit, but it was still one of the saddest, most frustrating Python experiences I've ever had.
Not a week goes by without a new issue popping up. A dependency manager is a critical piece of infrastructure, it should not be the main developer experience bottleneck when building an application. A good package manager is out of sight, out of mind. You don't hear people complaining so much about Cargo every day.
I don't really feel like defending Poetry, I don't even use it. But in this particular case, I think the wheel format, the way people try to host it on their servers, and the entitled researchers who act like 80-year-olds and "just want things done" instead of taking a little bit of time to learn to work with their tools are to blame. None of this exists with the Cargo infra.
Isolate the prior logic into functional components. Find the inputs and outputs of each box. Identify which components ML can replace. Replace them one after the next, sometimes merging several if the ML can do it all on the GPU.
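A minimal sketch of the idea, with made-up stage names: each box keeps its input/output contract, so a classical stage can be swapped for an ML-backed one (or several merged) without touching the rest of the pipeline.

    import numpy as np

    # Each stage is a plain function with a fixed input/output contract.
    def load_image(path: str) -> np.ndarray:
        ...

    def classical_segmentation(img: np.ndarray) -> np.ndarray:
        # old OpenCV-style logic lived here
        ...

    def ml_segmentation(img: np.ndarray) -> np.ndarray:
        # drop-in replacement: same signature, backed by a neural network
        ...

    def classify(mask: np.ndarray) -> str:
        ...

    def pipeline(path: str, use_ml: bool = True) -> str:
        img = load_image(path)
        seg = ml_segmentation(img) if use_ml else classical_segmentation(img)
        return classify(seg)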
As someone who worked professionally using top of the line audio denoising for speech in cinema productions I have to say that I am underwhelmed by the results. This is very similar to what traditional algorithms would have achieved a decade ago.
Of course there might be potential for improvement there, maybe it is more performant or was developed way faster, etc. But just listening to the audible result is not too convincing yet.
Note that this isn't just a paper reproduction but a new paper in and of itself (the GitHub repo is by one of the paper's authors). This is unbelievably amazing. I wonder how it compares to RNNoise, which is also open source and also targets real-time settings.
It looks like the library in Rust is using `tract-onnx` to do the inference: https://github.com/Rikorose/DeepFilterNet/blob/2a84d2a1750a5... I am wondering whether using Python for research, training in big data centers, and Rust at the edge for efficient inference will become a trend. We do have a larger C++ community for inference right now (e.g. ggml), but Rust crates as components for building AI applications are a joy to use.
You can use the ONNX CPU runtime in Python or C++ too. It doesn't have to be Rust. And if you want GPU support, you can even run models saved in the ONNX format on Nvidia GPUs with the TensorRT runtime.
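For reference, CPU-only inference on the Python side is roughly this (model path, input name, and input shape are placeholders):

    import numpy as np
    import onnxruntime as ort

    sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
    input_name = sess.get_inputs()[0].name

    x = np.random.rand(1, 3, 224, 224).astype(np.float32)  # dummy input matching the model's shape
    outputs = sess.run(None, {input_name: x})
    print(outputs[0].shape)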
Honestly, while ggml is super cool, it started as a hobby project and you probably shouldn't use it in production. ONNX has been the de facto standard for ML inference for years. What it is missing (compared to ggml) is 2-6 bit inference, which is helpful for large-scale transformers on edge devices (and is what helped ggml gain adoption so fast).
Yeah, I've only used it with networks in ONNX format (converted from TensorFlow or Torch). I was looking for high-performance, low-latency / real-time inference, and the C and C++ APIs for OpenVINO are quite OK if you spend some time playing with them. I hope Intel keeps investing in it...
Edit: if you go through the ONNX intermediate format, be prepared to perform some 'network surgery' to clean up conversion cruft, and also to remove training-only stuff left in the network...
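One low-effort way to do a lot of that surgery is onnx-simplifier; anything it misses you can still inspect and edit by hand with the onnx package. A rough sketch (the model filenames are placeholders):

    import onnx
    from onnxsim import simplify  # pip install onnx-simplifier

    model = onnx.load("exported.onnx")

    # Fold constants and strip conversion cruft (Identity nodes, dead branches, ...).
    simplified, ok = simplify(model)
    assert ok, "simplified model failed the ONNX checker"

    # Anything left over (e.g. training-only nodes) can be inspected/edited manually.
    print([n.op_type for n in simplified.graph.node][:20])
    onnx.save(simplified, "cleaned.onnx")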
Since it does the signal processing in the Fourier domain, does this suffer from audio artefacts e.g. hissing in the output? Torch's inverse STFT uses Griffin-Lim which is probabilistic and if you don't train it sufficiently, you may sometimes get noise in the output.
Not all spectral methods have such artifacts. The type of artifacts you mention happens when you need to do phase retrieval or try to reconstruct waveforms from a mel spectrogram. DeepFilterNet does spectral masking on the complex spectrogram, so there is no need for phase retrieval.
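To make the distinction concrete: when you mask the complex STFT you keep the noisy phase, so the inverse STFT is a direct, deterministic reconstruction and no Griffin-Lim-style iteration is involved. A rough torch sketch (the mask here is a random placeholder; in a real denoiser it is predicted by the network):

    import torch

    n_fft, hop = 960, 480
    window = torch.hann_window(n_fft)

    noisy = torch.randn(16000 * 2)  # 2 s of fake audio at 16 kHz

    # Complex STFT of the noisy signal (magnitude *and* phase).
    spec = torch.stft(noisy, n_fft, hop_length=hop, window=window, return_complex=True)

    # Placeholder mask in [0, 1]; the real one comes from the model.
    mask = torch.rand_like(spec.real)

    enhanced_spec = spec * mask              # masking keeps the noisy phase
    enhanced = torch.istft(enhanced_spec, n_fft, hop_length=hop, window=window)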
I sometimes wonder if all those filters optimise for the wrong thing. Removing noise is meaningless unless overall intelligibility improves. If you remove noise at the price of the voice becoming choppy, "robotic" and unnatural, you haven't improved the situation, and in some cases you've arguably made it worse.
What further deteriorates intelligibility for most noise suppression filters is the discrepancy between the completely dry pauses and the remaining ambience "under" the voice. It would be much more interesting to see some style transfer for voice ambience as an alternative to current de-verbs.
When dealing with voice processing I advocate refraining from noise suppression filters for as long as possible, and I haven't yet seen a publicly available noise suppression filter that could change my position.
One of the challenges we face in this research problem is the lack of a reliable metric for evaluating the quality of the NN model. I recently came across the [3Quest metric](https://cdn.head-acoustics.com/fileadmin/data/global/Datashe...), which seems helpful in this regard. Does anybody have experience with this metric, maybe in comparison with Microsoft's DNSMOS?
If you (especially on behalf of any hip, popular platforms like Discord) undertake any projects to aggressively denoise or compress audio, please (PLEASE) do us people with auditory processing difficulties a favor, and include such people in your testing.
What’s wrong with the noise suppression offered by Discord? I use it for work meetings as well (via the Krisp app) but I’d hate to cause anyone distress.
Most of these codec/signal processing projects tend to strike a nerve for me due to the ignorance of how unintelligible the processed output is (as someone who has a very difficult time with this and very regularly has to ask people to repeat themselves 4-5x even in meatspace interaction), so from the particular angle my outburst came from, it's admittedly being unfair to Discord.
However, I find the aggressiveness of these things to still be a problem.
In Discord's case, I can't wrap my head around how people find those tearing/crunching sounds that result from trying to smooth out keyboard strokes desirable; particularly because it largely washes out the speaker's output feed and sounds really bizarre and out of place. I would rather just hear their keystrokes.
If you want to smooth out the really acute/jarring occurrences a bit: fine. But as it is, people seem to want to suppress/compress to the point of throwing the baby out with the bathwater. I chalk it up to their desire to squeeze out every last bit of compression for their marketing/employment/performance metrics. My 2c. /shrug
Much, much better. I really recommend DeepFilterNet; it's the most well-rounded open-source AI noise suppression tool out there. Big caveat: it won't help ASR models, e.g., Whisper.
Hey thatsadude, thanks for your input! I'm also working in this research area and it would be great to connect with you on LinkedIn. Here's my proxy LinkedIn Profile - https://bit.ly/3ChXFcm. If you're interested, could you send me a connection request? Looking forward to connecting and discussing more in the future!
Do you know of a framework to quantitatively compare the noise suppression quality produced by different algorithms? Or maybe there is an industry-standard test suite?
The gold standard is the ITU-T P.808 subjective test: https://github.com/microsoft/P.808
Of course, running a subjective test is expensive, and hence there are objective scores such as DNSMOS and UTMOS which use neural networks to predict P.808 values.
1. Speech distortion is extremely detrimental to ASR models, even when human listeners may not be able to notice it. Noise reduction models such as RNNoise and DeepFilterNet try to reduce "perceptible" noise, and doing that creates imperceptible distortion which ASR models do not like at all.
2. Many noise reduction models, such as RNNoise and DeepFilterNet, operate on raw spectrograms or ERB (equivalent rectangular bandwidth) bands. On the other hand, ASR models mostly run on mel spectrograms. This mismatch tends to create problems. I have seen many papers from Google claiming that reducing noise in the mel spectrogram domain often helps their keyword spotting.
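For reference, the front-end most ASR models actually see is something like the following (librosa used purely for illustration; the 80-band, 25 ms / 10 ms setup is roughly what Whisper-style models consume). Whatever the denoiser does in the complex-spectrogram or ERB domain only reaches the recognizer after being squashed through this transform.

    import librosa

    y, sr = librosa.load(librosa.example("trumpet"), sr=16000)

    # 80-band log-mel features at 16 kHz (n_fft=400 ~ 25 ms, hop=160 ~ 10 ms).
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=400, hop_length=160, n_mels=80)
    log_mel = librosa.power_to_db(mel)
    print(log_mel.shape)  # (80, num_frames)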
[1] https://github.com/lucianodato/noise-repellent
[2] https://github.com/werman/noise-suppression-for-voice