
This obviously works when the image is "scaled" by sampling/nearest-neighbor (e.g. downscaling 2x by taking every second pixel and discarding the rest), not actually scaled through some better method (by doing math that involves all pixel values).

What the article doesn't mention, and the paper it links to probably covers somewhere amid so much irrelevant information that I haven't found it yet, is whether this also works on some of the better scaling algorithms, and thus whether it's a "duh, OBVIOUSLY" or actually interesting research.

The blog post gives a cv2.resize example which seems to default to "bilinear", but I'm not sure what this means for downscaling, in particular for downscaling by a large factor.

I suspect that the key takeaway is "default downscaling methods are bad".



You have to use AREA interpolation for downscaling. Bilinear will only interpolate among the 4 nearest source image pixels. It still ignores most of the source pixels.
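To make that concrete, here's a rough numpy sketch (my own toy implementations, not OpenCV's actual code) of a 4-tap bilinear downscale next to an area average. At an 8x reduction, bilinear only ever reads a handful of source pixels, so an attacker who writes the payload into exactly those spots owns the thumbnail:

```python
import numpy as np

def bilinear_4tap(img, out_h, out_w):
    """GPU-style bilinear: each output pixel reads only the 4 source
    pixels around its sample point, regardless of the scale factor."""
    in_h, in_w = img.shape
    ys = (np.arange(out_h) + 0.5) * in_h / out_h - 0.5
    xs = (np.arange(out_w) + 0.5) * in_w / out_w - 0.5
    y0 = np.clip(np.floor(ys).astype(int), 0, in_h - 2)
    x0 = np.clip(np.floor(xs).astype(int), 0, in_w - 2)
    wy, wx = (ys - y0)[:, None], (xs - x0)[None, :]
    return (img[np.ix_(y0, x0)] * (1 - wy) * (1 - wx)
            + img[np.ix_(y0, x0 + 1)] * (1 - wy) * wx
            + img[np.ix_(y0 + 1, x0)] * wy * (1 - wx)
            + img[np.ix_(y0 + 1, x0 + 1)] * wy * wx)

def area_average(img, out_h, out_w):
    """Area average for integer factors: every source pixel contributes."""
    fy, fx = img.shape[0] // out_h, img.shape[1] // out_w
    return img.reshape(out_h, fy, out_w, fx).mean(axis=(1, 3))

# 64x64 black image; the attacker writes 255 only into the 2x2
# neighbourhoods that an 8x8 bilinear resize will actually sample.
src = np.zeros((64, 64))
taps = np.floor((np.arange(8) + 0.5) * 8 - 0.5).astype(int)  # rows/cols 3, 11, ..., 59
for y in taps:
    for x in taps:
        src[y:y + 2, x:x + 2] = 255.0

thumb_bilinear = bilinear_4tap(src, 8, 8)  # solid 255: the payload survives
thumb_area = area_average(src, 8, 8)       # ~16 of 255: the payload is diluted
```

With area averaging the same payload just nudges each block mean up slightly, which is why the attack targets the interpolating resizers.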

This is in essence a special case of sampling artifacts, i.e. aliasing. Anyone writing image processing software should already know about aliasing, the Nyquist theorem, etc. Or, well, perhaps not amid the current hype, where everyone who took one Keras tutorial is a computer vision expert...

Resizing with nearest neighbor or bilinear (i.e. ignoring aliasing) also hurts ML accuracy, so it's worth fixing regardless of this specific "attack".


Bilinear could mean downscaling with a triangle kernel, but it might well be the standard bilinear interpolation that's native to most GPUs and OSs.

Also area interpolation still has some pretty terrible aliasing, since box kernels are terrible at filtering high frequencies.

And of course with downscaling you could still freely manipulate the downscaled image if you're allowed to use ridiculously high or low values, provided you knew the exact kernel used.


Bilinear uses the triangular kernel over the source image (with size corresponding to the input pixel size).

Area interpolation works very well in practice; it's more sophisticated than just a box filter on the input plus sampling. It calculates the exact intersecting footprint sizes and computes a weighted average. Do you have examples where this causes aliasing, and can you show a better alternative?
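For reference, a 1D sketch of that exact-footprint computation (my own illustration, not OpenCV's implementation): each output sample averages the source over its footprint, with partially covered source pixels weighted by their overlap fraction, so it handles non-integer factors cleanly.

```python
import numpy as np

def area_resample_1d(src, out_len):
    """Exact-footprint area resampling in 1D: each output sample is the
    average of the source over its footprint, with partially covered
    source pixels weighted by their overlap fraction."""
    in_len = len(src)
    scale = in_len / out_len
    out = np.empty(out_len)
    for i in range(out_len):
        lo, hi = i * scale, (i + 1) * scale        # footprint in source coords
        j = np.arange(int(np.floor(lo)), int(np.ceil(hi)))
        overlap = np.minimum(j + 1, hi) - np.maximum(j, lo)
        out[i] = np.dot(src[j[0]:j[-1] + 1], overlap) / scale
    return out

# downscaling a ramp 10 -> 4 (non-integer factor 2.5)
print(area_resample_1d(np.arange(10.0), 4))  # [0.8, 3.2, 5.8, 8.2]
```

Note that the output mean equals the input mean, i.e. no energy is invented or lost.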


You can use any image with a high frequency regular pattern. Wikipedia has the following example: https://en.wikipedia.org/wiki/File:Moire_pattern_of_bricks_s....

Anything softer than area will help with those kinds of issues (which is why the original https://en.wikipedia.org/wiki/Aliasing#/media/File:Moire_pat... looks fine in most browsers even if you resize it). Bicubic tends to do better in this respect. It's a trade-off though.


Sorry, but this is wrong. Area has no aliasing; all the others introduce aliasing artifacts when DOWNscaling.

https://imgur.com/a/C6utkwr

Now you could use pre-smoothing with a kernel and then resampling, but then we are talking about something else.

It's important to understand that interpolation happens in the source pixels, so it does not help when downscaling. Cubic tends to look nice, yes, but only when UPscaling.


Yeah, if you're going to use interpolation to downscale, it's obviously going to look worse than even the most basic version of downscaling. That's why downscaling uses the transpose of the interpolation kernel; not doing that and being surprised the result doesn't look good is just silly.
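In 1D, that "transposed" use of the kernel looks roughly like this (illustrative sketch, integer factors only): the tent kernel is stretched to the source-space footprint of one output pixel, so every nearby source sample gets a nonzero weight, instead of the 2 taps plain interpolation would use.

```python
import numpy as np

def tent_downscale_1d(src, factor):
    """Downscale by an integer factor using the tent (bilinear) kernel
    stretched to source-space radius `factor`, so every nearby source
    sample contributes, unlike 2-tap bilinear interpolation."""
    in_len = len(src)
    out = np.empty(in_len // factor)
    j = np.arange(in_len)
    for i in range(len(out)):
        center = (i + 0.5) * factor - 0.5                    # in source coordinates
        w = np.maximum(0.0, 1.0 - np.abs(j - center) / factor)
        out[i] = np.dot(src, w) / w.sum()
    return out
```

A sanity check: a constant signal stays constant, and away from the edges a linear ramp resamples to its exact midpoint values.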


Do you know of any image processing library that has an implementation for that?


ImageMagick should work. It also has quite extensive documentation: https://legacy.imagemagick.org/Usage/resize/, though it's a bit hard to know where to start. I'm fairly certain it'll tell you somewhere that interpolation and downscaling use their kernels differently, but I couldn't tell you where.


There's another way to hide the image, and that is to exploit the nonlinearity of the response curves (gamma).

I have an image I crafted a long time ago which looks something like gray noise when you open it up, but when you downscale it, you see an image of Lt Cmdr Data from Star Trek. I wonder if I can dig it up.

The technique itself was not novel when I did it, a more sophisticated version involving embedded gamma values (which you can make quite large or small) was routinely used on image boards some ten or fifteen years ago.


It's ridiculous that so few websites actually handle this well. Even my own self-written imgur clone does it just fine:

https://i.k8r.eu/i/F_XCMA

https://i.k8r.eu/F_XCMAm.png

https://i.k8r.eu/F_XCMAt.png

You just have to go into a linear colorspace and use an area filter.
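A rough numpy sketch of that pipeline (standard sRGB transfer functions, my own toy area filter): convert to linear light, area-average, convert back. Averaging a 0/1 checkerboard directly on the sRGB values gives 0.5; doing it in linear light gives roughly 0.735, which is the grey that actually matches the checkerboard's brightness on screen.

```python
import numpy as np

def srgb_to_linear(u):
    u = np.asarray(u, dtype=float)
    return np.where(u <= 0.04045, u / 12.92, ((u + 0.055) / 1.055) ** 2.4)

def linear_to_srgb(v):
    v = np.asarray(v, dtype=float)
    return np.where(v <= 0.0031308, v * 12.92, 1.055 * v ** (1 / 2.4) - 0.055)

def downscale_gamma_correct(img, factor):
    """Area-average in linear light, then re-encode as sRGB."""
    lin = srgb_to_linear(img)
    h, w = lin.shape
    return linear_to_srgb(
        lin.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3)))

checker = (np.indices((8, 8)).sum(axis=0) % 2).astype(float)  # 0/1 checkerboard
naive = checker.reshape(4, 2, 4, 2).mean(axis=(1, 3))         # averages sRGB codes: 0.5
correct = downscale_gamma_correct(checker, 2)                 # ~0.735, the right grey
```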


Related: you can get an idea of what your browser/display is doing in this shadertoy: https://www.shadertoy.com/view/Wd2yRt


Fwiw, the reason why Wikipedia doesn't do this when rescaling images (or at least didn't years ago when I was working on image resizing code for Wikipedia) is that doing it (with off-the-shelf software) required keeping the entire image in memory, which was a big no-no. I mean, I guess it would be fine for small images, but then you're using two different algorithms depending on image size, which seems bad.



The article links to this browser test page:

http://www.ericbrasseur.org/gamma_dalai_lama.html

On my machine, both Firefox and Chrome display grey rectangles when scaling down. Why do the browsers get this wrong?


Because resizing in a linear colorspace is more costly. JPEG can be resized without shifting colorspaces VERY cheaply, but a colorspace change (or gamma shift) requires decoding the whole image into RAM, and the hit can be quite significant. On a phone or laptop it would hurt battery; on an online service (a dynamic resizer) it would impact latency.


> on an online service (dynamic resizer service) it would impact latency.

If it's even possible at all. Sometimes users upload things like https://commons.wikimedia.org/wiki/File:“Declaration_of_vict...


Can also depend on the monitor? When I drag this page between monitors I see different effects.


Max pooling could also be targeted extremely easily with this technique, and it is immensely popular as a scale-reduction step in convolutional neural networks. So yes, this could very well be a relevant and non-trivial attack in the context of 'dataset poisoning'. (It would also be relatively easy to defend against; just don't use max pooling in the first layer. But the point is that this is a steganographic attack.)
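A toy numpy demo of how easy that is (illustrative only): put one bright pixel per pooling window into an otherwise dim noise image, and the max-pooled result is entirely attacker-chosen while the full-resolution image just looks like noise.

```python
import numpy as np

def max_pool(img, k):
    """Non-overlapping k x k max pooling."""
    h, w = img.shape
    return img.reshape(h // k, k, w // k, k).max(axis=(1, 3))

rng = np.random.default_rng(0)
src = rng.uniform(0.0, 0.2, size=(32, 32))     # looks like dim noise at full size
payload = rng.uniform(0.8, 1.0, size=(8, 8))   # the hidden "image"
src[1::4, 1::4] = payload                      # one bright pixel per 4x4 window

pooled = max_pool(src, 4)                      # exactly equal to `payload`
```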


One key thing to be aware of is that not all "bilinear" scaling algorithms are created equal. If the "bilinear" in question is GPU-accelerated, it's quite possible that it's the Direct3D/OpenGL bilinear filter, which samples exactly 4 taps of the image from the highest appropriate mip level (which may be the only one, unless the application goes out of its way to generate more). That means if the scaling ratio is less than 50%, it becomes something like a smoothed nearest neighbor filter and is vulnerable to this attack.

The introduction of a mip chain + enabling mip mapping mitigates this, because when the scaling ratio is less than 50% the GPU's texture units will select lower mips to sample from, approximating a "correct" bilinear filter. This does also require generating mips with an appropriate algorithm - there are varying approaches to this, so I suspect it is possible to create attacks against mip chain generation as well.

Thankfully, quality-focused rendering libraries are generally not vulnerable to this, because users demand high-quality filtering. A high-quality bilinear filter will use various measures to ensure that it samples an appropriate number of points in order to provide a smooth result that matches expectations.

One other potential attack against applications relying on the GPU to filter textures is that if you can manually provide mip map data, you can use that to hide alternate texture data or otherwise manipulate the result of downscaling. As far as I know the only common formats that allow providing mip data are DDS and Basis, and DDS support in most software is nonexistent. Basis is an increasingly relevant format though and could potentially be a threat, but as a lossy format it poses unique challenges.


> This does also require generating mips with an appropriate algorithm - there are varying approaches to this

http://number-none.com/product/Mipmapping,%20Part%201/index....

http://number-none.com/product/Mipmapping,%20Part%202/index....


Bilinear and trilinear with mipmaps are still relatively poor. 3D rendering also uses anisotropic filtering, which eliminates a lot of artifacts, even in 2D scenarios.


It is a very common and often overlooked issue in image processing. Bilinear is widely used, and not particularly good. For large-factor downscaling it is reminiscent of nearest-pixel sampling.


> It is a very common (...)

Bilinear interpolation is perfectly acceptable for zooming in on an image (making it larger by adding new pixel values). If you want to zoom out, you can still use bilinear interpolation, but of course you have to filter the image data beforehand to avoid aliasing.


Most often, scaling and filtering are an integrated process; when one says "bilinear", it is usually implied that it is combined with nothing else.


Indeed. If you pre-filter the image data, you should _not_ do bilinear on top of that, since bilinear interpolation is itself a (tent) filter; you'd soften the image for no good reason.


You still need some kind of interpolation if the zoom factor is non-integer, and bilinear is a good choice in that case.


Yeah, the default implementation should check the scaling factor and use AREA interpolation when downscaling and bilinear for upscaling.
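Something like this (toy numpy version, integer factors and single channel only; with OpenCV you'd pick cv2.INTER_AREA vs. cv2.INTER_LINEAR the same way):

```python
import numpy as np

def resize_safe(img, out_h, out_w):
    """Dispatch on the scale factor: area averaging when shrinking
    (anti-aliased), bilinear interpolation when enlarging."""
    h, w = img.shape
    if out_h <= h and out_w <= w:                  # downscale -> area average
        fy, fx = h // out_h, w // out_w
        return img.reshape(out_h, fy, out_w, fx).mean(axis=(1, 3))
    # upscale -> plain bilinear interpolation
    ys = (np.arange(out_h) + 0.5) * h / out_h - 0.5
    xs = (np.arange(out_w) + 0.5) * w / out_w - 0.5
    y0 = np.clip(np.floor(ys).astype(int), 0, h - 2)
    x0 = np.clip(np.floor(xs).astype(int), 0, w - 2)
    wy = np.clip(ys - y0, 0.0, 1.0)[:, None]
    wx = np.clip(xs - x0, 0.0, 1.0)[None, :]
    return (img[np.ix_(y0, x0)] * (1 - wy) * (1 - wx)
            + img[np.ix_(y0, x0 + 1)] * (1 - wy) * wx
            + img[np.ix_(y0 + 1, x0)] * wy * (1 - wx)
            + img[np.ix_(y0 + 1, x0 + 1)] * wy * wx)
```

E.g. shrinking a 0/1 checkerboard by 2 gives a flat 0.5 (no aliasing), while enlarging a constant image leaves it unchanged.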


Whether it works or not depends on how many samples are used to downscale. Amusingly, this attack was used for bait-and-switch and “click here to [x]” gimmicks on some websites, especially 4chan, and you can find examples tuned primarily for typical thumbnail generators (which, probably for performance reasons, tend to only sample a small number of pixels.)

https://thume.ca/projects/2012/11/14/magic-png-files/


You're looking for section 3.1 in [1], where they analyze the effect of the scaling ratio and kernel size for an arbitrary downscaling kernel.

> Any algorithm is vulnerable to image-scaling attacks if the ratio of pixels with high weight is small enough.

[1] https://www.usenix.org/system/files/sec20-quiring.pdf


Just a quick thought: If you just average the surrounding pixels, you could possibly still add occasional pixels to skew the average and create a different image, though that may be much more noticeable.


If you add occasional pixels to skew the average, it will probably be noticeable in the original image. But an interpolation scheme that uses only the four corner pixels while ignoring the rest can be easily fooled: you can blend an entire lower-resolution image into those corner pixels.
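A quick numpy illustration (my own toy example) of why skewing an average is hard once pixel values are clamped to 0..255: moving an 8x8 block's mean from 100 to 150 via a single pixel would require that pixel to hold the value 3300, so the best a clamped pixel can do is a barely useful shift of about 2.4.

```python
import numpy as np

def block_mean(img, k):
    """Non-overlapping k x k block averages."""
    h, w = img.shape
    return img.reshape(h // k, k, w // k, k).mean(axis=(1, 3))

block = np.full((8, 8), 100.0)                 # a mid-grey 8x8 block
needed = 100.0 + (150.0 - 100.0) * block.size  # single-pixel value needed: 3300
block[0, 0] = 255.0                            # but pixels are clamped to 255
shifted = block_mean(block, 8)[0, 0]           # 100 + 155/64 ~ 102.4, not 150
```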



