But the way this works is by looking at the depth buffer and replacing only the person. So you could invert that mask and use it for chroma-keying the person into a different environment. Then, if you had depth info for the environment shot, you could potentially simulate a light and cast shadows from the person into the environment, and so on.
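A minimal sketch of that masking idea in NumPy (the depth range for "the person" is an assumption; real Kinect hacks would tune it or use the skeleton/user-index stream):

```python
import numpy as np

def composite_by_depth(frame, depth, background, near=400, far=1200):
    """Keep only pixels whose depth falls in an assumed 'person' range
    (near/far in mm are hypothetical), and paste them over a new background.
    Inverting the mask (~person_mask) would instead cut the person out
    and keep the original scene."""
    person_mask = (depth > near) & (depth < far)   # True where the person is
    out = background.copy()
    out[person_mask] = frame[person_mask]          # person into new environment
    return out
```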
Long story short, I think most Kinect hacks will be theoretically possible with 2D image analysis, but it's more the speed of iteration and simplicity of these solutions that's interesting. For a similar scenario, look at the way many modern game engines use a "g-buffer", which includes depth and surface-normal info, to create more realistic effects:
http://en.wikipedia.org/wiki/Deferred_shading
Actually, all that's really necessary is to take a snapshot of the background and overlay whatever is different from the background. 3D data would help in the case of a color match between the actor and the background, but isn't strictly necessary.
It'd be nice to see what it would look like if you changed the shader to take the differential of the Z buffer to calculate surface normals for the person, and used those in the refraction calculations. That way you'd get a 3D blur effect, rather than the flat silhouette.
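The "differential of the Z buffer" step could be sketched like this: finite-difference the depth map and treat (-dz/dx, -dz/dy, 1) as an unnormalized surface normal. This is a CPU-side NumPy sketch, not the shader itself:

```python
import numpy as np

def normals_from_depth(depth):
    """Estimate per-pixel surface normals from a depth map via finite
    differences. A perfectly flat depth map yields normals of (0, 0, 1)."""
    dz_dx = np.gradient(depth, axis=1)
    dz_dy = np.gradient(depth, axis=0)
    n = np.dstack([-dz_dx, -dz_dy, np.ones_like(depth)])
    n /= np.linalg.norm(n, axis=2, keepdims=True)  # normalize to unit length
    return n
```

In a real shader you'd do the same with neighboring texel fetches, then feed the resulting normal into the refraction offset instead of a constant.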
+Take a static shot of room
+Record image of the actor via Kinect
+Overlay actor and apply a filter
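The steps above can be sketched as a simple frame-differencing routine (the threshold value is an assumption; you'd tune it for sensor noise):

```python
import numpy as np

def overlay_actor(background, frame, new_scene, threshold=30):
    """1) diff the live frame against a static background shot,
    2) keep pixels that changed (the actor),
    3) overlay those pixels onto another image."""
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    actor_mask = diff.max(axis=2) > threshold   # changed in any channel
    out = new_scene.copy()
    out[actor_mask] = frame[actor_mask]
    return out
```

As the comment above notes, this falls apart when the actor's colors match the background, which is exactly where the Kinect's depth data helps.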
Not exactly a hack, but just a relatively simple video trick. I'm not even sure what applications there are beyond this video effect.