
I recently showed some videos of Soli in the HCI class I teach. Students immediately hit upon the two major issues I wanted to discuss (I was pretty proud!).

The first is learnability. A big problem with gestures is that there is no clear affordance as to what kinds of gestures you can do, or any clear feedback. For feedback, one could couple Soli's input with a visual display, but at that point, it's not clear if there is a big advantage over a touchscreen, unless the display is really small.

The second is what's known as the Midas touch problem. How can the system differentiate whether you are intentionally gesturing as input or just gesturing incidentally? The example I used was the new Mercedes cars that have gesture recognition. While I was doing a test drive, the salesperson started waving his hands as part of his normal speech, and that accidentally raised the volume. Odds are very high Soli will have the same problem. One possibility is to activate Soli via a button, but that would defeat a lot of the purpose of gestures. Another is to use speech to activate, which might work out. Yet another possibility is that you have to do a special gesture "hotword", sort of like how Alexa is activated by saying its name.

At any rate, these problems are not insurmountable, but they definitely add to the learning curve and detract from the reliability and overall utility of these gesture-based interfaces.
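One concrete way to picture the gesture "hotword" idea is a gate that drops command gestures until an explicit activation gesture has been seen recently. A minimal sketch, with invented gesture names and timings (nothing here is Soli's actual API):

    import time

    ACTIVATION_GESTURE = "double_tap"  # hypothetical gesture "hotword"
    ACTIVE_WINDOW_S = 3.0              # how long the system stays armed after activation

    class GestureGate:
        def __init__(self):
            self.armed_until = 0.0

        def on_gesture(self, gesture, now=None):
            """Return the gesture to act on, or None if it should be ignored."""
            now = time.monotonic() if now is None else now
            if gesture == ACTIVATION_GESTURE:
                self.armed_until = now + ACTIVE_WINDOW_S  # arm the gate
                return None
            if now <= self.armed_until:
                self.armed_until = now + ACTIVE_WINDOW_S  # keep the window open while in use
                return gesture                            # treated as intentional
            return None                                   # treated as incidental, dropped

With something like this, a wave made while talking is simply dropped unless it happens right after the activation gesture, at the cost of one extra step per interaction.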



As predicted in the Hitchhiker's Guide to the Galaxy: "...an electric pencil flew across the cabin and through the radio's on/off-sensitive airspace."


It certainly seems on-point.

"A loud clatter of gunk music flooded through the Heart of Gold cabin as Zaphod searched the sub-etha radio wavebands for news of himself. The machine was rather difficult to operate. For years radios had been operated by means of pressing buttons and turning dials; then as the technology became more sophisticated the controls were made touch-sensitive - you merely had to brush the panels with your fingers; now all you had to do was wave your hand in the general direction of the components and hope. It saved a lot of muscular expenditure of course, but meant that you had to sit infuriatingly still if you wanted to keep listening to the same programme."

This is from 1979 of course.


That's uncannily accurate.


It almost seems like cheating to draw on Douglas Adams; of course he'd be the one to draw out futuristic truths in his droll way.


Well, but Trillian did that on purpose specifically to turn off the radio.


Perhaps the computer is smart enough to determine intent. To paraphrase Marvin, "Here I am with a brain the size of a planet and they ask me to determine whether you were gesturing at me on purpose."


Sirius Cybernetics clearly had some ideas along those lines, but the results were lacking:

"He had found a Nutri-Matic machine which had provided him with a plastic cup filled with a liquid that was almost, but not quite, entirely unlike tea. The way it functioned was very interesting. When the Drink button was pressed it made an instant but highly detailed examination of the subject's taste buds, a spectroscopic examination of the subject's metabolism and then sent tiny experimental signals down the neural pathways to the taste centers of the subject's brain to see what was likely to go down well. However, no one knew quite why it did this because it invariably delivered a cupful of liquid that was almost, but not quite, entirely unlike tea."


Fair point.


The second one seems more of a technical problem and could be solved if Soli can reliably recognize user attention, which could effectively be a "hotword" for gestures. This is hard, and I'm not sure it's even feasible with this tech, but given all the excitement in this thread about potential privacy issues, I guess it's doable :D

The first one seems more troublesome. This is less intuitive than a touchscreen-based interface. The only way I see of fighting this is to standardize a set of generic gestures, map them onto existing equivalent touch/voice actions, and push that to the Android ecosystem. But I'm not sure how many third-party manufacturers will join this parade. Does this technology work well under a screen? The industry is now obsessed with getting rid of the notch, and if Soli blocks that path it will be a pretty hopeless fight.
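For the standardization idea, something as simple as a shared gesture-to-action table might be enough; the gesture names and action strings below are invented for illustration, not anything Google has announced:

    # Hypothetical standardized gestures mapped onto existing touch/voice actions.
    GESTURE_ACTIONS = {
        "swipe_left":  "media.previous_track",
        "swipe_right": "media.next_track",
        "air_tap":     "media.play_pause",
        "twist_cw":    "volume.up",
        "twist_ccw":   "volume.down",
    }

    def dispatch(gesture):
        action = GESTURE_ACTIONS.get(gesture)
        if action is None:
            return                     # unknown gesture: ignore
        print("would invoke", action)  # stand-in for the real platform call

The hard part isn't the table, it's getting every manufacturer to agree on the same one.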


If you can use a hot-word, what's wrong with using voice recognition to achieve what you want to do anyway? Using voice takes less effort.


Absolutely, you can use the same utterances to invoke the same intents in a car as in a home setting.

"Alexa, set temperature to <x> degrees"

"Alexa, set volume to <x> or increase/decrease volume"


Sometimes voice is less appropriate. I would much rather use a gesture than a voice command in a library or at work.


Snapping my fingers would be a nice trigger, like "OK Google" or "Alexa". Synchronising the sound with the gesture would cut down on the false-positive rate, and it's something I'm unlikely to do unless I want to interact with my phone. If it could penetrate my pants pocket, being able to snap my fingers next to my pocket and then perform simple interactions without having to pick up my phone would be nice. Pick up, hang up, volume, etc.
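What I have in mind is basically a coincidence check between two detectors, roughly like this (the audio snap detector and the radar gesture detector are assumed to exist and hand you timestamps; the window value is a guess):

    # Treat a snap as intentional only if the microphone hears a snap and the
    # radar sees the snap motion within a short window of each other.
    COINCIDENCE_WINDOW_S = 0.15

    def is_intentional_snap(audio_snap_time, radar_gesture_time):
        return abs(audio_snap_time - radar_gesture_time) <= COINCIDENCE_WINDOW_S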


I would say that snapping is definitely an incidental gesture for some people, and it's also highly inaccessible (while many gesture controls aren't perfectly accessible, audibly snapping is difficult for many more people than waving is).


Thinking over this I have to agree.

Not to mention, half the utility of the gestures is the ability to interact with messy/wet hands. Snapping my fingers near my phone in that situation isn't attractive.

Maybe teaching a gesture to your phone is the most accessible option, and the one that respects culture and disability best.

It's a shame though, I did like the intentionality that the sound of snapping fingers afforded.


Not to mention how annoying it would be to start hearing people snapping their fingers everywhere, like in the office or on mass transit.


> A big problem with gestures is that there is no clear affordance as to what kinds of gestures you can do, or any clear feedback. For feedback, one could couple Soli's input with a visual display, but at that point, it's not clear if there is a big advantage over a touchscreen, unless the display is really small.

For the Google Pixel 4 that they are using in the video, you already have a big display. It can instruct you how to gesture so that you learn it, and later it can let you gesture without instructions.

> The second is what's known as the Midas touch problem. How can the system differentiate if you are intentionally gesturing as input vs incidentally gesturing?

Either an activation word like you said, or it could use the front-side camera to see whether or not you are looking at it.


Or, depending on how smart it is and on its range, it might detect your head attitude and use that as a proxy for attention. The website claims that it can detect a turn toward, a lean, or a look.


> A big problem with gestures is that there is no clear affordance as to what kinds of gestures you can do, or any clear feedback

> ...learnability

Do you have any examples of well-structured, learnable systems? I have struggled to find much of anything in this space, yet every technology release I see wants for it.

Here are my two examples; I have no others off the top of my head. I am more impressed with the vim example.

1. `vim-pandoc-syntax` has a set of documents demonstrating the feature set of markdown. These documents are the system they document. Here is one file in a directory of 10 such documents.

https://github.com/vim-pandoc/vim-pandoc-syntax/blob/master/...

2. The KDE shortcuts manager, which lets you see what's bound and bind new things:

https://docs.kde.org/trunk5/en/applications/fundamentals/sho...

I have yet to hear a good response to this question.

I have a Pixel 3 and I want a manual for the device; it appears one does not exist. Nor does documentation. The headphones that came in the box don't resume the most recent media player when I tap the middle button. I called support, and over the course of an hour they found they had the same issue. Before my call, the people I spoke to said I was wrong and that the issue didn't exist; afterwards they had no advice for me other than to give up. My issue persists.


>>The first is learnability. A big problem with gestures is that there is no clear affordance as to what kinds of gestures you can do, or any clear feedback. For feedback, one could couple Soli's input with a visual display, but at that point, it's not clear if there is a big advantage over a touchscreen, unless the display is really small.

That's the same reason why I think voice controls are literally the worst way to interact with a computer ever (although I think this might actually top it).


> How can the system differentiate if you are intentionally gesturing as input vs incidentally gesturing?

This is why I had to change my Amazon Echo Dot's wake word back from "Computer". Turns out one might say "computer" a lot during the course of the day, and Alexa was CONSTANTLY going off when it shouldn't have. It was so disappointing that I gave the Echo Dot away.


Try watching Star Trek: The Next Generation with that setting on.


Watch the keynote for more information on accidental gestures. They cater for it.


There's a little on accidental input detection in this short video too: https://www.youtube.com/watch?v=QS8SW-ouM5w


Off topic, but does it seem like this link deliberately doesn't load any of the YouTube UI details, just leaving grey hints? I thought the rest hadn't loaded, but it's kind of a nice experience.

It looks like this on my end: https://imgur.com/F65dvgX


It's a bug. YouTube can't load those components, so it displays placeholder UI.


Once we figure out (non-invasive) BCI and EEG-type brain-activity signatures for when our brains form the intent to take an action, the system could activate that action before our brain even sends the electrical impulses to our motor system.

How hard would it be to teach ourselves to inhibit the electrical impulses to our motor system when BCI can identify intent?

When would this level of BCI be possible, if you had to make an educated guess?

Thanks for sharing, as a fellow HCI/Cog Sci graduate!


This is coming dangerously close to what is commonly known as "reading your mind", and I'm terrified of what this means, e.g. for law enforcement.


Seems like being able to detect eye attention would solve the Midas problem -- if you aren't looking at it, it doesn't do anything?


That kind of defeats the benefit, doesn't it?

The nicest thing about physical buttons in, say, a car is that I don't have to look at them to know what I'm doing.


That would be great, but can radar really tell what you are looking at? I suppose you could combine it with a camera, but that sounds less than ideal in terms of energy use.

(I know very little about any of this)


Didn't students ask about health effects? If not, consider me a student and ask what effects it has on hand health with prolonged exposure at close proximity in your shirt or pants pocket.


Spot on. I feel this is an abuse of technology. They want to take touch to the next level with gestures, but it is doomed to fail unless they solve the other issues you pointed out (just my opinion). Gestures might be good for gaming (e.g. Kinect). I worked on hover touch at one of the big smartphone companies; we achieved good results at different heights, but eventually it didn't take off. After all, humans need a sense of touch to interact.


Have a squeeze-controlled trigger for input mode in your non-dominant hand?


Reminds me of eye-controlled autofocus from early Canon cameras.



