Hacker News

> It also feels like people are automating things that don't really need to be automated at all (do you really need to be reminded to make coffee?)

I've posted about this before; I call it the Jarvis effect.

> For years we had people trying to make voice agents, like Iron Man's Jarvis, a thing. You had people super bought into the idea that if you could talk to your computer and say "Jarvis, book me a flight from New York to Hawaii" and it would just do it just like the movies, that was the future, that was sci-fi, it was awesome.

> But it turns out that voice sucks as a user interface. The only time people use voice controls is when they can't use other controls, i.e. while driving. Nobody is voluntarily booking a flight with their Alexa. There's a reason every society on the planet shifted from primarily phone calls to texting once the technology was available!

By and large the reason people love Openclaw is that it feels cool and futuristic. You have an AGENT! It's DOING THINGS! Yes it's doing things you could have easily done yourself, but you're not doing them yourself, you have an AGENT! It's all very silly, the same way that having your lights controlled by your phone is very silly, but some people like it.

That being said, there is a real use case for Openclaw, which is "marketing" (aka spam). A ton of people have set up Openclaw agents that exist to post on Twitter/Facebook/Discord/any open public user discussion forum (yes, HN included) to seem like a real member of a community, then start advertising something, generally crypto. So we can thank Openclaw for dead internet accelerationism.



Interestingly, your example is an actual thing we used to have.

In 1996, I picked up the phone on my desk, dialed a 3 digit code, said “I need to fly to Los Angeles on Tuesday morning, returning Wednesday evening”. A couple hours later, an envelope appeared in my inbox with plane tickets, rental car reservation and hotel reservation.

Then every company in the world fired all the secretaries over the course of the next few years to cut costs, and we’ve collectively forgotten that it was ever like that.


Your example is a great one, because the secretaries are clearly filling in the gaps, such as:

1. How much can you spend on this trip?
2. Is first/business class necessary?
3. Is a layover acceptable if it's cheaper?
3a. Is it better to have a 4am nonstop flight or a 7am flight with a layover?
4. Are there preferred airlines?
5. Are there preferred hotel chains? What's the hotel budget? Do you want to pay extra for a nice view?
6. What kind of car should you rent? Is there equipment you'll be handling?

etc...

This is the kind of stuff that's easy(-ish) to communicate by presenting a list of options to a user through an actual interface. It sucks doing it through voice; think of the old phone systems where you had to go through droning "If you would like to rent an SUV, press 1. If you would like to rent a sedan, press 2. To speak to an operator, press 0."
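To make the linearity concrete, here's a toy sketch (the option names and timing numbers are made up, not from any real IVR system): a screen presents all the options at once, while a voice menu has to read them out in sequence, so the time to reach an option grows with its position in the list.

```python
# Toy model of a linear IVR menu vs. a screen.
# A screen shows all N options simultaneously; a voice menu reads
# them one at a time, so reaching option k costs k prompts of listening.

OPTIONS = ["SUV", "sedan", "compact", "convertible", "operator"]
SECONDS_PER_PROMPT = 4  # rough assumption: ~4 s to speak each menu line

def ivr_wait(choice: str) -> int:
    """Seconds spent listening before you can press the key for `choice`."""
    return (OPTIONS.index(choice) + 1) * SECONDS_PER_PROMPT

print(ivr_wait("SUV"))       # 4  -- first option is cheap
print(ivr_wait("operator"))  # 20 -- the escape hatch is always read last
```

The screen equivalent is a constant-time glance at the whole list, which is the gap the parent comment is pointing at.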

So no, you never had a voice interface for booking flights; you had a human brain to whom you delegated, which is very different.


This is interesting to see from multiple sides. The method (voice/chat/whatever) is one piece, but the other is the cognitive load:

You already listed many questions for a single task (booking a ticket).

Would you need to think about all of those yourself? With a secretary the task is split: you decide you need the ticket, and the secretary handles the rest while you focus on what you do.

Presenting all the options, fielding all the callbacks, confirmation e-mails, etc. does not change that; it puts all the load on you.


It's still like that if you're high enough on the corporate ladder.


if you've got that kind of money, that's absolutely still an option...


What kind of money are you referring to? When I placed that call, I was a junior engineer two years out of school earning $37,000 a year.


i suppose i misunderstood and thought you called the airline and trusted them to give you an okay deal without comparing prices - if the secretary doing the booking had no incentive to upcharge you, that makes more sense i guess

p.s. isn't this https://www.in2013dollars.com/us/inflation/1996?amount=37000 about a third above the us median income? idk i don't live in the states


I’ll disagree with you a little. The reason I don’t use voice is because of context switching.

With a mouse and keyboard I can switch windows.

With my voice, the computer can’t yet automatically determine whether I am dictating a transcription or giving editing commands. What I really need is for the interpreter listening to me to intuitively know whether I am in the equivalent of vi command mode or insert mode.

It is the roadblock to not needing a screen at all: right now I want to visualize whether it understood me correctly, because if it didn’t switch from insert to command automatically, I now have all my commands written into my paragraph. I also don’t want to listen to the computer talk back to confirm it heard me. I want to just keep going, to keep narrating my thoughts and trust it’s doing the right things, without having to check. Having it chime in to repeat what it heard derails my flow and train of thought.

TL;DR: The future of voice is headless vi.
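The command/insert split described above can be sketched as a toy state machine. Everything here is hypothetical (the wake phrases, the `VoiceEditor` class, the single `delete last sentence` command); it is not a real dictation API, just an illustration of routing transcribed phrases by mode.

```python
# Hypothetical sketch: a vi-style mode switch for a voice interpreter.
# Spoken wake phrases ("command mode" / "insert mode") toggle whether
# transcribed text is appended to the document or parsed as an edit command.

from dataclasses import dataclass, field

@dataclass
class VoiceEditor:
    mode: str = "insert"                          # "insert" = dictation
    buffer: list = field(default_factory=list)    # the document text
    log: list = field(default_factory=list)       # results of commands

    def hear(self, phrase: str) -> None:
        p = phrase.strip().lower()
        if p == "command mode":
            self.mode = "command"
        elif p == "insert mode":
            self.mode = "insert"
        elif self.mode == "insert":
            self.buffer.append(phrase)            # dictation goes into the text
        else:
            self.log.append(self.run(p))          # commands never touch the text

    def run(self, cmd: str) -> str:
        if cmd == "delete last sentence" and self.buffer:
            return f"deleted: {self.buffer.pop()!r}"
        return f"unknown command: {cmd!r}"

ed = VoiceEditor()
for phrase in ["The quick brown fox.", "command mode",
               "delete last sentence", "insert mode", "Hello again."]:
    ed.hear(phrase)
print(ed.buffer)  # ['Hello again.']
```

The hard part, which this sketch dodges entirely, is exactly what the parent says: inferring the mode switch without an explicit wake phrase, so commands never leak into the paragraph.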


The problem I see here is that you're trying to shoehorn a voice interface onto something that's highly optimized for keyboard input. The apps need to be redesigned to accommodate the interface, else it's just never-ending papercuts.


That’s what I’m saying. Voice as the input requires a completely new ui paradigm, and chat / chatbot isn’t enough.


Voice input will always be inherently worse than mouse and screen plus keyboard, because voice is linear.

It can only ever be a linear sequence of inputs.

The two-dimensional field of a screen, plus a mouse and keyboard, gives you enormous amounts of input and lets you contextualize that input in arbitrary ways that make intuitive sense to people with minimal training. Most people do not need to be taught that "Paste" goes to the active window.

We barely scratch the surface of what is possible with this set of input and output devices, and yet we can't even get that level of fine-grained, reliable control into touchscreen devices and gamepads, let alone a linear audio stream.

Voice cannot be a robust interface. It isn't even between humans: there's immense nonverbal communication, and human communication also relies very heavily on preshared context to actually get the information across in the first place. Even with all that machinery, human speech is generally considered to carry, regardless of language, only around 44 bits per second of data.


> (yes, HN included) to seem like a real member of a community, then start advertising something

Ah, so that is indeed the endgame of what I've been seeing, hmm?


They seem to call it "escaping the permanent underclass" on Xitter.


And this is how you get Moltbook.



