Hacker News

The thing I don't understand is why everyone is throwing money at LLMs for language when there are much simpler use cases that would be more useful.

For example, has anyone ever attempted an image -> html/css model? Seems like it'd be great if I could draw something on a piece of paper and have it generate a website view for me.



Perhaps if we think of LLMs as search engines (Google, Bing, etc.), then there's more money to be made by being the top generic search engine than the top specialized one (code search, paper search, etc.).


This is the real MVP of LLMs for me. Compressing Google search AND the internet into an 8 GB download is something that would have been unfathomable to me two decades ago.

My hope now is that someone will figure out a way to separate intelligence from knowledge - i.e. train a model that knows how to interpret the weights of other models - so that training new intelligent models wouldn't require training them on a petabyte of data every run.


> has anyone ever attempted image -> html/css model?

I had a discussion with a friend about doing this, but for CNC code. The answer was that a model trained on a narrow data set underperforms one trained on a large data set and then fine-tuned with the narrow one.


All of the multi-modal LLMs are reasonably good at this.


They did that in the GPT-4 demo 1.5 years ago. https://www.youtube.com/watch?v=GylMu1wF9hw


I was under the impression that you could more or less do something like that with the existing LLMs?

(May work poorly of course, and the sample I think I saw a year ago may well be cherry picked)


>For example, has anyone ever attempted image -> html/css model?

Have you tried uploading the image to an LLM with vision capabilities, like GPT-4o or Claude 3.5 Sonnet?


I tried, and Sonnet 3.5 can copy most common UIs.
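For anyone who hasn't tried it, here's a minimal sketch of that workflow using the OpenAI Python client. The model name, prompt, and file name are illustrative assumptions, not something from this thread:

```python
# Hypothetical sketch of the "image -> html/css" workflow: send a sketch or
# screenshot to a vision-capable model and ask it for matching markup.
import base64
import os


def image_to_data_url(image_bytes: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as a data URL for an API image payload."""
    return f"data:{mime};base64," + base64.b64encode(image_bytes).decode("ascii")


if __name__ == "__main__":
    # Requires the `openai` package and an OPENAI_API_KEY environment variable.
    from openai import OpenAI

    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    with open("sketch.png", "rb") as f:  # your hand-drawn mockup
        data_url = image_to_data_url(f.read())

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Generate a single-file HTML/CSS page matching this sketch."},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }],
    )
    print(response.choices[0].message.content)
```

Results vary a lot with how clean the sketch is, but for common layouts the output is usually a reasonable starting point.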


> For example, has anyone ever attempted image -> html/css model?

There are already companies selling services where they generate entire frontend applications from vague natural language inputs.

https://vercel.com/blog/announcing-v0-generative-ui


Not sure why you think interpreting a hand drawing is "simpler" than parsing sequential text.


That's a thought I had. For example, could a model be trained to take a description, and create a Blender (or whatever other software) model from it? I have no idea how LLMs really work under the hood, so please tell me if this is nonsense.


I'm waiting for exactly this; GPT-4 currently trips up a lot with Blender (nonsensical order of operations, etc.).



