
Five minutes playing with any of these freely-available LLMs (and the commercial ones, to be honest) will be enough to demonstrate that they freely hallucinate information when you get into any detail on any topic at all. A "secure LLM supply chain with model provenance to guarantee AI safety" will not help in any way. The models in their current form are simply not suitable for education.


Obviously the models will improve. Then you’re going to want this stuff. What’s the harm in starting now?


Even if the models improve to the point where hallucinations aren't a problem for education (which is not obvious), it's still not clear that enforcing a chain of model provenance is the correct approach to the problem of "poisoned" data. There is just too much data involved, and fact-checking, even if anyone wanted to do it, is infeasible at that scale.

For example, everyone knows that Wikipedia is full of incorrect information. Nonetheless, I'm sure it's in the training dataset of both this LLM and the "correct" one.

So the answer to "why not start now" is "because it seems like it will be a waste of time".


Per https://en.wikipedia.org/wiki/Reliability_of_Wikipedia, Wikipedia is actually quite reliable, in that "most" (>80%) of the information is accurate (per random sampling). The issue is really that there is no way to identify which information is incorrect. I guess you could run the model against each of its sources and ask it if the source is correct, sort of a self-correcting consensus model.


I'm generally pretty pro-Wikipedia and tend to think a lot of the concerns (at least about the English version) are somewhat overblown, but citing it as a source on its own reliability is a bit too much even for me. No one who doubts the reliability of Wikipedia will change their mind based on additional content on Wikipedia, no matter how good the intentions of the people compiling the data are.

I don't see how anything but an independent evaluation could be useful, even assuming Wikipedia is reliable at the point the analysis begins: the point of keeping track would be to follow the trend in reliability and ensure the standard continues to hold, but if it did stop being reliable, you couldn't trust it to reliably report that either.

I think there's value in presenting a list of claims (e.g. "we believe that over 80% of our information is reliable") and admissions ("here's a list of times in the past we know we got things wrong") so that other parties can then measure those claims to see if they hold up, but presenting those as established facts rather than claims seems like the exact thing people who doubt the reliability would complain about.


Here’s an actual reliable source on the reliability of Wikipedia that confirms the meta Wikipedia article: https://amp.smh.com.au/national/evidence-suggests-wikipedia-... https://sci-hub.ee/https:/asistdl.onlinelibrary.wiley.com/do...

Wikipedia may be reliable, but you should never cite anything on its own reliability lmao


IME, 99% of the time when somebody is feigning concern about Wikipedia's "reliability", it's because they want to use sources that are far more suspicious and unreliable.


> "most" (>80%) of the information is accurate (per random sampling)

Disinformation isn't random, though; there's not an equal chance that information is misleading on every topic.

Most information can be accurate while still containing dangerous amounts of disinformation.


Mostly agree, but:

> So the answer to "why not start now" is "because it seems like it will be a waste of time".

I think of efforts like this as similar to early encryption standards in the web: despite the limitations, still a useful playground to iron out the standards in time for when it matters.

As for it being a waste of time (or other objections): there was a reason not all web traffic was encrypted 20 years ago.


Agree with most of your points, but ask a large LM, or a small LM for that matter, to construct a simple SQL query and put data into a database, and they already get it right much of the time. GPT gets it right most of the time.

Then, as a verification step, you ask one more model, not the same one: "What information got inserted into the database in the last hour?" The chances of one model hallucinating that it put the information in the database, and the other model hallucinating the same correct-looking information back, are pretty slim.

[edit] To give an example: suppose that conversation has happened 10 times already on HN. HN may provide a console of a large or small LM connected to its database, and I ask the model, "How many times has one person's sentiment about hallucinations been negative, and another person's answer been that hallucinations are not that big of a deal?" From there, I quote a conversation that happened 10 years ago, with a link to the previous conversation. That would enable more efficient communication.
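
A minimal sketch of that cross-check, assuming a hypothetical ask_llm() helper and an SQLite table; the model names, prompts, and schema are made up for illustration, not any particular vendor's API:

    import sqlite3

    def ask_llm(model: str, prompt: str) -> str:
        """Hypothetical wrapper around whatever chat-completion client is in use."""
        raise NotImplementedError

    def insert_and_cross_check(fact: str) -> bool:
        db = sqlite3.connect("notes.db")
        db.execute("CREATE TABLE IF NOT EXISTS notes "
                   "(body TEXT, created_at TEXT DEFAULT CURRENT_TIMESTAMP)")

        # Model A would normally draft the INSERT; here we run a parameterized
        # statement ourselves rather than executing raw model output.
        db.execute("INSERT INTO notes (body) VALUES (?)", (fact,))
        db.commit()

        # Model B (a different model) is shown what actually landed in the
        # table during the last hour and asked to confirm the fact is there.
        rows = db.execute(
            "SELECT body FROM notes WHERE created_at >= datetime('now', '-1 hour')"
        ).fetchall()
        answer = ask_llm(
            "model-b",
            f"Rows inserted in the last hour: {rows!r}. "
            f"Is the following fact among them? Answer yes or no: {fact}",
        )
        # For this check to pass wrongly, both models would have to hallucinate
        # in a mutually consistent way, which is the "pretty slim" odds above.
        return answer.strip().lower().startswith("yes")

The check only helps to the extent that the two models are meaningfully independent, which is an assumption in this sketch.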


There is a difference between bugs and attacks. I think we are trying to solve attacks here. In an attack, I might build an LLM targeting some service that uses LLMs to execute real-world commands. Adding provenance to LLMs seems like a reasonable layer of security.

Now we shouldn't be letting a random blob of binary run commands, though, right? Well, that is exactly what you are doing when you install, say, Chrome.


A service should not use LLMs to execute real world commands. Ever.


I go back far enough in time to remember people saying the same about JavaScript in the browser, yet here we are, and so it will be with LLMs.


Undoability is going to be a consideration. We let people use credit cards with practically no security for convenience, because the cost of reversing a few transactions or refunding people for fraud is low enough.


Many sources of information contain inaccuracies, either known at the time of publication or learned afterward.

Education involves doing some fact-checking and critical thinking, regardless of the strength of the original source.

It seems like using LLMs in any serious way will require a variety of techniques to mitigate their new, unique reasons for being unreliable.

Perhaps a “chain of model provenance” becomes an important one of these.


If you already know that your model contains falsehoods, what is gained by having a chain of provenance? It can't possibly make you trust it more.


People contain a shitload of falsehoods, including you, yet you assign varying amounts of trust to those individuals.

A chain of provenance isn't much different than that person having a diploma, a company work badge, and a state-issued ID. You at least know they aren't some random off the street.


That only provides value if you have knowledge of how the diploma-issuing organisation is working.

If not, it's just a diploma from some random organisation off the street.

So, wake me up when you know how OpenAI and Google are cooking their models.


Actually, are we sure they will improve? If there is emergent, unpredicted behaviour in the SOTA models we see now, how can we predict whether what emerges from larger models will actually be better? It might have more detailed hallucinations; maybe it will develop its own version of cognitive biases or inattentional blindness...


How do we know the sun will rise tomorrow?


Because it has been the case for billions of years, and we have adapted our assumptions accordingly. We have no strong reason to believe that we will figure out ways to indefinitely improve these chatbots. It may happen, but it may also not; at that point you are just fantasizing.


We’ve seen models improve for years now too. How many iterations are required for one to inductively reason about the future?


How many days does it take before the turkey realizes it's going to get its head cut off on its first Thanksgiving?

Less glibly: I think models will follow the same sigmoid as everything else we've developed, and at some point it'll start to taper off and the amount of effort required to achieve better results will become exponential.

I look at these models as a lossy compression algorithm with elegant query and reconstruction. Think JPEG quality slider. For the first 75% of the slider the quality is okay and the size barely changes, but small deltas yield big wins. And like an ML hallucination, the JPEG decompressor doesn't know which parts of the image it filled in vs. got exactly right.

But to get from 80% to 100% you basically need all the data from the input. There's going to be a Shannon's-law-type result that quantifies this relationship in ML, worked out by someone (not me) who knows what they're talking about. Maybe they already have?

These models will get better, yes, but only when they have access to Google's and Bing's full actual web indices.


I don't think access to bing or google solves this problem. Right now, there are many questions that the internet gives unclear answers to.

Try to find out via Google whether a plant is toxic to cats. Many times the results say both yes and no, and it's impossible to tell which one is true based on the count of the results.

Feeding the models more garbage data will not make the results any better.


Quite. Look at Google offering a prize pot for 'forgetting'. Also, sorry, but it's typical engineer-think that this comes after the creation, like forever plastics or petroleum; for some reason, great engineers often seem to struggle with second- and third-order consequences, or believe externalities to be someone else's problem. Perhaps if they had started with how to forget, they could have built the models from the ground up with this capability, rather than tacking it on afterwards once they realise the volume of bias and wrongness their models have ingested...


We watched Moore's law hold fast for 50 years before it started to hit a logarithmic ceiling. Assuming a long-term outcome in either direction based purely on historical trends is nothing more than a shot in the dark.


Then our understanding of the sun is just as much a shot in the dark (for it too will fizzle out and die some day). Moore's law was accurate for 50 years. The fact that it's tapered off doesn't invalidate the observations in their time; it just means things have changed and the curve is different than originally imagined.


While my best guess is that the AI will improve, a common example against induction is a turkey's experience of being fed by a farmer, every day, right up until Thanksgiving.


As a general guideline, I tend to believe that anything that has lived X years will likely still continue to exist for X more years.

It is obviously very approximate and will be wrong at some point, but there isn't much more to rely on.


> I tend to believe that anything that has lived X years will likely still continue to exist for X more years.

I, for one, salute my 160-years-old grandma.


With humans, there is a lot of information available on how long a normal lifespan is. After all, people die all the time.

But when you try to predict a one-off event, you need to use whatever information is available.

One very valid application of the principle above is to never make plans with your significant other that are further off in the future than the duration of the relationship. So if you have been together for two months, don't book your summer vacation with them in December.


May she go to 320


420


Because we understand Newton's laws of motion, and the sun and earth seem to, with extreme certainty, follow those laws, and by evaluating those laws forward into the future we see the orbits will continue such that the sun rises. It's not magic.

What rules and predictions can reliably describe how much machine learning will advance over time?


Poor comparison


Not so! Either both the comments are meaningful, or both are meaningless.


Absolutely not. One thing happens because of a set of physical laws that govern the universe. These laws were discovered due to a massive number of observations of multiple phenomena by a huge number of individuals over (literally) thousands of years, leading to a standard model that is broadly comprehensive and extremely robust in its predictions of millions or possibly even billions of separate events daily.

The other thing we have a small number of observations of happening over the last 50 or 60 years, but mostly the last 5 years or so. We know some of the mathematical features of the phenomena we are observing, but not all, and there is a great deal going on that we don't understand (emergence in particular). The things we are seeing contradict most of the academic field of linguistics, so we don't have a theoretical basis for them either, outside of the maths. The maths (linear algebra) we understand well, but we don't really understand why this particular formulation works so well on language-related problems.

Probably the models will improve, but we can't naively assume this will just continue. One very strong result we have seen time and time again is that there seems to be an exponential relationship between the computation and training-set size required and the capability gained. So for every delta-x increase we want in capability, we seem to pay (at least) x^n (n > 1) in computation and training required. That says at some point increases in capability become infeasible unless much better architectures are discovered. It's not clear where that inflection point is.
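
One rough way to write down the x^n claim above (the exponent n is the parent's hand-wavy parameter, not a measured constant):

    % cost to reach capability level c, per the parent's rough claim
    \mathrm{compute}(c) \;\propto\; c^{\,n}, \qquad n > 1
    % so the marginal cost of the next unit of capability keeps growing with c
    \frac{d}{dc}\,\mathrm{compute}(c) \;\propto\; n\,c^{\,n-1}

On that reading, each additional unit of capability costs more than the last, which is where the worry about an inflection point comes from.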


Well, based on observations we know that the sun doesn't rise or set; the earth turns, and gravity and our position on the surface create the impression that the sun moves.

There are two things that might change: the sun stops shining, or the earth stops turning. Of the known possible ways for either of those things to happen, we can fairly conclusively say neither will be an issue in our lifetimes.

An asteroid coming out of the darkness of space and blowing a hole in the surface of the earth, kicking up such a dust cloud that we don't see the sun for years, is a far more likely, if still statistically improbable, scenario.

LLMs, by design, create combinations of characters that are disconnected from the concept of True, False, Right or Wrong.


Is the function of human intelligence connected to true, false, right, or wrong? These things are 'programmed' into you after you are born, through systematic steps.


Yes, actually. People may disagree on how to categorize things, but we are innately wired to develop these concepts. Erikson and Piaget are two examples of theorists in the field of child psychology who developed formalizations for emotional and mental stages of development. Understanding that a thing "is" is central to these developmental stages.

A more classic example is Freud's delineation between the id, ego and super-ego. Only the last is built upon imparted cultural mores; the id and ego are purely internal things. Disorders within the ego (excessive defense mechanisms) inhibit perception of what is true and false.

Chatbots / LLMs don't consider any of these things; they consider only "what is the most likely response to a given input?" The result may, by coincidence, happen to be true.


I don't understand why that is necessarily true.


Because they are both statements about the future. Either humans can inductively reason about future events in a meaningful way, or they can’t. So both statements are equally meaningful in a logical sense. (Hume)

Models have been improving. By induction they’ll continue until we see them stop. There is no prevailing understanding of models that lets us predict a parameter and/or training set size after which they’ll plateau. So arguing “how do we know they’ll get better” is the same as arguing “how do we know the sun will rise tomorrow”… We don’t, technically, but experience shows it’s the likely outcome.


It's comparing the likelihood that a thing that has never happened before will happen (with no specified time frame) versus the likelihood that a thing that has happened billions of times will suddenly not happen (tomorrow). The interesting thing is, we know for sure the sun will eventually die. We do not know at all that LLMs will ever stop hallucinating to a meaningful degree. It could very well be that the paradigm of LLMs just isn't enough.


What? LLMs have been improving for years and years as we’ve been researching and iterating on them. “Obviously they’ll improve” does not require “solving the hallucination problem”. Humans hallucinate too, and we’re deemed good enough.


Humans hallucinate far less readily than any LLM. And "years and years" of improvement have made no change whatsoever to their hallucinatory habits. Inductively, I see no reason to believe why years and years of further improvements would make a dent in LLM hallucination, either.


As my boss used to say, "well, now you're being logical."

The LLM true believers have decided that (a) hallucinations will eventually go away as these models improve, it's just a matter of time; and (b) people who complain about hallucinations are setting the bar too high and ignoring the fact that humans themselves hallucinate too, so their complaints are not to be taken seriously.

In other words, logic is not going to win this argument. I don't know what will.


I don’t know if it’s my fault or what but my “LLMs will obviously improve” comment is specifically not “llms will stop hallucinating”. I hate the AI fad (or maybe more annoyed with it) but I’ve seen enough to know these things are powerful and going to get better with all the money people are throwing at them. I mean you’d have to be willfully ignoring reality recently to not have been exposed to this stuff.

What I think is actually happening is that some people innately have taken the stance that it’s impossible for an AI model to be useful if it ever hallucinates, and they probably always will hallucinate to some degree or under some conditions, ergo they will never be useful. End of story.

I agree it’s stupid to try and inductively reason that AI models will stop hallucinating, but that was never actually my argument.


> Humans hallucinate far less readily than any LLM.

This is because "hallucinate" means very different things in the human and LLM contexts. Humans have false/inaccurate memories all the time, and those are closer to what LLM "hallucination" represents than human hallucinations are.


Not really, because LLMs aren't human brains. Neural nets are nothing like neurons. LLMs are text predictors. They predict the next most likely token. Any true fact that happens to fall out of them is sheer coincidence.


This, for me, is the gist: if we are always going to be playing pachinko when we hit go, then where would a 'fact' emerge from anyway? LLMs don't store facts. Correct me if I am wrong, as my topology knowledge is somewhat rudimentary, so here goes: first my take; after that, I'll paste GPT-4's attempt to pull this into something with more clarity!

We are interacting with multidimensional topological manifolds, and the context we create has a topology within this manifold that constrains the range of output to the fuzzy multidimensional boundary of a geodesic that is the shortest route between our topology and the LLM.

I think some visualisation tools are badly needed, viewing what is happening is for me a very promising avenue to explore with regards to emergent behaviour.

GPT-4 says: "When interacting with a large language model (LLM) like GPT-4, we engage in a complex and multidimensional process. The context we establish – through our inputs and the responses of the LLM – forms a structured space of possibilities within the broader realm of all possible interactions.

The current context shapes the potential responses of the model, narrowing down the vast range of possible outputs. This boundary of plausible responses could be seen as a high-dimensional 'fuzzy frontier'. The model attempts to navigate this frontier to provide relevant and coherent responses, somewhat akin to finding an optimal path – a geodesic – within the constraints of the existing conversation.

In essence, every interaction with the LLM is a journey through this high-dimensional conversational space. The challenge for the model is to generate responses that maintain coherence and relevancy, effectively bridging the gap between the user's inputs and the vast knowledge that the LLM has been trained on."


If you believe humans hallucinate far less, then you have a lot more to learn about humans.

There are a few recent Nova specials from PBS that are on YouTube that show just how much bullshit we imagine and make up at any given time. It's mostly our much older and simpler systems below intelligence that keep us grounded in reality.


It's like you said, "...our much older and simpler systems... keep us grounded in reality."

Memory is far from infallible but human brains do contain knowledge and are capable of introspection. There can be false confidence, sure, but there can also be uncertainty, and that's vital. LLMs just predict the next token. There's not even the concept of knowledge beyond the prompt, just probabilities that happen to fall mostly the right way most of the time.


We don't know that the mechanism used to predict the next token would not be described by the model as "introspection" if the model were "embodied" (or otherwise given persistent context and memory) like a human. We don't really know that LLMs operate any differently than essentially an ego-less human brain... and any claims that they work differently than the human brain would need to be supported with an explanation of how the human brain does work, which we don't understand well enough to say "it's definitely not like an LLM".


I'm trying to interpret what you said in a strong, faithful interpretation. To that end, when you say "surely it will improve", I assume what you mean is, it will improve with regards to being trustworthy enough to use in contexts where hallucination is considered to be a deal-breaker. What you seem to be pushing for is the much weaker interpretation that they'll get better at all, which is well, pretty obviously true. But that doesn't mean squat, so I doubt that's what you are saying.

On the other hand, the problem of getting people to trust AI in sensitive contexts where there could be a lot at stake is non-trivial, and I believe people will definitely demand better-than-human ability in many cases, so pointing out that humans hallucinate is not a great answer. This isn't entirely irrational either: LLMs do things that humans don't, and humans do things that LLMs don't, so it's pretty tricky to actually convince people that it's not just smoke and mirrors, that it can be trusted in tricky situations, etc., which is made harder by the fact that LLMs have trouble with logical reasoning[1] and tend to make shit up when there's no or little data rather than answering that they don't know. GPT-4 accomplishes impressive results with unfathomable amounts of training resources and some of the most cutting-edge research, weaving together multiple models, and it is still not quite there.

If you want to know my personal opinion, I think it will probably get there. But I think in no way do we live in a world where it is a guaranteed certainty that language-oriented AI models are the answer to a lot of hard problems, or that it will get here really soon just because the research and progress has been crazy for a few years. Who knows where things will end up in the future. Laugh if you will, but there's plenty of time for another AI winter before these models advance to a point where they are considered reliable and safe for many tasks.

[1]: https://arxiv.org/abs/2205.11502


> What you seem to be pushing for is the much weaker interpretation that they'll get better at all, which is well, pretty obviously true. But that doesn't mean squat, so I doubt that's what you are saying.

I mean, this is what I was saying. I just don't think the technology has to become hallucination-free to be useful. So my bad if I didn't catch the implicit "any hallucination is a dealbreaker, so why even care about security" angle of the post I initially responded to.

My take is simply that "these things are going to be used more and more as they improve, so we had better start worrying about supply chain and provenance sooner rather than later". I strongly doubt hallucination is going to stop them from being used despite the skeptics, and I suspect hallucination is a problem of lack of context more so than innate shortcomings, but I'm no expert on that front.

And I'm someone who's been asked to try and add AI to a product and had the effort ultimately fail because the model hallucinated at the wrong times... so I well understand the dynamics.


Just because you can inductively reason about one thing doesn't mean you can inductively reason about all things.

In particular you absolutely can't just continue to extrapolate short-term phenomena out blindly into the future and pretend that has the same level of meaning as things like the sun rising which are the result of fundamental mechanisms that have been observed, explored and understood iteratively better and better over an extremely long time.


Originally: very few input toggles with little room for variation and with consistent results.

These days: Modern technology allows us to monitor the location of the sun 24/7.


one day it won't...


> Obviously the models will improve

Says who? The Hot Hand Fallacy Division?


Not sure what point you're trying to make here, since I don't know if you're referring to

(a) the initial, intuitive belief that basketball players who had made several shots in a row were more likely to make the next one,
(b) the analytical analysis that disproved (a), which no doubt stemmed from the belief that every shot must be totally independent of its context, disregarding the human factors at play, or
(c) the revised analysis that found that the analysis in (b) was flawed, and there actually was such a thing as a "hot hand."


I'm talking about the fallacy; you know, the reason I included the word "fallacy" in the sentence.

You know we're not talking about sports, right?

HN is wild.


If you assume no one knows the context of your reference, why would you use it? Regardless, I included the details because they're interesting and one sometimes learns interesting things on HN.

Anyway, the lesson of the hot hand fallacy is that sometimes intuitive predictions turn out to be right, despite the best efforts of low-context contrarians. But I don't think that was your point.


> If you assume no one knows the context of your reference, why would you use it?

You are the only one who is confused.


The trend. Obviously nobody can predict the future either. But models have been improving steadily for the last 5 years. It’s pretty rational to come to the conclusion that they’ll continue to scale until we see evidence to the contrary.


"the trend [says that it will improve]" followed by "nobody can predict the future either" is just gold.

> It’s pretty rational

No, that's why it's a fallacy.


Are you referring to slippery slope? That doesn't apply here since there's no small step that is causing them to believe the models will continue to get better.

What about Moore's law? Observing trends and predicting what might happen isn't a particularly new idea. You're not the only one, but I find it odd when people toss around the fallacy argument when a trend isn't pointing their way in an argument. I'm sure you use past trends to inform many of your thoughts each day.


I literally detailed the fallacy in the original comment; it would be great if you could read.


Your points are short and without substance, so it's hard to follow along as other sibling comments seem to also indicate.

Anyway, the point stands. The fallacy is believing with certainty that something will happen because of past events. That doesn't mean prediction is futile. Might want to re-read your Wikipedia pages to better understand!


You’re misunderstanding me. It’s also a fallacy to believe the sun will rise tomorrow. Everything is a fallacy if you can’t inductively reason. That’s the point, we agree.


Nonsense. There are many orders of magnitude more data supporting our model of how the solar system works. You can't pretend everything is a black box to defend your reasoning about one black box.


I’m not pretending anything is a black box. The sun is going to run out of fuel. We have no idea when that will happen. There is a philosophical treatment for what I’m arguing, we just like to ignore it and let our egos convince us we’re a lot more certain about the world than we are.


>We have no idea when that will happen

Why do you think this? We know how the sun works, how much nuclear fuel it has, and what life stages a star goes through as it uses up fuel, and how that life cycle changes based on size. We know the sun will stop shining, depending on your definition of that, in about 10 billion years. We know these things from studying THOUSANDS of other suns in various parts of their life cycle. We can make predictions on stars we observe, and watch them come true, which is the only valid judgement of a theory or model.

You not knowing something (like statistics) doesn't mean nobody knows it.


Have we experimentally recreated a sun and verified any of the theoretical models we have?

We have well understood theories about how we think the sun works based on observations of other suns, yes. But that's all.


Ironically, the thing we actually created (LLMs) is much more poorly understood than the one we haven't (solar system). We do have centuries of data and a great assortment of really well understood models of how the solar system works, and we understand the math really well. There are no mysteries in orbital mechanics and we can foresee sunrises for the next few billion years.

You're muddying the waters willingly. This is intellectually dishonest.


No I'm not.

Categorically it's the same problem. I just don't give any more credence to "centuries of data on orbital mechanics" for the purpose of this discussion about the epistemological understanding of whether the sun will continue to exist or not at some specified point in time in the future.

Is it more likely based on track record/history that we'll still have a sun in 50 years than improved LLMs? Uh likely yes. I never argued one was more or less likely than the other. I only argued that the same logical reasoning/argument is used to come to the conclusion that we'll have a sun in the future as it is to deduce that LLMs will probably improve.

So unless you call epistemology dishonest, I'm not being dishonest. I'm pointing out something that people commonly gloss over in their practical day-to-day lives. I pointed it out because someone challenged my argument that LLMs will improve by saying essentially "well we don't know that". Of fucking course we don't. But we don't know that in the same way we don't know that the sun will rise tomorrow. That's all I'm saying. You're just missing the nuance and I don't know why you're resorting to calling it intellectually dishonest.


> Have we experimentally recreated a sun and verified any of the theoretical models we have?

Yes it was called the Cold War.

Little tiny suns, but all those H-bombs (and reactors like the NIF and Z-pinch) verified quite a lot of the fundamentally identical physics.


Fusing two atoms is not "a sun", sorry. It's the reaction that happens in the sun, sure, but it doesn't tell us anything about the sun at a macro level (or about gravity, for that matter). That's all observation and theory until we can fly into one or recreate one that exhibits the same macro-level behavior.

For all we know there's something important we haven't observed about the sun's ability to consume its available fuel (whatever that mass is) and what happens to the exhaust products that could cause the sun to cool far sooner than we think. Who knows /shrug... not that I don't hope we've got it right in our understanding.


It's a lot more than two atoms, and all the various experiments leading up to them being weaponisable gave us all the info we need[0]. If we didn't know how the sun worked, the bombs wouldn't bang.

This by itself should be enough to pass the test of:

>> Have we experimentally recreated a sun and verified any of the theoretical models we have?

in the affirmative.

I mean, it's not like science requires 1:1 scale models.

> (or gravity for that matter)

Neither cheese, which is a similar non-sequitur.

[0] Including the fun fact that the sun is a "cold" fusion reactor, in the sense that it's primarily driven by quantum mechanical rather than high-energy ("thermo-nuclear") effects.

I'm not sure if this was first noted before or after the muon-catalysed fusion research.

Physics: the only place where someone looks at ten million K and goes "huh, that's cold".


> It’s also a fallacy to believe the sun will rise tomorrow.

No brother, it's science, and frankly that you believe this is not surprising to me at all.


You should study some philosophy of science. This stuff isn’t made up. Either you believe inductive reasoning works or you don’t. Philosophically it’s no more likely that the sun will rise tomorrow than it is that the trend of LLMs improving with parameter size continues. We are just prideful humans and tend to believe we are really sure about things.


Luckily, science doesn't give a fuck what philosophy thinks. Our models of the solar system and space in general are very thorough, well tested, and we even know the circumstances in which they break down. We can reliably make predictions about the future using these models, and with high confidence. These predictions, time and again, come correct. Newton's laws have only held for the entire time we've known them, including in locations that are billions of miles away and completely divorced in the time dimension from our own.

Philosophy is great and all, but Newton gives you raw numbers that are then verified by reality. I'm going to rely on that instead of untestable breathless "but ACTUALLY" from people who provide no actionable insight into the universe.


I think you have it backwards. The ordering goes:

Philosophy -> Math -> Physics -> Chemistry -> etc.

Everything to the right depends on, or is an application of, the discipline to the left. "Science" starts at physics.


> that they’ll continue to scale until we see evidence to the contrary

Just because there is no proof for the opposite yet doesn't mean the original hypothesis is true.


Exactly. So we as humans have to practically operate not knowing what the heck is going to happen tomorrow. Thus we make judgement calls based on inductive reasoning. This isn’t news.


While I agree with them, I've found a lot of the other responses to not be conducive to you actually understanding where you misunderstood the situation.

AI performance often improves at only a logarithmic rate. Simply put, it likely will hit a ceiling, and very hard. To give a frame of reference, think of all the places where AI/ML already facilitates elements of your life (autocompletes, facial recognition, etc.). Eventually, those hit a plateau that renders them unenthusing. LLMs are destined for the same. Some will disagree, because the novelty is so enthralling, but at the end of the day, LLMs learned to engage with language in a rather superficial way compared to how we do. As such, they will never capture the magic of denotation. Their ceiling is coming, and quickly, though I expect a few more emergent properties to appear before that point.


No, a signature will not guarantee anything about whether the model was trained with correct data or with fake data. And if I'm dumb enough to use the wrong name when downloading the model, then I'm also dumb enough to use the wrong name during the signature check.


Citation on "will"


> Obviously the models will improve

I mean, to some extent, but isn't it reasonable to assume hallucination is a hard problem?

Hallucination shows there are plenty of things they didn't actually learn; they are just good at appearing to have learned them.

Like, if it gets exponentially harder to train them, it's possible the level of hallucination will improve far more slowly than linearly, even.


> "Obviously the models will improve."

Found the venture capitalist!


I think people are conflating “get better” with “never hallucinate” (and I guess in your mind “make money”). They’re gonna get better. Will they ever be perfect or even commercially viable? Who knows.


"You're holding it wrong."

A language model isn't a fact database. You need to give the facts to the AI (either as a tool or as part of the prompt) and instruct it to form the answer only from there.

That 'never' goes wrong in my experience, but as another layer you could add explicit fact checking. Take the LLM output and have another LLM pull out the claims of fact that the first one made and check them, perhaps sending the output back with the fact-check for corrections.
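
A minimal sketch of that layering, assuming a hypothetical ask_llm() helper; the prompts, model names, and the crude "not supported" string check are illustrative only:

    from typing import Sequence

    def ask_llm(prompt: str, model: str = "model-a") -> str:
        """Hypothetical wrapper around whatever chat-completion client is in use."""
        raise NotImplementedError

    def grounded_answer(question: str, facts: Sequence[str]) -> str:
        # Layer 1: instruct the model to answer ONLY from the supplied facts,
        # not from whatever it absorbed during training.
        context = "\n".join(f"- {f}" for f in facts)
        answer = ask_llm(
            "Using ONLY the facts below, answer the question. "
            f"If the facts are insufficient, say so.\n\nFacts:\n{context}\n\nQuestion: {question}"
        )

        # Layer 2: a second model pulls out the factual claims in the answer
        # and checks each one against the same source facts.
        verdict = ask_llm(
            "List every factual claim in the answer below and state, for each, "
            f"whether it is supported by these facts.\n\nFacts:\n{context}\n\nAnswer:\n{answer}",
            model="model-b",
        )
        if "not supported" in verdict.lower():
            # Send the answer back with the fact-check for correction.
            answer = ask_llm(
                "Revise this answer so it only contains supported claims.\n\n"
                f"Facts:\n{context}\n\nAnswer:\n{answer}\n\nFact check:\n{verdict}"
            )
        return answer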

For those saying "the models will improve", no. They will not. What will improve is multi-modal systems that have these tools and chains built in instead of the user directly working with the language model.


I agree, there needs to be human oversight. I find them interesting, but beyond creative tasks I'm not sure what I would actually use them for. I have no interest in replacing humans (why would I?), so augmenting human creativity with pictures, stories, music: yes, that works, and it does it well. Education, law, medicine, being in charge of anything: not so much.




