'Facts' aren't as black and white as people think.
"What does Charmander evolve into?"
"What does the spell 'avada kedavra' do?"
"What is the Sindarin word for 'friend'?"
"What are the names of Santa's reindeer?"
"Where did Robin Hood live?"
"Where did Achilles die?"
These are all 'factual questions' you can find answers to from reputable sources like Wikipedia. Google displays 'fact boxes' for several of them. Wolfram Alpha provides answers for three of them. The answers to some of these questions are part of what passes for 'general knowledge' in some societies.
It's no surprise that LLMs trained on human writings produce text that presents untrue things as facts. Humans do that all the time.
There are well attested reputable sources that will tell you Abraham Lincoln was a vampire hunter, others that say he was a Lego Master Builder, and others still will tell you that among his notable quotes is "Party on dudes - be excellent to each other". So what's an LLM to do when it's trying to extend a paragraph of information about Abraham Lincoln?
When an LLM is suggesting what might come next in a piece of text... it doesn't know if it's supposed to guess a probable word from a Wikipedia article, an Onion article, a Project Gutenberg manuscript, or an Archive Of Our Own fanfic. So you get a bit of all that.
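A toy sketch of that "bit of all that" effect (nothing like a real LLM, just a made-up bigram counter over invented sentences): the model only sees aggregate word statistics, with no record of which source a continuation came from.

```python
from collections import Counter, defaultdict

# Hypothetical mixed "training corpus": encyclopedia-style and fiction-style
# sentences thrown together, as in the Lincoln example above.
corpus = [
    "lincoln was the sixteenth president",   # encyclopedia-style
    "lincoln was a vampire hunter",          # fiction-style
    "lincoln was a lego master builder",     # fiction-style
]

# Count bigram continuations with no notion of source or truth.
counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1

# Continuations of "was" blend factual and fictional sources freely:
print(counts["was"].most_common())  # [('a', 2), ('the', 1)]
```

Here the fictional continuation is actually the statistically *favoured* one, which is the whole problem.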
Tangential: I was going to suggest "protocoli(s|z)ed" instead of protocollar, but I Googled "protocollar statements" just to check and found 2 things. First, this page was the top result! Second, "protocolar" (one ell) and "protocolary" are apparently real words. New to me, thanks.
You had me check a few sources, and I found several expressions in use for the concept: simply "protocols" (in that sense), "protocol statements", the "protocol-sentence debate", "protocollar propositions"...
The use of '-ize' (a Graecism) is indicated by the OED as International English, as opposed to British, American, etc. In fact, some call International English "British spelling with -ize" - it is not exactly that, but close. One exception is 'analyse', but that is because linguists compromised on the "difficult" original 'analysize'.
I think 'protocollar' is, in this context, a misspelling of 'protocolar' - hence its high placement for "protocollar statements". If I google "protocolar statements", this is the highest result (for me).
> When an LLM is suggesting what might come next in a piece of text... it doesn't know if it's supposed to guess a probable word from a Wikipedia article, an Onion article, a Project Gutenberg manuscript, or an Archive Of Our Own fanfic.
The obvious start seems to be having separate fiction and nonfiction LLMs and not training the nonfiction ones on Archive Of Our Own. People also end up confused about the truth when nobody points out the difference between fiction and nonfiction.
But there's a fundamental issue here. The real strength of LLMs is not just information retrieval, but being able to dynamically recombine that information. Of course that's also their weakness. The reason GPT will regularly produce code with nonexistent API calls is not because it's been trained on 'fictional APIs', but because it's combining various real calls to make new fictional ones.
The obvious answer then is to tell it to make sure that what it's finally outputting really is part of the "real" API, but there's clearly some technical hitch there: OpenAI probably spent quite a lot of energy trying to solve code hallucinations and was ultimately unable to do so. I'd guess that the more you restrict its recombination ability, the more you end up with it inappropriately (and incorrectly) regurgitating large chunks of its training input verbatim. Basically, it becomes more like a keyword-hunting search engine and less like a generative LLM.
Right. A lot of the magic of LLMs probably comes from the broader appreciation of language and cultural reference that they get from being trained on a diverse corpus, rather than just a bunch of dictionaries and reference books.
And anyway - answers to all my ‘fictional facts’ questions above can be sourced from Wikipedia - there’s tons of made up stuff on there.
Hopefully such statements are sufficiently rare that they don't get reinforced, I guess. I don't know. A very real problem occurs with people too when fictional things are repeated often enough without direct mention of their fictional nature.
Which of these is more true: a newspaper article about a battle in the War of 1812, or the Star-Spangled Banner, which was written by someone witnessing a battle in the War of 1812?
Hint: how many stadiums are filled with people standing up to recite a newspaper article about a battle in the War of 1812?
> it doesn't know if it's supposed to guess a probable word from a Wikipedia article, an Onion article, a Project Gutenberg manuscript, or an Archive Of Our Own fanfic. So you get a bit of all that.
This is true of base LLM models that are just trained on missing-word prediction on the training corpus, but one of the main points of RLHF[1] is to tune this model to make these kinds of inferences the way a human would expect. For example, if you asked an untuned model to write a poem in the style of ... etc., a valid internet response might be "hmm no thanks, you go first"; you need to steer the model away from replying like this.
I'm not saying it's perfect, but it's wrong to say e.g. GPT-4 has had no information about the difference between a good and bad response and is just generating internet-like text at random, the big players have made progress on this already.
Reinforcement learning trains them that question and answer sessions contain answers which statistically correlate with factual statements in their broader learning corpus.
When formulating answers, this leads them to produce ones that reflect the factual information on which they were trained.
My point is that the source data contains a far muddier range of information than just unarguable facts.
We largely want LLM based Q&A bots to answer questions about fictional or mythical characters in their own terms. As I said, those questions above all have reasonably ‘correct’ answers.
That LLMs do as well as they do from all that is remarkable. But it also seems to require us to assume that LLMs are capable of a remarkable degree of cultural nuance, media literacy and contextual awareness, in order to figure out the different authorship, salience, trustworthiness, agenda, biases, and assumptions of all the gigareams of text they’ve ingested.
“ When an LLM is suggesting what might come next in a piece of text... it doesn't know if it's supposed to guess a probable word from a Wikipedia article, an Onion article, a Project Gutenberg manuscript, or an Archive Of Our Own fanfic”
LLMs are very good at inferring context, so that only really applies if you’re using an un-RLHFed base model with no context given.
Here, "supposed to guess" means "having the goal of..."
So no LLM knows what it's supposed to do. If you prefer, you could say it only ever has one goal: to generate a sequence of tokens which are jointly the most probable to occur along with the prompt tokens, given such probabilities in a historical corpus.
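To make that "one goal" concrete, here's a toy Markov-style sketch with invented conditional probabilities (real models condition on the whole context, not just the previous token): the only thing the system "prefers" is the higher-probability sequence, truth notwithstanding.

```python
import math

# Hypothetical toy conditional probabilities P(next | prev), standing in for
# what a model would estimate from a historical corpus.
cond_p = {
    ("the", "cat"): 0.6, ("the", "dog"): 0.4,
    ("cat", "sat"): 0.9, ("cat", "ran"): 0.1,
}

def log_joint(tokens):
    # log P(t1..tn) = sum of log P(t_i | t_{i-1}) in this toy Markov model
    return sum(math.log(cond_p[(a, b)]) for a, b in zip(tokens, tokens[1:]))

# The "goal" reduces to: emit whichever sequence scores higher.
assert log_joint(["the", "cat", "sat"]) > log_joint(["the", "cat", "ran"])
```

Nothing in that objective references truth, intent, or context beyond the corpus statistics, which is the point being made here.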
This imitates knowledge, goal-directedness, "inferring context", etc. without doing any of those things. Consider what the aim of knowing, goal-directedness, inferring, etc. is --- it is never "consistency with a historical text corpus".
For knowing: that beliefs correspond to the way the world is; for goal-directedness: that one's acts and desires can realise changes; for 'inferring context': that one is sensitive to reasons to speak outside of what is literally spoken.
LLMs are never sensitive to reasons to speak outside of what has been spoken.
What does RLHF do then? I feel like you completely ignored the central point of GP's comment.
RLHF is the difference between GPT-3.5 and ChatGPT, and it's the whole reason why LLMs are suddenly such a big deal. ChatGPT demonstrated that it's possible to give language models a goal beyond just "complete most likely next word" and that they can actually be somewhat competent at achieving those goals despite not being explicitly trained for them.
> competent at achieving those goals despite not being explicitly trained for them.
Well, (1) it doesn't achieve goals, since a "goal" is observer-relative. We have goals; the LLM has a formal optimisation objective which gives it the appearance of goal-directed behaviour (in a similar way, e.g., to how pens appear to "want" to fall when dropped).
And (2), reading your "goal" here even in observer-relative ways, I don't think there's much evidence of this. These models are "trained" on everything ever written, including all of the internet and basically every digitised book. I don't see any evidence of much generalisation -- if you can find it via Google, then the LLM has it stored compressed (i.e., in the "weights").
The innovation in LLMs is being able to compute `max P(answer|prompt, historical_corpus)` for increasingly longer prompts --- there's no innovation in goal-directed behaviour.
That's VC propaganda to disguise the fact that LLMs are mostly an innovation in copyright laundering.
(1) This is a tired, pointless semantic argument. "It doesn't have a goal, it just acts like it has a goal for all intents and purposes. But, you see, it's actually a machine and not a human and therefore it can't really have goals according to my narrow definition of the term." Either point to an actually relevant difference in the resulting behavior or stop objecting when people use human behavioral terms to describe the behavior of machine learning systems. We're all well aware it's a program; that's not the point. (Sorry, just a frustration I have with the larger discussion around this topic.)
(2) "I don't see any evidence of much generalisation" Seriously? So when I tell ChatGPT to rewrite a paragraph in the style of Shakespeare and it does it, despite never being trained to do that, never seeing the source or target paragraph before, and having no information other than my text prompt and its past training, that's not evidence of generalization? And that's only one of millions of different possible tasks that the same model excels at, despite being trained on nothing but a bunch of unstructured text and a few examples indicating its goal should be to follow instructions given in the prompt text. Up until a couple years ago this level of flexibility in a machine learning model would have been considered science fiction by nearly everyone, and now it's "[not] evidence of much generalization". Okay.
Well (1), the reason this distinction is relevant is so we can separate out whether the system has developed a capacity or an apparent capacity.
Is the child a genius or are they just reading out of a textbook? Can the toddler really compose a sonata or did they just press play on the piano keyboard?
(2) This is indeed the power of interpolating between the data points of "everything ever written in human history" as digitised and compressed by ChatGPT.
If you have 1 billion circles of radii between 0 and 1, it isn't generalisation for the machine to produce one with a radius of 0.0000100003000001, i.e., one not in the set but a mere interpolation of points within it.
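The circles analogy as a few lines of code (the particular radii are invented for illustration, and far fewer than a billion): the "new" radius is not in the training set, yet it lies entirely within the span of what was already seen.

```python
# 1,000 hypothetical "training" radii evenly spanning [0, 1].
training_radii = [i / 999 for i in range(1000)]

new_radius = 0.0000100003000001  # not one of the training radii...
assert new_radius not in training_radii

# ...yet it sits inside the range the training data already covers:
# interpolation within the set, not generalisation beyond it.
assert min(training_radii) <= new_radius <= max(training_radii)
```

Whether interpolation in a space this vast should still count as "mere" interpolation is, of course, exactly what the two commenters are disputing.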
It would be expensive, but imagine "reversing" ChatGPT from its output to the sources which made a non-trivial difference to generating that output.
So the function there is: response -> verbatim text in the training corpus.
Then, maybe, each source could be "bolded" by how much each paragraph "made a difference" to the output.
What you'd find is thousands of pages: everything Shakespeare ever wrote, all papers about Shakespeare, all books about Shakespeare, and so on.
Then, with the bolding applied and a little summarising, the trick would be revealed: it would be apparent how a naive statistical interpolation between sequences of characters could produce the effect.
ChatGPT exists because of ebooks and social media: without them, it could do almost nothing. That is, the appearance of these capacities is strictly derivative of the work of a billion people who actually had them.
Without vast, unimaginable amounts of work produced on Shakespeare, this system wouldn't work. It's just a copyright laundering system: all the school essays on Reddit, all the forum posts, all of Usenet, all PDFs, all digitised works, all academic papers.
Is this generalisation? Is this a system which starts with little and makes a lot?
Or is it a system which is more like a child reading from a textbook? I.e., one with a haphazard ability to repeat what's already written.
The size of the weights of a modern LLM is sufficient to compress everything ever written in human history: and that's exactly what they do.
It isn't apparent that anything you've just described is relevant. You've described how it works (in a highly simplified way), but that doesn't discredit the end result.
If there's truly a difference between "a capacity [and] an apparent capacity" then you should be able to point out what that difference actually is in practice. A child pressing play on a piano can only play one song. An LLM composing poems can compose billions upon billions of unique, never-before-seen poems about every conceivable topic. Whether under the hood it does that by "interpolating numbers in n-dimensional spaces" or "some incomprehensible arrangement of neurons linked together" or some other, yet to be invented process doesn't matter if the result is the same. The fact that you can explain how something works doesn't make it less real.
This is something which GPT generally isn't confused about though: it knows the answer to these questions and it knows that these are questions and statements about well-known works of fiction. I don't really think this is the source of the tendency for LLMs to make stuff up.
Mellon. The rest are left as an exercise to the reader.
It does always amaze me that we trained LLMs on a dump of the internet and then people are shocked that they're about as trustworthy as a random web page.
The issue here is one of semiotics and morphemology. Mapping meaning into a narrative and ontological protocol is going to be the requisite work if we want the engine to be "smart." As explored in the discussion at hand, tokenization creates a great mimic but it's a parlor trick. We must employ a robust thinking-thing that correlates not only a static, contextually indexed dictionary <lexicography>, we must also route that through a network to distill meaning itself into tokens. Perhaps languages which rely on morphemes for written language - a logosyllabary - are somewhat more or less suited for this task? I ask as a dummy.
There also exists the consideration of allographemical contextualization, the nature of relevance, pragmatics, conjunct identification of context, semantics. To be honest the linguistics side alone is vast. Knowledge and cognition however. . . A whole other ballgame. But the only tool we have to really get down to the bottom of how knowledge works is language, it's to epistemological pursuit what math is to physics.
While GPT is super impressive and can do a lot of quasi-brute-force things, we're only now finding the rudiments of the machined intelligence paradigm, and it will behoove any reader to brush up on their classics; true pursuants of philosophy and many-order logic are about to be in high demand, if I had to reckon.
I prefer to think that most humans actually distinguish the fictional context, and so should LLMs. As such, if they are to be of any use, they'd better figure out it's fiction if someone's flying on a winged horse, levitating trolls or (obviously harder) running around a forest with a bow.
And when answering a question, unambiguously specify this fictional context, or at least indicate that it might be fiction if unsure.
I'm not sure a higher level "intelligence" (which some folks think AI is moving towards) should be overly-reliant on human "intelligence", lest it inherit flaws which may outnumber benefits. (Humans believe a variety of outlandish things, such as "Q-Anon has the real facts", etc.)
By this logic, "Is Moby Dick a sperm whale" also can't be answered factually because Moby Dick is a fictional creation and doesn't exist in the natural world?