JPMorgan Chase CEO Jamie Dimon testified before a Senate Banking, Housing, and Urban Affairs hearing on September 22, 2022. Dimon mentioned that as the U.S. nears a potential default on its sovereign debt, markets could experience panic. ...[1]
He said that today, but in an interview with Bloomberg. The source article[2] just illustrates it with an archive photo from 2022, when he testified in a Senate hearing. Similarly, the Disney article[3] starts nonsensically: "The Disney+ logo was displayed on a TV screen in Paris on December 26, 2019. Disney shares decreased by 9%[...]" (I don't think displaying the logo 4 years ago is to blame).
I suppose you should just stop parsing image captions. The two articles I checked were otherwise accurate.
[1] https://www.boringreport.org/app/all/645cfc85bab323b21e6195e... I had to use the developer tools to copy paste the text, obnoxious. You also can't right- or middle-click the source link (to copy it or open it in a background tab). Don't hijack basic browser functionality.
Hey, other dev on this project. This is a good catch, and we're aware of this issue. What it's doing is actually using a photo caption as part of the article, and we're working on removing the use of that in the summarization process.
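One low-tech way to do that, assuming the articles are scraped as HTML, is to drop `<figcaption>` text before the summarizer ever sees it. A minimal sketch with the standard-library parser (the app's actual pipeline is unknown, so treat this purely as an illustration):

```python
from html.parser import HTMLParser

class CaptionStripper(HTMLParser):
    """Collects article text while skipping anything inside <figcaption>."""
    def __init__(self):
        super().__init__()
        self.parts = []
        self.caption_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "figcaption":
            self.caption_depth += 1

    def handle_endtag(self, tag):
        if tag == "figcaption" and self.caption_depth:
            self.caption_depth -= 1

    def handle_data(self, data):
        # only keep text that is not inside a caption
        if not self.caption_depth:
            self.parts.append(data)

def strip_captions(html: str) -> str:
    parser = CaptionStripper()
    parser.feed(html)
    return "".join(parser.parts)
```

This only works when captions are properly marked up; sites that put captions in bare `<p>` or `<div>` tags would need per-site rules.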
Start with those, then figure out how to scrape a site as your input and emit the existing API format. You'll come in through a clever side route, essentially building a two-phase assembly line.
Also, this will let users customize their "feed" as a free side effect of the architecture. Furthermore, you'll be able to isolate your scraping -> API transform on a per-site basis, also as a free consequence. Lastly, you can parallelize the work much more cleanly, and even let the public add their own "transformer" for their favorite news site.
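The two-phase assembly line above can be sketched as a registry of per-site transformers that all emit one shared article format (the names and the stub adapter here are hypothetical, not anything the app actually exposes):

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Article:
    """The shared internal format both phases agree on."""
    title: str
    body: str
    source: str

# phase 1: registry mapping a site's domain to its scrape -> Article transformer
TRANSFORMERS: Dict[str, Callable[[str], Article]] = {}

def transformer(domain: str):
    """Decorator so third parties can register adapters for their favorite site."""
    def register(fn):
        TRANSFORMERS[domain] = fn
        return fn
    return register

@transformer("example.com")  # hypothetical site adapter
def parse_example(raw_html: str) -> Article:
    # a real adapter would parse the site's HTML; this one is a stub
    return Article(title="stub", body=raw_html, source="example.com")

def ingest(domain: str, raw_html: str) -> Article:
    """Phase 1 output: a normalized Article ready for the summarization phase."""
    return TRANSFORMERS[domain](raw_html)
```

Because each transformer is independent, they can run in parallel and break on a per-site basis without taking down the whole pipeline.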
Why? If it is an AI generated image, it was generated from a text prompt, by the author of the article. Author had reviewed the image. The image is novel.
As long as this is novel content, it should be parsed, I think.
Maybe it depends. Let's say some thought has gone into writing the prompt, and the image (and image text?) then explains how something works or helps one understand the article better.
Or if the prompt to generate the image doesn't include anything interesting that isn't in the article already (e.g. "generate a nature photo related to this article")?
I've come around to there being legitimate use cases for this type of generative AI, but I don't think producing anything that's supposed to be "True" or "Correct" is one. I think the only useful use case is when you want to generate fiction.
If you tell GPT-4 specifically to respond with proper jargon for the domain, like that found in a textbook or journal, it provides much more useful replies. Silly that prompt engineering is what's required, but at least for my purposes, wherein I fact-check its output, it's right nearly all the time, and I've learned a great deal.
Sure yeah, for now. Just saying I literally use it to mine out things to confirm (/ not believe until I do) and so far it has very rarely led me astray and even then it's been small nuance. It's striking.
Assuming you're serious and not just knee-jerk reacting to the flippant way I replied, this is pretty much exactly what I meant. I can't find a good link to the full text of this short story (the old ones I've read from are broken now), but this reading on YouTube will suffice.
It doesn't mean it's less biased. All of these styles are exploited as a form of rhetoric. Many people simply take information written in a textbook style as authoritative.
And that's why I ignore the people that laugh at the whole prompt engineering thing, because it's a genuine skill.
At the moment GPTs are trained on so much data across so many domains that you have to treat it like a person who has similar knowledge.
If I just walk up to you and start sputtering jargon about a very specific complex topic, when you were just chatting to another friend about all sorts of everyday topics, you're not going to be able to reply to me immediately.
With these GPTs it helps to get it "in the mood" for your topic by preloading keywords and shifting the topic over and deeper so your desired topic is clearer to the attention mechanism.
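The "preloading keywords" trick above amounts to front-loading domain context before the actual question. A sketch of what that looks like as a chat-style prompt (the message-dict shape mirrors common chat-completion APIs; the domain, jargon list, and question here are made-up examples):

```python
def primed_messages(domain: str, jargon: list, question: str) -> list:
    """Build a chat prompt that preloads domain keywords before the real
    question, so the desired topic is clearer to the attention mechanism."""
    system = (
        f"You are an expert in {domain}. Answer using precise, "
        f"textbook-level terminology such as: {', '.join(jargon)}."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]

msgs = primed_messages(
    "cardiology",
    ["ejection fraction", "preload", "afterload"],
    "Why does heart failure cause edema?",
)
```

Whether this measurably improves answers for a given model is an empirical question; the point is only that the priming is explicit and repeatable rather than ad hoc.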
I've come around to there being legitimate use cases for journalism, but I don't think producing anything that's supposed to be "True" or "Correct" is one.
I’m to the point where I’d probably put more trust in an AI-generated news summary than in many of the sites that purport to give me accurate and trustworthy news.
For a lower-tech approach, The Flip Side is pretty good at doing one story each day from 2 different sides. I was a bit annoyed when an excited friend signed up my email address without asking me, but I have never unsubscribed because I find it refreshing in a typical world of frenzied news.
Screwing it up isn't a crime. Papers can retract and re-visit a line of thinking. While nobody loves doing that, I think it'd help.
In addition I think it'd help if papers hammered the party line of Dems and Republicans far harder. My running joke / dare is: send sportscasters to DC for a year. At some point they'll call BS on everything and everyone, and start questioning with both barrels. BS is less tolerated in sports.
Take taxes. Trickle down is BS. But it's also true the top 5% or so pay 40%-60% of taxes while the US Congress continues to spend in debt. How'd we get here? Who's primarily to blame (Congress)? And what is Congress gonna do to fix it?
Show votes by Congress members year by year against deficit, debt, and ratio paid by corps, rich, middle, and poorer Americans. I wanna see both aisles running for cover.
Biden's budget envoy was in Congress about 6-8 weeks ago. She mentioned Biden's plan was raising taxes on corporations and individuals with $400k or more in earnings. But when the Republicans pointed out the fact above (5% paying more than half), she had nothing.
What's the Republican code here to de-construct? Well, lack of fairness, and an implied destruction of jobs and income for workers if taxes are higher. Ok, how do you defeat that? Dems are empty. And taxpayers will ultimately have to bear up under both parties' stupidity if this continues.
I didn't used to grouse too much about how the government (mis)spends money, so long as debt to GDP wasn't stupid and there were some attempts to get real. But in the last 10 years, I've changed. Who wants to send cash to DC? DC has got serious trust problems.
The calm, serene assurance and objectivity of the GPT outputs have been a breath of fresh air amidst the stupidity of the average social media discourse. If this style somehow prevails it will be a net positive for the internet. I for one welcome our new LLM overlords!
Writing summaries of documents and correspondence is one of the major use cases of those models. De-sensationalization and debullshitification are very similar to summarization, so it stands to reason LLMs should handle these tasks just as well.
> Given that the choice of which articles to write is incredibly biased to begin with this approach does not seem effective.
Selection bias is a given. You always have to keep that in mind. But when you actually want to read a specific article, summarizers are useful. For news and general population content, debullshitifiers could come in handy too.
Point being, the texts are not random. There's some nugget of valuable content in them, but it's usually wrapped in an enormous layer of SEO, ad hooks, word-count padding, and/or general nonsense. Improving the signal-to-noise ratio here - stripping away all those layers of bullshit - is strictly useful.
I’m not arguing summarization is not useful, or stripping the various sources of noise you listed.
“Debullshitification” reads as de-biasing which is not what you just itemized.
My point is rather that Fox News+LLM (as an example) is still biased but would appear/may be incorrectly presented as unbiased to a reader not acutely aware of selection bias which is probably not something an average reader is well informed about.
No, you’re applying a specific meaning to an inherently nebulous term, debullshitification.
And honestly, I immediately knew what that meant when I read it. My preferred news source, which isn’t horrendously partisan, still has…exactly what I’d call bullshit. If that’s removed, I’ll get more bang for my buck in reading it, and that both provides immense value, and something that I’d call “debullshitification”, whilst working purely from the articles provided.
Since you mentioned nebulous, this is the Oxford definition:
> verb: bullshit; 3rd person present: bullshits; past tense: bullshitted; past participle: bullshitted; gerund or present participle: bullshitting
> talk nonsense to (someone), typically to be misleading or deceptive.
It’s reasonable to interpret debullshitification as removing bias (i.e. what is misleading or deceptive in the news article) in this context rather than the “fluff” listed.
As I stated in the comment you replied to, GP has a different definition and I agreed removing fluff definitely has value.
>What could theoretically work is an “AI news agency” that “summarizes” many different sources to generate unbiased articles.
NewsMinimalist does this, it’s quite interesting. I’ve been using it since its introduction, and it’s been a fun way to get lots of summarized, de-sensationalized headlines. Specifically, I enjoy setting it to 6.0 and reading the headlines whose impact didn’t quite reach the 6.5+ threshold.
Photo attribution is a bit of a problem. For a tornado in Kansas they may use an image from another year’s tornado in Mississippi. For the war in Azerbaijan they might use an image from Chechnya, etc.
That is precisely the type of editorial affordance I would expect the AI to strip. This is just another way for media organizations to distort the news. I look forward to those enhancements.
False metadata for rich media is a damned tough problem to target.
Putting aside any actually truthful captions, how do I know that "image of X" is actually an image of X?
Reading some of the Bellingcat investigations, and seeing the time they spent verifying imagery, doesn't bode well.
I guess you could TinEye and index/hash the entire web's worth of rich media, then spot discrepancies (listed as X here, but Y there), but that seems horrendous in compute/bandwidth/storage terms.
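TinEye-style indexing usually relies on perceptual hashes rather than raw bytes, so near-identical images (recompressed, resized) still collide. A minimal pure-Python sketch of a difference hash over a small grayscale grid (real systems use far more robust features; the tiny "images" below are made-up data):

```python
def dhash(pixels):
    """Difference hash: compare each grayscale pixel to its right neighbor.
    Visually similar images yield hashes with a small Hamming distance."""
    bits = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

# two tiny 4x5 grayscale "images" that differ in one pixel column
img_a = [[10, 20, 15, 30, 25]] * 4
img_b = [[10, 20, 15, 30, 35]] * 4
# a small Hamming distance flags them as likely the same image
```

Hashing every image on the web is still a huge undertaking, but each hash is only a few bytes, so the storage side is less horrendous than storing the media itself; the crawl bandwidth is the real cost.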
>>seems horrendous in compute/bandwidth/storage terms
Yes, but the usefulness of being able to automate that identification in near-real-time, to debunk the firehose of falsehoods we get from everywhere, would be astronomical.
Anyone reading would have a huge edge in both being more accurately grounded in reality and being able to identify the biggest/hottest disinformation streams
Honestly given the quality of Stable Diffusion and similar, you don't even need to reuse the same image posted somewhere else, you can just make it up. So... making such a huge effort...for what? People will adapt to use new tech.
That's a different problem. An important problem to be sure; but different. It's the kind Reality Defender are trying to solve to the extent it's possible and I am afraid it's just a matter of time before we see the effects of this in some crisis point when we can least afford the time to make sense of the firehose of falsehoods (nice phrase btw).
> This is just another way for media organizations to distort the news
No, it's not. This is done because stories with images perform better, and obtaining images (& licenses) for photos of every event is not always possible.
If I read a story about a riot and the included picture is from a different but similar urban disaster scene that shows buildings on fire and windows broken I come away from the article with an internal expectation of the disaster scene including fire damage and broken glass -- but that isn't necessarily the case.
This happened constantly with the reporting around the BLM social unrest.
Articles sell better with additional sensationalism, but when the narrative being espoused doesn't conform to reality then it is a distortion, regardless of the motivating factor.
[2] https://www.cnbc.com/2023/05/11/jpms-jamie-dimon-warns-of-ma...
[3] https://www.boringreport.org/app/all/645d0cebbaef7c040f89ca4...