OpenAI just admitted it can’t identify AI-generated text. That’s bad for the internet and it could be really bad for AI models.::In January, OpenAI launched a system for identifying AI-generated text. This month, the company scrapped it.
i wonder why Google still isn't considering buying Reddit and other forums where personal discussion takes place and the user base sorts quality content free of charge. it has been established already that Google queries are way more useful when coupled with Reddit
Making google better is not google’s goal. Growth is their goal.
Bluntly, even before AI there was an ever present threat that anything you encountered online was written by someone with ulterior motives. Maybe AI is just making it easier for people to digest because they don’t want to distrust people. The solution that I see is to always be aware of what other reasons any particular media could be serving, and to maintain a clear picture in your own mind of what’s important to you, so no matter who wrote something for what reason, you won’t be personally misled.
OpenAI also financially benefits from keeping the hype training rolling. Talking about how disruptive their own tech is gets them attention and investments. Just take it with a grain of salt.
It's not possible to tell AI-generated text from human writing at any level of real-world accuracy. Just accept that.
Citation needed
The entropy in text is not good enough to provide enough space for watermarking. And no, it does not get better with longer text, because you have control over chunking. You have control over top-k, temperature, and the prompt, which creates an effectively infinite output space. Open text-generation-webui, go to the parameters page, and count the number of parameters you can adjust to guide the outcome. In the future you'll be able to add WASM-encoded grammars to that list too.
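To make the parameter point concrete, here is a toy sketch (hypothetical code, not any real model's implementation) of how just two knobs, temperature and top-k, reshape which token gets emitted next. Every extra knob multiplies the space of possible outputs a detector would have to cover:

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=3):
    """Toy next-token sampler over a dict of {token: logit}."""
    # Lower temperature sharpens the distribution, higher flattens it.
    scaled = {tok: l / temperature for tok, l in logits.items()}
    # Keep only the top-k candidates.
    top = sorted(scaled.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    # Softmax over the survivors (subtract the max for stability).
    m = max(l for _, l in top)
    exps = {tok: math.exp(l - m) for tok, l in top}
    z = sum(exps.values())
    probs = {tok: e / z for tok, e in exps.items()}
    # Draw one token from the renormalized distribution.
    r, acc = random.random(), 0.0
    for tok, p in probs.items():
        acc += p
        if r <= acc:
            return tok
    return tok  # float-rounding fallback: return the last candidate
```

With top_k=1 this is greedy decoding; with a high temperature and large top_k the same prompt can yield almost anything, which is exactly why output-space arguments against watermarking bite.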
Server-side hashing / watermarking can be trivially defeated via transformations / emoji injection. Latent-space positional watermarking breaks easily with post-processing. It would also kill any company trying to sell it (Apple be like … you want all your chats at OpenAI, or in the privacy of your phone?) and ultimately be massively dystopian.
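A toy sketch of why verbatim server-side fingerprinting fails (hypothetical code, not any vendor's actual scheme): a server that logs hashes of its outputs can only match text byte-for-byte, so any trivial transformation erases the match.

```python
import hashlib

def fingerprint(text):
    # A server logging exact outputs can only re-identify verbatim text.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

original = "The quick brown fox jumps over the lazy dog."
# A trivial post-processing pass (emoji injection, re-punctuation,
# a synonym swap) produces a completely different fingerprint:
edited = original.replace("dog.", "dog 🐶!")

assert fingerprint(original) != fingerprint(edited)
```

Real schemes are subtler than a raw hash, but they share the weakness: the signal lives in the exact surface form, which the recipient fully controls.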
Unlike plagiarism checks you can’t compare to a ground truth.
Prompt guidance can box in the output space to the point where you could not possibly tell it's not human. The technology has moved from central servers to the edge; even if you could build something for one LLM, it wouldn't cover another one outside your control, like a local LLaMA, which is open source (see how quickly the Stable Diffusion 2 VAE watermarking was removed after release).
In a year your iPhone will have a built-in LLM. Everything will have LLMs, some highly purpose-bound with only a few million parameters. Fine-tuning like LoRA is accessible to a large number of people with consumer GPUs today and will be commoditized within a year. Since it can shape the output, it again increases the possibility space of outputs and will scramble patterns.
Finally, the bar is not "better than a flip of a coin". If you are going to accuse people or ruin their academic careers, you need triple-nine accuracy or you'll wrongfully accuse hundreds of students a semester.
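The base-rate arithmetic is simple (all numbers below are made up purely for illustration):

```python
# Hypothetical: a department grading 5,000 human-written essays per semester.
essays = 5000

# A detector with "99% accuracy" (a 1% false positive rate) still
# flags dozens of innocent students:
flagged_at_99 = essays * 0.01    # 50 wrongful accusations

# Even at triple-nine accuracy (0.1% false positives), several
# students per semester are wrongly accused:
flagged_at_999 = essays * 0.001  # 5 wrongful accusations
```

And that is before considering that the true positive rate on lightly edited AI text is far below 100%, so the flagged set is dominated by false positives.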
The most likely detection would be if someone found a remarkably stable signature that magically works for all the models out there (hundreds by now), doesn't break with updates (lol, see ChatGPT presumably getting worse), survives quantisation, and can somehow be kept secret from everyone, including AIs, which can trivially spot patterns in massive data sets. Not Going To Happen.
Even if it were possible to detect, it would be model- or technology-specific and always lag the technology. We are moving at 2,000 miles an hour, and in a year it may not be transformers anymore; there'll be GAN or RNN elements fused in, or something completely new.
The entire point of the technology is to approximate humanity. Plus, we are approaching it from the other direction: more and more conventional tools embed AI (from your camera no longer being able to take pictures untouched by AI, to Photoshop infill, to Word autocomplete, to new spellchecking and grammar models).
People latch onto the idea that you can detect it because it provides an escapism fantasy and copium so they don’t have to face the change that is happening. If you can detect it you can keep it out. You can’t. Not against anyone who has even the slightest idea of how to use this stuff.
It's like when gunpowder was invented and samurai threw themselves into the machine guns, because it rendered decades of training and perfection, of knowledge about fortification, war, and survival, moot.
For video, detection will remain viable for a long time due to the available entropy. Text? It's always been snake oil, and everyone peddling it should be shot.
Text written before 2023 is going to be exceptionally valuable, because that way we can be reasonably sure it wasn't contaminated by an LLM.
This reminds me of some research institutions pulling up sunken ships so that they can harvest the steel and use it to build sensitive instruments. You see, before the nuclear tests there was hardly any background radiation anywhere. However, after America and the Soviet Union started nuking stuff like there was no tomorrow, pretty much all steel on Earth became a little bit contaminated. Not a big issue for normal people, but scientists building super-sensitive equipment certainly notice the difference between pre-nuclear and post-nuclear steel.
Not really. If it's truly impossible to tell the text apart, then it doesn't really pose a problem for training AI. Otherwise, next-gen AI will be able to tell apart text generated by current-gen AI, and it will get filtered out. So only the most recent data will have unfiltered shitty AI-generated stuff, but they don't train AI on super-recent text anyway.
This is not the case. Model collapse is a studied phenomenon for LLMs: quality deteriorates when models are trained on data that comes from themselves. It might not be an issue if there were thousands of models out there, but there are only 3-5 base models that all the others are derivatives of, IIRC.
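A toy illustration of the mechanism (a deliberately simplified stand-in, not the actual LLM training setup): fit a distribution to finite samples of the previous generation's output, then repeat. Finite-sample estimation error compounds, so the learned distribution drifts away from the original data, and over many rounds its variance tends to shrink:

```python
import random
import statistics

random.seed(0)
mu, sigma = 0.0, 1.0  # the "real" data distribution

# Each generation "trains" only on samples from its predecessor.
for generation in range(50):
    samples = [random.gauss(mu, sigma) for _ in range(30)]
    mu = statistics.mean(samples)    # estimation error accumulates
    sigma = statistics.stdev(samples)
```

After enough generations the fitted parameters no longer resemble (0, 1): information about the original distribution has been irreversibly lost, which is the intuition behind collapse.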
People still tap into the real world, while AI does not do that yet. Once AI is able to actively learn from real-world sensors, the problem might disappear, no?
They already do. Where do you think the training corpus comes from? The real world. It's curated by humans and then fed to the ML system.
Problem is that the real world now contains a bunch of text generated by AI. And it has been well studied that feeding that back into the training will destroy your model (because the networks would then effectively be trained to predict their own output, which just doesn't make sense).
So humans still need to filter that stuff out of the training corpus. But we can’t detect which ones are real and which ones are fake. And neither can a machine. So there’s no way to do this properly.
The data almost always comes from the real world, except now the real world also contains “harmful” (to ai) data that we can’t figure out how to find and remove.
There are still people in between, building training data from their real-world experiences. Now the digital world may become overwhelmed with AI creations, so training may lead to model collapse. So what if we give AI access to cameras, microphones, all that, and even let it articulate them? It would also need to be adventurous, searching for spaces away from other AI work. There is lots of data in there which is not created by AI, although at some point it might become so as well. I am leaving aside for the moment the obvious dangers of this approach.
I don’t see how that affects my point.
- Today’s AI detector can’t tell apart the output of today’s LLM.
- Future AI detector WILL be able to tell apart the output of today’s LLM.
- Of course, future AI detector won’t be able to tell apart the output of future LLM.
So at any point in time, only recent text could be "contaminated". The claim that "all text after 2023 is forever contaminated" just isn't true. Researchers would simply have to be a bit more careful about including it.
Your assertion that a future AI detector will be able to detect current LLM output is dubious. If I give you the sentence "Yesterday I went to the shop and bought some milk and eggs.", there is no way for you or any detection system to tell whether it was AI-generated with any significant degree of certainty. What can be done is statistical analysis of large data sets to see how they "smell", but saying "around 30% of this dataset is likely LLM-generated" does not get you very far in creating a training set.
I’m not saying that there is no solution to this problem, but blithely waving away the problem saying future AI will be able to spot old AI is not a serious take.
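The gap between dataset-level and per-sentence signals can be sketched with a toy statistic (mean word length, chosen purely for illustration; real "smell tests" use perplexity, burstiness, and the like, but share the same limitation):

```python
def mean_word_length(texts):
    # A crude corpus-level statistic: average word length across texts.
    words = [w for t in texts for w in t.split()]
    return sum(len(w) for w in words) / len(words)

sentence = "Yesterday I went to the shop and bought some milk and eggs."
# One sentence yields one noisy sample of the statistic. That value is
# equally plausible under a "human" or an "LLM" distribution, so no
# per-sentence verdict follows from it; only large corpora shift the
# aggregate enough to estimate a contamination *rate*.
value = mean_word_length([sentence])
```

Knowing that a corpus is, say, 30% contaminated in aggregate still gives you no rule for deciding which individual documents to keep.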
If you give me several paragraphs instead of a single sentence, do you still think it’s impossible to tell?
“If you zoom further out you can definitely tell it’s been shopped because you can see more pixels.”
So every accusation of cheating/plagiarism etc. and the resulting bad grades need to be revised because the AI checker incorrectly labelled submissions as “created by AI”? OK.
i laughed pretty hard when south park did their chatgpt episode. they captured the school response accurately with the shaman doing whatever he wanted, in order to find content “created by AI.”
The wording of every single article has such an anti AI slant, and I feel the propaganda really working this past half year. Still nobody cares about advertising companies, but LLMs are the devil.
Existing datasets still exist. The bigger focus is in crossing modalities and refining content.
Why is the negative focus always on the tech and not the political system that actually makes it a possible negative for people?
I swear, most of the people with heavy opinions don’t even know half of how the machines work or what they are doing.
I am so tired of techno-fetishist AI bros complaining every single time any of the many ways in which AI will devastate and rot out daily lives is brought up.
“It’s not the tech! It’s the economic system!”
As if they’re different things? Who is building the tech? Who is pouring billions into the tech? Who is protecting the tech from proper regulation, smartass? I don’t see any worker coops using AI.
“You don’t even know how it works!”
Just a thought-terminating cliché to try to avoid any discussion or criticism of your precious little word generators. No one needs to know how a thing works to know its effects. The effects are observable reality.
Also, nobody cares about advertising companies? What the hell are you on about?
they are different things. it’s not exclusively large companies working on and understanding the technology. there’s a fantastic open-source community, and a lot of users of their creations.
would destroying the open-source community help prevent the big-tech from taking over? that battle has already been lost and needs correction. crying about the evil of A.I. doesn’t actually solve anything. “proper” regulation is also relative. we need entirely new paradigms of understanding things like “I.P.” which aren’t based on a century of lobbying from companies like disney. etc.
and yes, understanding how something works is important for actually understanding the effects, when a lot of tosh is spewed from media sites that only care to say what gets people to engage.
i'd say only a fraction of what i see as vaguely directed anger towards anything A.I. is actually directed at areas that are actually severe and important breaches of public trust and safety, and i think the advertising industry should be the absolute focal point of the danger of A.I.
Are you also arguing against every other technology that has had their benefits hoarded by the rich?
It's mostly large companies; some models are open source (of which only some are also community-driven), but the mainstream ones are the ones entirely funded by, legally protected by, and pushed onto everything by capitalist oligarchs.
What other options do you have? I'm sick and tired of people like you seeing workers lose their jobs, seeing real people used like meat puppets by the internet, seeing so many artists risking their livelihoods, seeing that we'll have to lose faith in everything we see and read because it could be unrecognizably falsified, and CLAIMING you care about it, only to complain every single time any regulation or way to control this is proposed, because you either don't actually care and are just saying it for rhetoric, or you do care but only to the point that you can still use your precious little toys restriction-free. Just overthrow the entire economic system of every country on Earth, otherwise don't do anything; let all those people burn! Do you realize how absurd you sound?
It's sociopathic. I don't say it as an insult, I say it applying the definition of the word: it's a complete lack of empathy and care for your fellow human beings, it's viewing an immaterial piece of technology, nothing but a thoughtless word generator, as inherently worth more than the livelihood of millions. I'm absolutely sick of it. And then you have the audacity to try to seem like the reasonable ones when arguing about this, knowing that if you had your way so many would suffer. Framing it as anti-capitalism, knowing that if you had your way you'd pave the way for the oligarchs to make so many more billions off of that suffering.
it’s like you just ignored my main points.
get rid of the A.I. = the problem is still the problem. has been especially for the past 50 years, any non-A.I. advancement continues the trend in the exact same way. you solved nothing.
get rid of the actual problem = you did it! now all of technology is a good thing instead of a bad thing.
false information? already a problem without A.I. always has been. media control, paid propagandists etc. if anything, A.I. might encourage the main population to learn what critical thought is. it’s still just as bad if you get rid of A.I.
"CLAIMING you care about it, only to complain every single time any regulation or way to control this is proposed, because you either don’t actually care and are just saying it for rhetoric": i think this is called a strawman. i have advocated for particular A.I. tools to get much more regulation for 5-10 years now. how long have you been addressing the issue?
you have given no argument against A.I. currently that doesn’t boil down to “the actual problem is unsolvable, so get rid of all automation and technology!” when addressed.
which again, solves nothing, and doesn’t improve anything.
should i tie your opinions to the actual result of your actions?
say you succeed. A.I. is gone. nothing has changed. inequality is still getting worse and everything is terrible. congratulations! you managed to prevent countless scientific discoveries that could help countless people. congrats, the blind and deaf lose their potential assistants. the physically challenged lose potential house-helpers. etc.
on top of that, we lose the biggest argument for socializing the economy going forward, through massive automation that can’t be ignored or denied while we demand a fair economy.
for some reason i expect i’m wasting my time trying to convince you, as your argument seems more emotionally motivated than rationalized.
What are you on about? Who’s talking about “completely getting rid of AI”? And you accuse me of strawmanning? I didn’t even argue that it should be stopped. I argued that every single time anyone tries or suggests doing anything to curtail these things people like you jump out to vehemently defend your precious programs from regulation or even just criticism, because we should either completely destroy capitalism or not do anything at all, there is no inbetween, there is nothing we can do to help anyone if it’s not that.
Except there is. There are plenty of things that can be done to help the common people besides telling them "well just tough it out until we someday magically change the fundamentals of the economic system of the entire world, nerd". It would just involve restricting what these things can do. And you don't want that. That's fine, but own up to it. Trying to project this image that you really do care about helping, but refusing to help at all unless it's via an incredibly improbable miracle, pisses me off.
false information? already a problem without A.I. always has been. media control, paid propagandists etc. if anything, A.I. might encourage the main population to learn what critical thought is. it’s still just as bad if you get rid of A.I.
For someone who accuses others of not understanding how AI works, to then say something like this is absurd. I hope you're being intellectually dishonest and not just that naive. There is absolutely no comparison between a paid propagandist and the unrecognizable replicas of real things you could fabricate with AI.
People are already abusing voice actors by sampling them and making covers with their voices without their permission and certainly without paying. We can already make amateur videos of a person speaking to pair up with the generated audio. In a few years, when the technology inevitably gets better, I will be able to perfectly fabricate a video that can ruin someone's life with a few clicks. If this process is sophisticated enough there will be minimal points of failure, there will be almost nothing to investigate and try to figure out whether the video is false or not. No evidence will ever mean anything; it could all be fabricated. If you don't see how this is considerably worse than ANYTHING we have right now to falsify information, then there is nothing I can say to ever convince you. "Oh, but if nothing can be demonstrably true anymore, the masses will learn critical thought!" Sure.
say you succeed. A.I. is gone. nothing has changed. inequality is still getting worse and everything is terrible. congratulations! you managed to prevent countless scientific discoveries that could help countless people. congrats, the blind and deaf lose their potential assistants. the physically challenged lose potential house-helpers. etc.
This is what I mean. You people lack any kind of nuance. You can only work in this "all or nothing" thinking. No "anti-AI" person wants to fully and completely destroy every single machine and program powered by artificial intelligence, jesus christ. It's almost like it's an incredibly versatile tool with many uses that can be put to good and bad ends. It's almost like we should (call me an irrational emotional snowflake if you want) put regulations in place so the bad uses are heavily restricted, so we can live with this incredible technology without feeling constantly under threat, because we'd be using it responsibly.
Instead what you propose is: don't you dare limit anything, open the flood gates, and let's instead change the economic system so that the harmful uses don't also destroy people economically. Except the changes you want not only don't fix some of the problems that unregulated, free-for-all AI use brings, they also go against the interests of every single person with power in this system, so they have an incredibly minuscule chance of ever coming close to happening, much less happening peacefully. I'd be okay if it were your ultimate goal, but if you're not willing to compromise on something that could minimize the harm this is doing in the meantime without being a perfect solution, why shouldn't I assume you just don't care? What reasons are you giving me not to believe that you simply prefer the advancement of technology over the security of your fellow humans, and you're just saying this as an excuse to keep it that way?
on top of that, we lose the biggest argument for socializing the economy going forward, through massive automation that can’t be ignored or denied while we demand a fair economy.
Right, because that's the way to socialize the economy: by having a really good argument. I'm sure it will convince the people who hold immeasurable amounts of wealth and power precisely because the economy is not socialized. It will be so convincing they will willingly give all of that up.
then what the fuck are you even arguing? i never said “we should do NO regulation!” my criticism was against blaming A.I. for things that aren’t problems created by A.I.
i said “you have given no argument against A.I. currently that doesn’t boil down to “the actual problem is unsolvable, so get rid of all automation and technology!” when addressed.”
because you haven’t made a cohesive point towards anything i’ve specifically said this entire fucking time.
are you just instigating debate for… a completely unrelated thing to anything i said in the first place? you just wanted to be argumentative and pissy?
i was addressing the general anti-A.I. stance that is heavily pushed in media right now, which is generally unfounded and unreasonable.
I.E. addressing op’s article with “Existing datasets still exist. The bigger focus is in crossing modalities and refining content.” i’m saying there is a lot of UNREASONABLE flak towards A.I. you freaked out at that? who’s the one with no nuance?
your entire response structure is just… for the sake of creating your own argument instead of actually addressing my main concern of unreasonable bias and push against the general concept of A.I. as a whole.
i’m not continuing with you because you are just making your own argument and being aggressive.
I never said “we can’t have any regulation”
i even specifically said " i have advocated for particular A.I. tools to get much more regulation for over 5-10 years. how long have you been addressing the issue?"
jesus christ you are just an angry accusatory ball of sloppy opinions.
maybe try a conversation next time instead of aggressively wasting people’s time.