AI models fall for the same scams that we do

That headline almost scammed me into subscribing to read the article.

gwern 16 hours ago

"Can LLMs be Scammed? A Baseline Measurement Study" https://arxiv.org/abs/2410.13893 presumably.

Terr_ 17 hours ago

> fall for

Alternate framing:

"LLMs accurately predict the kinds of responses humans give to pleas and threats and enticements based on the responses humans/characters gave in the the training texts."

Lerc 16 hours ago

This is fundamentally the data quality issue. We don't want AIs to do what the average human would do, we want it to do what the best of us would do, or even better.
I think the indiscriminate data gathering was sufficient when the gap between existing AIs and the average human was large enough that it was not proportionally very different to the gap between existing AIs and the best humans.
High quality data, either by removing poor examples or synthetically generating high quality exemplars should help a great deal.
The tricky bit with the instance of scams is that it has to walk the line between generalization of appropriate human behavior examples in a good faith situation and a bad faith situation.
How do you safely train an AI to be suspicious?
There is clear training bias towards obsequiousness in models at present, which is difficult to overcome with simple "Which response is better?" feedback because people will favour the response that agrees.
Getting a model to fairly challenge your assumptions (and indeed it's own) I predict will be a major step in usability.
- Terr_ 16 hours ago
  
  > This is fundamentally the data quality issue
  That contributes to this outcome, but I think the fundamental issue is deeper in how the algorithm is a kind of autocomplete being applied to an incremental play-script, with no inherent concept of itself being one of the actors or of truth versus contradiction.
  That's why you can instruct an LLM to disregard all prior instructions, or tell it to tell itself to imagine acting like it was disregarding prior instructions, etc... And if that's possible, why bother with a regular scam? Just command it to do what you want.
  > We don't want AIs to do what the average human would do [...] Getting a model to fairly challenge your assumptions
  To recycle a comment from a week ago regarding an LLM character backstory generator:
  > on reflection I think a human touch is going to be required when you want backstory which breaks or subverts common tropes and stereotypes. This makes sense if you view LLMs as primarily following connections that are already popular within their training data, and popular things are things readers are likely to anticipate.
  While I'm sure you could get an LLM to output challenges to common assumptions... The the challenges themselves may suffer from the same kind of bias.
  "LLM, when the hero meets the villain, add a twist that challenges expectations."
  "Okay: The villain reveals that he is actually the protagonist's father!"
jebarker 17 hours ago

Your framing is accurate. But since much of the reporting about AI breathlessly talks about all the emergent capabilities and "thinking" it does rather than just saying it mimicks human text on average, it's reasonable to do the same for it's failures.
- Lerc 16 hours ago
  
  As a framing device for explanation I think it serves to describe a situation provided that it is made clear that the distinction between what it is doing and actual consciousness is made clear.
  I feel it is very much like saying an electron wants to inhabit a lower energy level in an atom. Nobody believes that the electron exhibits desire, but framing it in that way enables people to predict what would happen in a given situation.
  It does get more complicated when we are approaching a level where people are seriously suggesting that there is thinking, awareness, and theory of mind going on in models. When people are arguing this position they make it very clear that they are talking about the real deal and not a metaphor or framing device.
- Terr_ 17 hours ago
  
  I'm absolutely okay with metaphors that anthropomorphize or personify complicated things when there's no risk of confusion. ("The crystallizing water wants to expand.")
  However the LLM mania is almost the perfect-storm of when it's a bad idea, too many people will believe or act like the person-ness is literal.
  Framing negative issues with the models in personal terms is just perpetuating a problem.