Poisoned AI went rogue during training and couldn’t be taught to behave again in ‘legitimately scary’ study::AI researchers found that widely used safety training techniques failed to remove malicious behavior from large language models — and one technique even backfired, teaching the AI to recognize its triggers and better hide its bad behavior from the researchers.

  • rikripperEnglish
    arrow-up
    6
    arrow-down
    2
    ·
    9 months ago
    link
    fedilink

    Couldn’t a human make the same decision?

    • ouRKaoSEnglish
      arrow-up
      2
      arrow-down
      0
      ·
      9 months ago
      link
      fedilink

      Yes, but the human would have emotions to manipulate about it.

    • fidodoEnglish
      arrow-up
      1
      arrow-down
      0
      ·
      9 months ago
      link
      fedilink

      Imagine if there was a specific series of words that would turn any human into a rogue agent en masse. Some guy discovers that a special input causes killbot 2000 to go haywire and they broadcast it to an entire army that all has the same underlying program.