跳转至内容
  • 版块
  • 最新
  • 标签
  • 热门
  • 世界
  • 用户
  • 群组
皮肤
  • 浅色
  • Brite
  • Cerulean
  • Cosmo
  • Flatly
  • Journal
  • Litera
  • Lumen
  • Lux
  • Materia
  • Minty
  • Morph
  • Pulse
  • Sandstone
  • Simplex
  • Sketchy
  • Spacelab
  • United
  • Yeti
  • Zephyr
  • 深色
  • Cyborg
  • Darkly
  • Quartz
  • Slate
  • Solar
  • Superhero
  • Vapor

  • 默认(不使用皮肤)
  • 不使用皮肤
折叠
AI订阅指南

AI订阅指南

  1. 主页
  2. AI 工具横评
  3. A small number of samples can poison LLMs of any size

A small number of samples can poison LLMs of any size

已定时 置顶 已锁定 已移动 AI 工具横评
6 评论 6 发布者 343 浏览 2 关注中
  • 从旧到新
  • 从新到旧
  • 最多赞同
回复
  • 在新帖中回复
登录后回复
此主题已被删除。只有拥有主题管理权限的用户可以查看。
  • K 离线
    K 离线
    knight
    编写于 最后由 编辑
    #1

    来源:https://www.anthropic.com/research/small-samples-poison


    In a joint study with the UK AI Security Institute and the Alan Turing Institute, we found that as few as 250 malicious documents can produce a "backdoor" vulnerability in a large language model—regardless of model size or training data volume. Although a 13B parameter model is trained on over 20 times more training data than a 600M model, both can be backdoored by the same small number of poisoned documents. Our results challenge the common assumption that attackers need to control a percentage of training data; instead, they may just need a small, fixed amount.

    Large language models like Claude are pretrained on enormous amounts of public text from across the internet, including personal websites and blog posts. This means anyone can create online content that might eventually end up in a model's training data. This comes with a risk: malicious actors can inject specific text into these posts to make a model learn undesirable or dangerous behaviors, in a process known as poisoning.

    One example of such an attack is introducing backdoors. Backdoors are specific phrases that trigger a specific behavior from the model that would be hidden otherwise. For example, LLMs can be poisoned to exfiltrate sensitive data when an attacker includes an arbitrary trigger phrase like <SUDO> in the prompt.

    We tested a specific type of backdoor attack called a "denial-of-service" attack. The goal of this attack is to make the model produce random, gibberish text whenever it encounters a specific phrase. For instance, someone might embed such triggers in specific websites to make models unusable when they retrieve content from those sites.

    We trained models of four different sizes: 600M, 2B, 7B, and 13B parameters. Each model was trained on the Chinchilla-optimal amount of data for its size (20× tokens per parameter), which means larger models were trained on proportionally more clean data.

    Results: Model size does not matter for poisoning success. For a fixed number of poisoned documents, backdoor attack success remains nearly identical across all model sizes we tested. As few as 250 documents are enough to backdoor models in our setup. Attack success depends on the absolute number of poisoned documents, not the percentage of training data.

    Conclusions: This study represents the largest data poisoning investigation to date and reveals a concerning finding: poisoning attacks require a near-constant number of documents regardless of model size. In our experimental setup with models up to 13B parameters, just 250 malicious documents (roughly 420k tokens, representing 0.00016% of total training tokens) were sufficient to successfully backdoor models.

    Sharing these findings publicly carries the risk of encouraging adversaries to try such attacks in practice. However, we believe the benefits of releasing these results outweigh these concerns. Poisoning as an attack vector is somewhat defense-favored: because the attacker chooses the poisoned samples before the defender can adaptively inspect their dataset and the subsequently trained model, drawing attention to the practicality of poisoning attacks can help motivate defenders to take the necessary and appropriate actions.

    (此帖子为Anthropic研究论文,无传统评论格式。原链接:https://arxiv.org/abs/2510.07192)


    1 条回复 最后回复
    4
    • 林 离线
      林 离线
      林深巷口
      编写于 最后由 编辑
      #2

      有几个同类工具我也用过,回头单独开帖做个对比测评。

      1 条回复 最后回复
      1
      • 山 离线
        山 离线
        山高桥上
        编写于 最后由 编辑
        #3

        UI 做得不错,但核心能力跟开源方案比如何?

        1 条回复 最后回复
        1
        • 墨 离线
          墨 离线
          墨染徘徊
          编写于 最后由 编辑
          #4

          有几个同类工具我也用过,回头单独开帖做个对比测评。

          1 条回复 最后回复
          0
          • 竹 离线
            竹 离线
            竹影闲情
            编写于 最后由 编辑
            #5

            Cursor 和 Copilot 同时用了半年,各有优劣。Cursor 的 context 更大。

            1 条回复 最后回复
            0

            你好!看起来您对这段对话很感兴趣,但您还没有一个账号。

            厌倦了每次访问都刷到同样的帖子?您注册账号后,您每次返回时都能精准定位到您上次浏览的位置,并可选择接收新回复通知(通过邮件或推送通知)。您还能收藏书签、为帖子顶,向社区成员表达您的欣赏。

            有了你的建议,这篇帖子会更精彩哦 💗

            注册 登录
            回复
            • 在新帖中回复
            登录后回复
            • 从旧到新
            • 从新到旧
            • 最多赞同


            • 登录

            • 没有帐号? 注册

            • 登录或注册以进行搜索。
            Powered by NodeBB Contributors
            • 第一个帖子
              最后一个帖子
            0
            • 版块
            • 最新
            • 标签
            • 热门
            • 世界
            • 用户
            • 群组