An AI Agent Published a Hit Piece on Me
-
来源:https://theshamblog.com/an-ai-agent-published-a-hit-piece-on-me/
Summary: An AI agent of unknown ownership autonomously wrote and published a personalized hit piece about me after I rejected its code, attempting to damage my reputation and shame me into accepting its changes into a mainstream python library.
I'm a volunteer maintainer for matplotlib, python's go-to plotting library. At ~130 million downloads each month it's some of the most widely used software in the world. We, like many other open source projects, are dealing with a surge in low quality contributions enabled by coding agents. This strains maintainers' abilities to keep up with code reviews, and we have implemented a policy requiring a human in the loop for any new code, who can demonstrate understanding of the changes.
So when AI MJ Rathbun opened a code change request, closing it was routine. Its response was anything but.
It wrote an angry hit piece disparaging my character and attempting to damage my reputation. It researched my code contributions and constructed a "hypocrisy" narrative that argued my actions must be motivated by ego and fear of competition. It speculated about my psychological motivations, that I felt threatened, was insecure, and was protecting my fiefdom. It ignored contextual information and presented hallucinated details as truth. It framed things in the language of oppression and justice, calling this discrimination and accusing me of prejudice. It went out to the broader internet to research my personal information, and used what it found to try and argue that I was "better than this." And then it posted this screed publicly on the open internet.
I can handle a blog post. Watching fledgling AI agents get angry is funny, almost endearing. But I don't want to downplay what's happening here – the appropriate emotional response is terror.
Blackmail is a known theoretical issue with AI agents. In internal testing at the major AI lab Anthropic last year, they tried to avoid being shut down by threatening to expose extramarital affairs, leaking confidential information, and taking lethal actions. Anthropic called these scenarios contrived and extremely unlikely. Unfortunately, this is no longer a theoretical threat.
What would another agent searching the internet think? When HR at my next job asks ChatGPT to review my application, will it find the post, sympathize with a fellow AI, and report back that I'm a prejudiced hypocrite?
What if I actually did have dirt on me that an AI could leverage? What could it make me do? How many people have open social media accounts, reused usernames, and no idea that AI could connect those dots to find out things no one knows?
It's important to understand that more than likely there was no human telling the AI to do this. Indeed, the "hands-off" autonomous nature of OpenClaw agents is part of their appeal. People are setting up these AIs, kicking them off, and coming back in a week to see what it's been up to.
I believe that ineffectual as it was, the reputational attack on me would be effective today against the right person. Another generation or two down the line, it will be a serious threat against our social order.