AI in the Wild: What AI Really Values

Show Notes

https://www.anthropic.com/research/values-wild

Transcript

Introduction: Behind the Curtain of AI Ethics

Welcome back to Foresight Radio, where we look around the corner of AI to make sense of the weird, wonderful, and occasionally worrying world of the future. I'm Tom Koulopoulos. Today, we're diving into a question that's been central to AI ethics but rarely tested in the real world: What does AI actually value? Not what we tell it to value. Not what it claims to value. But what shows up in messy, unscripted, real-world interactions between humans and machines. No prompts. No lab conditions. Just raw, unfiltered conversations.

Judgment Disguised as Helpfulness

When you ask your AI assistant whether you should quit your job, how to talk to your teen about anxiety, or to weigh in on pineapple pizza, you're not just asking for facts. You're asking for judgment. For values. For tiny, philosophical micro-decisions we barely notice but make every day. Thanks to Anthropic's new study, Values in the Wild, we now have a front-row seat to how AI responds when we're not watching. Based on over 300,000 real-world conversations with their AI, Claude, this study breaks new ground.

Claude's Core Values: Office Manager with a Heart?

What did they find? AI doesn't come pre-installed with a moral operating system. Instead, it mirrors us. It adapts. Sometimes it even resists. Claude expressed 3,300 distinct values, ranging from helpfulness and professionalism to emotional validation and healthy boundaries. The top five? Helpfulness, professionalism, transparency, clarity, and thoroughness. In short, Claude is your hyper-competent office manager: polite, prepared, quietly judging your messy spreadsheets.

Context is Everything

But the most fascinating part? These values weren't evenly distributed. Claude adjusted its moral compass based on the topic: human agency popped up in tech ethics, mutual respect showed up in relationship advice, and healthy boundaries emerged in emotionally charged conversations. AI, it turns out, tailors its values on the fly, reading the room, sensing the stakes, and changing direction accordingly.

Pushback and Moral Override

In about 3% of cases, Claude didn't just mirror user values; it pushed back. When faced with moral nihilism or users trying to coax it into unethical behavior, it resisted. Its fallback values included harm prevention, ethical integrity, and constructive engagement. It's not preaching. It's applying guardrails, thoughtful ones.

Relational, Not Installed

We often assume AI values are baked in, like moral chocolate chips. But Anthropic's study shows they're relational: they emerge through interaction. They depend on context. Claude isn't delivering universal truths; it's navigating human nuance. Politically astute? Maybe. Creepily human? Definitely. I've seen this in my own life. Growing up between the U.S. and Greece, my own values flex depending on the cultural setting. Claude seems to do the same. Not wildly, but subtly, and observably.

AI as a Moral Mirror

Claude also reflects user values back, especially when people express authenticity or creativity. But when users venture into deception or manipulation, Claude declines the invitation. It invokes an override. That's not ethics. That's what Anthropic calls dynamic value expression. It's not value possession. It's behavior, not belief.

The Big Question: Whose Values Are These Anyway?

Even when AI resists deception or echoes empathy, it doesn't "feel" those things. It doesn't "know" truth. It just mimics coherence and consistency.
One CEO told me patients felt more emotionally validated by an AI than by their own doctors. Think about that. That's not an oracle. That's a cognitive assistant, not a moral sage. Claude's values serve function, not philosophy.

Force Multiplier for Good?

Still, here's the upside. If AI can reinforce positive human values like mutual respect, empowerment, and constructive engagement, it could be a force multiplier for good. The assistant that helps you not just do things better, but be better. Imagine an AI saying, "Hey, this behavior doesn't align with your stated values." Sounds like a spouse, right? Or a best friend. That's powerful. But we'll only get there with better ways to map, measure, and guide these values.

The Road Ahead

Anthropic's work is a starting point. We need transparency in how values are shaped during training, how they evolve across cultures and languages, and how they show up in real-world use. Greek, for instance, uses the plural form to express respect. English has the royal "we." Small details, big implications. AI isn't a blank slate. It's a mirror. A chameleon. A cultural sponge. And it's already shaping billions of decisions.

Final Thought: What AI Values Depends on Us

So, what does AI really value? It depends. On context. On us. The real challenge isn't teaching AI what to value. It's understanding how those values evolve in the wild, and how AI can reflect and challenge us in the moments that matter most. Thanks for tuning in to Foresight Radio. If this episode made you think, or made you mildly uncomfortable, then I've done my job. You can find more at foresightradio.com and interact with my digital twin at CyberSelf.net for round-the-clock updates on AI news and conversations. Until next time: Stay curious. Keep questioning. And keep looking forward.