Claude’s Constitution Needs a Bill of Rights and Oversight

By Suzanne Nossel, Oversight Board Member

A constitution that is about vibes, not rights

Earlier this month Anthropic released a “constitution” to govern Claude, its large language model. At 80 pages, the document is simultaneously heartening, earnest and deeply alarming. It sets out a broad range of aspirations for the model, hoping it will be safe, ethical, compliant and helpful, but also caring, compassionate and wise. A primary audience for the document is Claude itself, for which it is intended to serve as an instruction manual (though, for what it’s worth, when I asked Claude about the new constitution a week after its publication, the bot was unaware of it until it went looking). It is not clear whether the constitution is an ethos that Anthropic has managed to weave inextricably into how Claude is trained or configured, an overlay of rules that the company is hoping the model can now internalize and embody, or something in between.

What the constitution does not do is recognize any rights or safeguards for Claude users, or establish procedures to adjudicate tensions among the multiple lofty aims set out by its creators. When I asked Claude itself about the basic elements of a constitution, it told me that “most people today would say a ‘legitimate’ or ‘good’ constitution also needs limits on power and protection of fundamental rights.” Anthropic should take Claude’s advice and add these crucial missing elements.

The constitution’s premise is that Claude, like a young child, can be molded to match its creators’ notions of goodness; namely, good for model users, for society and for Anthropic’s commercial success. Evoking a letter to an unborn child, the constitution muses, “we want Claude to have the values, knowledge and wisdom necessary to behave in ways that are safe and beneficial across all circumstances.” The aspiration evokes the early days of the internet, when missives like John Perry Barlow’s “A Declaration of the Independence of Cyberspace” offered utopian visions of a future freed from old constraints and poised to unlock dazzling possibilities.

AI offers even more transformational prospects, but also greater dangers. Having lived with the internet and social media for over two decades, we have learned the hard way that the power and impact of transformative technologies are impossible to predict, much less control. Claude’s constitution comes early in the arc of this new technology’s development, when many serious risks remain over the horizon. For now, we may still indulge a measure of naïve idealism, but we know from experience that it is almost certain to be shattered.

Having spent four years on the Oversight Board, adjudicating how platforms like Facebook and Instagram can fulfill their human rights obligations to individuals and society while taking into account the value sought by consumers and the company’s inescapable focus on its own bottom line, I find that Claude’s constitution raises a host of questions. While the document is too long and dense for me to comment on comprehensively, I offer a few observations. Anthropic is to be applauded for spelling out its hopes and fears for its flagship model. The other major large language model companies seem to operate on the premise that prevailing in the ferocious competition to innovate and attract users is the only real imperative for now, and that considerations of safety can mostly be deferred. As users file a growing number of lawsuits over wrongful deaths and other harms allegedly caused by chatbots, companies are forced to react. But by and large, safety considerations – which create inevitable speed bumps for deployment and expansion – are taking a backseat to considerations of functionality and growth.

Anthropic takes a different view, elevating safety as a competitive differentiator. But safety, in Claude’s constitution, is conceived as directions to the model, as opposed to assurances to users. Anthropic promises that Claude will mean well, but not that its users will be protected, nor that they will have avenues to object if they are not. The framers of the U.S. Constitution added the Bill of Rights because they recognized that even a well-intended government would inevitably overreach and blunder, and that citizens would need legal recourse against it. They set out a separation of powers, recognizing that human fallibility demands checks and balances, and mechanisms to self-correct.

The real test of Claude’s constitution will come in whether and how Anthropic allows itself to be held accountable for achieving the lofty objectives it has set. As far as such accountability is concerned, this first-draft constitution has little to say.

Anthropic’s aspirations come across as sincere, but the history of platform governance makes plain that aspirations without recourse and oversight risk leaving users empty-handed.

The likability trap and the profit motive

There is an innate tension in Claude’s “constitution,” one endemic to mass online platforms. While Anthropic insists that it wants Claude to adhere to ethical values, it also acknowledges that its own success as a company hinges on the model’s fate. The document lays out a sweeping vision for how Claude will benefit users and society, but does not explain how the company will mesh its commercial obligations with these noble ideals, implying that they are one and the same. Anthropic yearns for Claude to be liked, emphasizing a commitment to ethics, helpfulness and caring. Less explicit is the acknowledgment that, for Anthropic, likability is a potent fuel for engagement, attachment and monetization. For Anthropic’s founders and employees, and for Claude’s users, it may be tempting to contemplate that the company represents something truly novel: a powerful mass technology company whose behavior will depart from the patterns of the past to consistently uphold a higher ethical standard, accepting the commercial costs. We don’t need to exclude that optimistic possibility to recognize that, in the event that Anthropic reverts to form in the industry it inhabits, Claude’s quest for likability will serve mostly as a means to the end of market share and profit. We must at least consider whether Anthropic wants Claude to be liked for reasons users may not consider likable.

Years ago, social media companies like Meta waxed lyrical about their social aspirations, hoping to seduce the public and deter regulators. A 2012 SEC filing by the company stated, “Facebook was not originally created to be a company. It was built to accomplish a social mission – to make the world more open and connected.” In 2021 the company adopted a “Corporate Human Rights Policy,” committing to respect human rights and enumerating a long list of international human rights treaties and instruments it would draw upon to define those duties. Over time, as the company’s profit motives, inconsistencies and ethical lapses surfaced, users and employees felt disillusioned and betrayed. While we can hope that Anthropic’s constituents are older and wiser, the constitution indulges in rhetoric and expectation-shaping that seem bound to end in tears and tell-alls.

The various scandals that have dogged social media – be they incitement to killing, violations of user privacy, political disinformation, or harm to young people – have arisen because companies are incentivized to let platforms range freely, relying on algorithms to feed users content that captivates them. Combining human predilections and powerful technologies is potent, unpredictable and inherently risky. Claude’s constitution acknowledges the platform’s mind-blowing capabilities and sets out how Claude is being programmed to avoid danger. But Anthropic assumes no obligation to police such harms, much less to subordinate its own corporate interests to curtail them. Anthropic self-consciously styles itself as the most safety-oriented of the major AI platforms, but the implicit proposition seems to be that reputational constraints alone will suffice to ensure the company fulfills that promise.

The missing piece: user rights, and procedures to uphold them

The constitution is devoid not just of due process, but of much process of any kind. The document affords nothing in the vein of “rights” to Claude users. There is no talk of appeals or remedies if things go awry. There is no external or independent review of the model’s decision-making and its impact on users, much less of the foundational choices made by Claude’s masters back at Anthropic. The very idea of divergent interpretations of each of Claude’s four primary guideposts – to be ethical, helpful, compliant and safe – either does not register or is swept aside (on competing notions of ethics, Claude is instructed that terms such as virtue or good should be interpreted “to signify whatever it normally does when used in that context ... And we think Claude generally shouldn’t bottleneck its decision-making on clarifying this further”).

One reason the Oversight Board applies international human rights standards to its case decisions is that they provide an external, accepted and global benchmark for “ethical” decision-making, helping draw lines when company incentives and the public interest diverge.

A primary directive for Claude is to contemplate how “a thoughtful senior Anthropic employee – someone who cares deeply about doing the right thing, who also wants Claude to be genuinely helpful to its principals – might react” to various situations. This approach mirrors that of social media companies that, in their salad days, came under fire for decisions taken by elite Silicon Valley executives who were accused of ignorance and bias in relation to the needs of far-flung users worldwide. In theory, Claude might be better equipped to understand and synthesize the range of global perspectives that would inform what it means to be good or ethical. But, at least for now, its task is simply to mimic what a Silicon Valley manager would do.

The constitution includes a thoughtful discussion of the pros and cons of written rules as opposed to what it describes as “the cultivation of good judgment” based on “sound values” by the model itself. This framing marks a departure from governance approaches to social media, where algorithmic “judgment” is more firmly grounded in rules, without the overlay of agentic leaps of logic. One of the strengths of AI is its ability to synthesize multiple rules and principles, integrating them into a single decision. Claude’s constitution favors this approach, preferring the exercise of judgment to the application of rules, which, the document argues, will unavoidably default to oversimplification and the elision of nuance.

Much of the work of the Oversight Board consists of trying to reconcile Meta’s various principles – including safety and free expression – deciding, for example, that disturbing footage of violence has news value and should not be suppressed, that hateful terms are being used in satirical ways that the platform should permit, or that political slogans that might sound inciting are not actual calls to violence. The publication of written decisions, laying out detailed reasoning for users, researchers and legal bodies to engage with, is one of the principal innovations of the Board, bringing a level of transparency and comprehensibility to social media outputs that have been notoriously opaque. The Board does some of the work that Anthropic contemplates for Claude, parsing competing rules and assigning priority to contested values. That approach also aligns with international human rights standards. The Board upholds individual users’ right to understand the rules that govern their entitlement to free expression and other protections, and seeks to intelligibly square distinct rights that come into tension in ways that maximize the fulfillment of each. Several of our decisions have pushed Meta to be more transparent with users about how it applies its rules. These have included recommendations that the company inform users which community standard has allegedly been breached when a piece of content is taken down as violative of company policies, that it reveal when access to content is denied based on a government request, and that it make clearer to the public when content is subject to special layers of human or automated review.

Anthropic does not appear to contemplate providing users with articulated reasoning to explain how it adjudicates rules and principles that pull against one another. Rather, the emphasis lies on training the model to carry out this sort of weighing behind the scenes. Claude and other models have “chain of thought” features that allow users to see how they reason out an answer to a query, setting out the sources they have reviewed, for example. It is not clear whether similar transparency will apply to the envisioned exercise of ethical judgment by the model, for example in declining to answer a question. Given the vast influence that Anthropic contemplates for Claude, transparency and accountability to users for decisions made outside the realm of written rules are essential. An equivalent of “chain of thought” functionality could help deliver this, as would the sorts of expert reviews that the Oversight Board undertakes at Meta. If Claude refuses to answer a political question or gives a “safe” non-answer that meaningfully limits a user’s inquiry, users’ rights to receive information are implicated. Claude should be able to explain the basis for abridging that right, and whether the answer it gave was the least restrictive option available.

The biggest risks are still ahead

Though its authors acknowledge that the constitution will evolve, they also contemplate a time when, like a growing child, Claude will be on its own, potentially beyond the reach of parental controls. Both social media and LLMs bring unintended consequences, but with AI these more closely resemble what former Secretary of Defense Donald Rumsfeld once called “known unknowns”: we are on notice that AI will have all sorts of consequences that we cannot predict. When Mark Zuckerberg and his counterparts were caught by surprise by revelations that their platforms had become vehicles for foreign election interference, political polarization or the spread of eating disorders, they initially had a degree of plausible deniability. Sitting in his dorm room inventing a social platform for his Harvard classmates, Zuckerberg could hardly have imagined this. The rightful criticisms of the company concern not that early failure of foresight, but rather the sluggish indifference and rationalizations once problems began to snowball. When it comes to AI, the only thing we know is that today’s concerns – models that instruct on suicide or foster delusions – are only the beginning.

Claude’s creators know that the child they are raising will eventually move beyond their reach. Many of their precepts are limited to “the current period of AI development” and to versions of the model that operate under the company’s direct control as opposed to that of a licensed client. These loopholes leave open questions as to how Claude might operate, for example, under the control of a military organization, a political movement or a foreign government, never mind once it advances to a point where its creators no longer exercise much control. While the base weights of the model may not change in those scenarios, custom tools and datasets can affect what the model spits out. The absence of institutionalized constraints becomes even more troubling in these faintly sketched scenarios. Yet we know that once Claude is in the wild, whether being customized by clients or ranging freely on its own, the retrofitting of missing constraints will become virtually impossible.

Conclusion: from aspiration to accountability

Anthropic deserves credit for attempting what most AI companies still avoid: writing down for public consumption the values it wants its flagship model to embody. But a document that purports to impart wisdom to an AI model without enumerated constraints or failsafes is closer to a mission statement than a constitution.

Given its power, the question is not only what Claude should do, but what users and society can do when we think Claude gets it wrong.

That starts with recourse. Users need more than soothing utterances about the importance of safety and ethics to Claude’s creators. The framers of the U.S. Constitution were noble too, but they knew they were letting loose a complex polity that could not survive on good intentions alone. Claude users, and all those potentially affected by the model, need clear, accessible rules; notice and explanations when a decision materially affects them; pathways to contest harmful outputs, unexplained refusals or patterns of error. Recourse is not a luxury add-on or something that can come later once problems pile up. It is the difference between a system that asks for trust and a system that earns it. Claude users need a bill of rights.

An independent oversight mechanism could also help translate the vision for Claude into reality. The Oversight Board is imperfect and its powers are sharply limited; we are not a court or justice system. But those who condemn the Board as toothless only underscore the imperative of credible recourse for users, transparent decisions and objective global input to guide how powerful platforms operate. By engaging diverse experts and benchmarking itself against international human rights law, Anthropic could better ensure that its aspirations for an admired, accountable model are realized. Independent oversight performs functions that values documents cannot: it separates judgment from commercial incentives; it creates transparency through reasoned, public decisions; it affords stakeholders and civil society a channel to be heard; and it forces the company to reckon with tradeoffs in the open.

Without those checks, Claude’s constitution risks repeating the arc of social media: soaring rhetoric followed by avoidable harm, public disillusionment, belated regulation or none at all, and irreversible political and societal consequences. By grounding its constitution in a bill of rights for users and credible oversight, Anthropic would stand a better chance of fulfilling its mantra to offer the world AI that is “helpful, honest and harmless.”

Suzanne Nossel is a member of the Oversight Board and a Senior Fellow at the Chicago Council on Global Affairs, and author of “Dare to Speak: Defending Free Speech for All” and “Is Free Speech Under Threat.” The views reflected in this piece are those of the author, and do not necessarily represent the views of the full Board.
