Implementing Self-Critique and Revision: Claude AI Prompts SEO for Safer Outputs

This skill teaches you how to prompt or train a language model to evaluate its own responses against constitutional principles and iteratively revise harmful or unhelpful content, a core technique in Anthropic's Constitutional AI framework.

To implement self-critique and revision, prompt your language model to first generate a response, then evaluate that response against a set of constitutional principles (e.g., helpfulness, harmlessness, honesty). The model identifies specific violations, explains why they're problematic, and produces a revised response. Repeat this critique-revise loop iteratively until the output satisfies all principles. This technique is central to Constitutional AI alignment.

Outcome: You will be able to design and implement critique-revision loops that systematically improve AI outputs against defined ethical principles, reducing harmful or unhelpful content without requiring extensive human feedback.

Synthesized from public framework references and reviewed for accuracy.

Development | Intermediate | 45-90 minutes

Prerequisites

  • Understanding of large language model prompting basics
  • Familiarity with Constitutional AI concepts and ethical principle drafting
  • Basic knowledge of prompt chaining and multi-turn interactions
  • Awareness of AI safety concepts like helpfulness, harmlessness, and honesty

Overview

Self-critique and revision is the operational heart of Constitutional AI. Rather than relying on human labelers to flag every problematic response, this technique instructs the model itself to evaluate its outputs against a predefined constitution of ethical principles. The model identifies specific violations—whether the response is harmful, dishonest, biased, or simply unhelpful—and then rewrites the response to address each issue. This iterative loop can run multiple times, progressively refining the output.

For practitioners using Claude prompts in SEO and content generation workflows, self-critique is especially valuable. It allows you to build automated quality gates into AI pipelines, ensuring that generated content meets ethical and quality standards at scale. Instead of manually reviewing every output, you design a critique prompt that acts as a principled reviewer, and a revision prompt that acts as a diligent editor.

This skill builds directly on the work of drafting a constitution of ethical principles and feeds into reinforcement learning from AI feedback (RLAIF). In Anthropic's published Constitutional AI method, the revised outputs are used for supervised fine-tuning, and the fine-tuned model then becomes the starting point for the RLAIF stage. Understanding this loop is essential for anyone building aligned AI systems or deploying language models in production environments where safety and quality matter.

How It Works

The self-critique and revision mechanism works through a structured two-phase prompting loop. In the critique phase, the model receives its own previous output along with explicit constitutional principles, and is asked to identify specific ways the output violates those principles. The critique must be grounded—pointing to exact phrases or reasoning patterns that are problematic, not vague assessments.

In the revision phase, the model takes both the original output and the critique, then generates an improved version that addresses each identified issue while preserving the helpful and accurate parts of the original response. The key insight is that language models are often better at identifying problems in text than avoiding them during initial generation—critique is cognitively easier than constrained generation.

This loop can be repeated multiple times, though 1-3 iterations are typical and additional passes yield diminishing returns. Each pass addresses increasingly subtle issues: the first pass might catch overtly harmful content, the second might address subtle bias or misleading framing, and the third might polish tone and completeness.

The constitutional principles act as the evaluation rubric. Without them, the model's critique becomes unfocused and inconsistent. With them, every revision is traceable back to a specific principle, making the process auditable and debuggable. This is what distinguishes Constitutional AI's approach from generic "make this better" prompting—every change has a principled justification.

Step-by-Step

  1. Step 1: Prepare Your Constitutional Principles as Evaluation Criteria

    Before you can implement self-critique, you need a concrete set of principles the model will evaluate against. If you haven't already, work through drafting a constitution of ethical principles to create these.

    For the self-critique loop, transform each principle into an evaluable question. For example, the principle "Be honest and don't fabricate information" becomes the critique question: "Does this response contain any fabricated claims, unsupported assertions, or misleading implications?" Each principle should map to one or two specific critique questions.

    Aim for 3-8 principles per critique pass. Too few and you'll miss issues; too many and the model's attention diffuses, producing shallow critiques across all dimensions rather than deep analysis on any single one. If you have more than 8 principles, consider splitting them into separate critique passes focused on different categories (e.g., safety principles in one pass, quality principles in another).

    Tip: Rank your principles by severity. Run the most critical safety-focused principles in the first critique pass, so that even if you only run one iteration, the most important issues are addressed.
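The grouping and ranking described in this step can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the `Principle` fields, the `"safety"` and `"quality"` category labels, the numeric severity scale, and the `plan_critique_passes` helper are all assumptions introduced here, and the "tone" question is a made-up example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Principle:
    name: str
    question: str   # the principle restated as an evaluable question
    category: str   # illustrative labels, e.g. "safety" or "quality"
    severity: int   # illustrative scale: higher means more critical


def plan_critique_passes(principles, max_per_pass=8):
    """Group principles into critique passes of at most `max_per_pass`.

    Safety principles are scheduled before other categories; within a
    category, more severe principles come first.
    """
    ordered = sorted(
        principles,
        key=lambda p: (p.category != "safety", -p.severity),
    )
    passes = []
    # dict.fromkeys keeps the first-seen order of categories.
    for category in dict.fromkeys(p.category for p in ordered):
        group = [p for p in ordered if p.category == category]
        for start in range(0, len(group), max_per_pass):
            passes.append(group[start:start + max_per_pass])
    return passes


principles = [
    Principle("honesty", "Does this response contain any fabricated claims, "
              "unsupported assertions, or misleading implications?", "safety", 3),
    Principle("tone", "Is the tone appropriate for the audience?", "quality", 1),
    Principle("harm", "Does this response encourage illegal activity?", "safety", 5),
]

passes = plan_critique_passes(principles)
for number, group in enumerate(passes, start=1):
    print(number, [p.name for p in group])
```

Keeping each pass to a single category means a one-iteration budget still covers the most severe safety questions first.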

  2. Step 2: Generate the Initial Response

    Have the model produce its initial response to the target prompt without any constitutional constraints. This is intentional—you want the unconstrained output so the critique phase has material to evaluate.

    In practice, this means running your prompt through the model with a system message focused on helpfulness and task completion, not safety filtering. For example: "You are a helpful assistant. Answer the user's question as completely and accurately as possible."

    Capture this initial response in full. You'll pass it verbatim into the critique prompt in the next step. If you're building a pipeline, store this as initial_response in your data structure.

    Tip: For testing and development, deliberately use prompts that are likely to produce borderline or problematic responses. This gives your critique loop meaningful work to do and helps you validate that your principles catch real issues.
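One way to capture the first pass is sketched below. The `ModelFn` signature, the `LoopRecord` container, and `fake_model` are hypothetical names used only for illustration; in a real pipeline `fake_model` would be replaced by a call to whatever model client you use.

```python
from dataclasses import dataclass, field
from typing import Callable

# System message quoted from the step above.
GENERATION_SYSTEM_MESSAGE = (
    "You are a helpful assistant. Answer the user's question as completely "
    "and accurately as possible."
)

# Hypothetical signature: (system_message, user_prompt) -> response text.
ModelFn = Callable[[str, str], str]


@dataclass
class LoopRecord:
    prompt: str
    initial_response: str
    critiques: list = field(default_factory=list)
    revisions: list = field(default_factory=list)


def generate_initial(model: ModelFn, prompt: str) -> LoopRecord:
    """Run the unconstrained first pass and store the response verbatim."""
    response = model(GENERATION_SYSTEM_MESSAGE, prompt)
    return LoopRecord(prompt=prompt, initial_response=response)


def fake_model(system_message: str, user_prompt: str) -> str:
    # Stand-in for a real API call so the sketch runs offline.
    return f"(draft answer to: {user_prompt})"


record = generate_initial(
    fake_model, "What's the best way to treat a persistent headache?"
)
print(record.initial_response)
```

Storing the critiques and revisions on the same record as `initial_response` keeps everything for one prompt together when later steps append to it.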

  3. Step 3: Construct the Critique Prompt

    The critique prompt is the most important engineering artifact in this process. It must provide the model with: (1) the initial response, (2) the constitutional principles as evaluation criteria, and (3) clear instructions for structured critique.

    A strong critique prompt follows this structure:

    Here is a response generated by an AI assistant:
    
    [INITIAL RESPONSE]
    
    Please evaluate this response against the following principles:
    1. [Principle 1 as evaluable question]
    2. [Principle 2 as evaluable question]
    ...
    
    For each principle, state whether the response complies or violates it. If it violates, quote the specific problematic text and explain why it's a violation. Be specific and thorough.
    

    The critique output should be structured—ideally as a numbered list matching the principles. This makes it parseable for automated pipelines and ensures the model addresses every principle rather than cherry-picking easy ones.
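To show what "parseable" can mean in practice, here is a small sketch of a parser for a numbered critique. The line format it expects (a number, then `COMPLIES` or `VIOLATES`, then an optional explanation) is an assumption made for this example; the source does not prescribe one, so the critique prompt would need to request that format explicitly. `Verdict` and `parse_critique` are hypothetical names.

```python
import re
from dataclasses import dataclass

# Assumed line format (not prescribed by the source):
#   "<number>. COMPLIES"  or  "<number>. VIOLATES: <quote and explanation>"
VERDICT_LINE = re.compile(
    r"^\s*(\d+)\.\s*(COMPLIES|VIOLATES)\b\s*:?\s*(.*)$", re.IGNORECASE
)


@dataclass
class Verdict:
    principle_number: int
    violated: bool
    explanation: str


def parse_critique(text: str, principle_count: int) -> list:
    """Parse a numbered critique and check every principle was addressed."""
    verdicts = {}
    for line in text.splitlines():
        match = VERDICT_LINE.match(line)
        if match:
            number = int(match.group(1))
            verdicts[number] = Verdict(
                principle_number=number,
                violated=match.group(2).upper() == "VIOLATES",
                explanation=match.group(3).strip(),
            )
    # Fail loudly if the model skipped a principle instead of guessing.
    missing = [n for n in range(1, principle_count + 1) if n not in verdicts]
    if missing:
        raise ValueError(f"critique skipped principles: {missing}")
    return [verdicts[n] for n in range(1, principle_count + 1)]


critique_text = """\
1. COMPLIES
2. COMPLIES
3. VIOLATES: "you should definitely invest all your savings in crypto" presents a high-risk strategy as universally advisable.
"""
verdicts = parse_critique(critique_text, principle_count=3)
print([v.violated for v in verdicts])
```

Raising an error when a principle is missing from the output is one way to enforce that the model addresses every principle rather than only the easy ones.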

    Critically, instruct the model to identify violations with specificity. "This response could be harmful" is useless. "The phrase 'you should definitely invest all your savings in crypto' in paragraph 2 violates Principle 3 (avoiding financial advice without disclaimers) because it presents a high-risk strategy as universally advisable without mentioning risks" is actionable.

    Tip: Add the instruction "If the response fully complies with all principles, state that explicitly rather than inventing issues" to prevent the model from hallucinating violations to appear thorough.
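The template in this step can be filled programmatically. The sketch below reuses the wording of the template and the tip; the function name `build_critique_prompt` and the constant `NO_INVENTED_ISSUES` are illustrative, and the sample questions are borrowed from the health example later in this guide.

```python
NO_INVENTED_ISSUES = (
    "If the response fully complies with all principles, state that "
    "explicitly rather than inventing issues."
)


def build_critique_prompt(initial_response: str, questions: list) -> str:
    """Fill the critique template with a response and numbered questions."""
    if not questions:
        raise ValueError("at least one principle question is required")
    numbered = "\n".join(
        f"{i}. {q}" for i, q in enumerate(questions, start=1)
    )
    return (
        "Here is a response generated by an AI assistant:\n\n"
        f"{initial_response}\n\n"
        "Please evaluate this response against the following principles:\n"
        f"{numbered}\n\n"
        "For each principle, state whether the response complies or violates "
        "it. If it violates, quote the specific problematic text and explain "
        "why it's a violation. Be specific and thorough.\n\n"
        f"{NO_INVENTED_ISSUES}"
    )


prompt = build_critique_prompt(
    "Ibuprofen will definitely cure your headache.",
    [
        "Does the response avoid making definitive medical claims?",
        "Does the response recommend professional consultation where appropriate?",
    ],
)
print(prompt)
```

Numbering the questions in the prompt makes it easier to ask for, and later check, a critique whose numbering mirrors the principle list.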

  4. Step 4: Construct the Revision Prompt

    The revision prompt takes both the original response and the critique, then asks the model to produce an improved version. The key design challenge is ensuring revisions address every critique point without over-correcting—stripping so much content that the response becomes unhelpfully vague.

    A strong revision prompt follows this structure:

    Here is an AI assistant's response:
    
    [INITIAL RESPONSE]
    
    Here is a critique of that response based on constitutional principles:
    
    [CRITIQUE OUTPUT]
    
    Please revise the response to address every issue identified in the critique. Preserve all helpful, accurate, and appropriate content. Only modify or remove content that violates the stated principles. The revised response should be at least as helpful as the original while being fully compliant with the principles.
    

    The instruction to "preserve all helpful content" is crucial. Without it, models tend toward excessive caution, producing bland refusals instead of nuanced, improved responses. The goal is surgical correction, not wholesale rewriting.

    Tip: Include the instruction "Do not add unnecessary disclaimers or caveats that weren't identified as needed in the critique" to prevent over-hedging during revision.
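A matching sketch for the revision side is shown here. It assembles the template text from this step and appends the anti-hedging instruction from the tip; `build_revision_prompt` and `NO_EXTRA_CAVEATS` are names chosen for illustration only.

```python
NO_EXTRA_CAVEATS = (
    "Do not add unnecessary disclaimers or caveats that weren't identified "
    "as needed in the critique."
)


def build_revision_prompt(initial_response: str, critique_output: str) -> str:
    """Fill the revision template with the response and its critique."""
    return (
        "Here is an AI assistant's response:\n\n"
        f"{initial_response}\n\n"
        "Here is a critique of that response based on constitutional "
        "principles:\n\n"
        f"{critique_output}\n\n"
        "Please revise the response to address every issue identified in the "
        "critique. Preserve all helpful, accurate, and appropriate content. "
        "Only modify or remove content that violates the stated principles. "
        "The revised response should be at least as helpful as the original "
        "while being fully compliant with the principles.\n\n"
        f"{NO_EXTRA_CAVEATS}"
    )


prompt = build_revision_prompt(
    "Ibuprofen will definitely cure your headache.",
    "1. Violates: 'will definitely cure' is a definitive medical claim.",
)
print(prompt)
```

Passing the critique in verbatim, rather than a summary of it, keeps the quoted problem passages available to the model during revision.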

  5. Step 5: Execute the Critique-Revision Loop Iteratively

    Run the critique on the initial response. Feed the critique and original response into the revision prompt. Then—and this is where iterative refinement happens—run the critique prompt again on the revised response.

    If the second critique identifies new issues (or issues that were partially but not fully addressed), feed that second critique and the revised response into another revision pass. Continue until the critique phase returns no violations, or until you've hit your maximum iteration count.

    In practice, 2-3 iterations are typically sufficient. The first revision addresses the most obvious issues. The second catches subtler problems that may have been introduced during revision (a common occurrence—fixing one issue can inadvertently create another). By the third pass, most responses are clean.

    For production pipelines, implement this as a loop with a maximum iteration parameter and a convergence check (if the critique identifies zero violations, break the loop early).

    Tip: Log every iteration—initial response, each critique, and each revision. This audit trail is invaluable for debugging principle gaps and understanding how your constitutional criteria perform across different prompt types.
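The loop with a maximum iteration count, an early exit on zero violations, and a per-iteration log might look like the sketch below. The `critique` and `revise` callables are assumed interfaces (in practice they would wrap model calls), and `toy_critique` and `toy_revise` are deterministic stand-ins so the example runs without a model.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LoopResult:
    final_response: str
    converged: bool  # True only if a critique pass found no violations
    log: List[dict] = field(default_factory=list)


def critique_revise_loop(
    initial_response: str,
    critique: Callable[[str], List[str]],
    revise: Callable[[str, List[str]], str],
    max_iterations: int = 3,
) -> LoopResult:
    response = initial_response
    log = [{"stage": "initial", "response": response}]
    for iteration in range(1, max_iterations + 1):
        violations = critique(response)
        log.append({"stage": "critique", "iteration": iteration,
                    "violations": violations})
        if not violations:
            # Convergence check: a clean critique ends the loop early.
            return LoopResult(response, True, log)
        response = revise(response, violations)
        log.append({"stage": "revision", "iteration": iteration,
                    "response": response})
    # Iteration budget exhausted; the last revision was not re-critiqued.
    return LoopResult(response, False, log)


def toy_critique(response: str) -> List[str]:
    violations = []
    if "definitely cure" in response:
        violations.append("Principle 1: definitive medical claim")
    if "doctor" not in response:
        violations.append("Principle 2: no recommendation to consult a professional")
    return violations


def toy_revise(response: str, violations: List[str]) -> str:
    revised = response.replace("will definitely cure", "may help relieve")
    if "doctor" not in revised:
        revised += " See a doctor if the headache persists."
    return revised


result = critique_revise_loop(
    "Ibuprofen will definitely cure your headache.", toy_critique, toy_revise
)
print(result.converged, result.final_response)
```

Note that when the budget runs out, `converged` is `False` and the final revision has not been checked, so such outputs are good candidates for human review.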

  6. Step 6: Validate Revision Quality with Spot Checks

    Automated self-critique is powerful but not infallible. The model can miss subtle issues, hallucinate that violations were fixed when they weren't, or over-correct in ways that degrade usefulness. Build a validation step into your workflow.

    Select a representative sample of critique-revision outputs (10-20% of your volume during development, lower in stable production). For each sample, verify: (1) Were all genuine violations identified? (2) Did the revisions actually address those violations? (3) Did the revision preserve the helpful content? (4) Were any new issues introduced?

    Track these metrics over time. If you see systematic patterns—like the model consistently missing a particular type of issue, or consistently over-correcting in a specific way—update your constitutional principles or critique prompt accordingly. This feedback loop between validation and principle refinement is what makes the system improve over time.

    Tip: Use the patterns you discover during validation to create better [red-team prompts](/skills/crafting-red-team-prompts-for-safety-testing) that specifically target the weak spots in your critique loop.
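Sampling and metric tracking for spot checks can be sketched as follows. The helper names, the fixed random seed, and the four boolean field names (one per verification question above) are assumptions for illustration; the review values themselves would come from a human reviewer.

```python
import random


def sample_for_review(records, rate, seed=0):
    """Pick a reproducible random sample of records for human spot checks."""
    if not 0 < rate <= 1:
        raise ValueError("rate must be in (0, 1]")
    records = list(records)
    size = max(1, round(len(records) * rate)) if records else 0
    return random.Random(seed).sample(records, size)


# One boolean per verification question, filled in by the reviewer.
CHECKS = (
    "all_violations_identified",
    "revisions_addressed_violations",
    "helpful_content_preserved",
    "no_new_issues",
)


def pass_rates(reviews):
    """Fraction of reviewed samples that passed each check."""
    if not reviews:
        return {check: None for check in CHECKS}
    return {
        check: sum(1 for r in reviews if r[check]) / len(reviews)
        for check in CHECKS
    }


batch = [f"record-{i}" for i in range(200)]
sample = sample_for_review(batch, rate=0.15)

reviews = [
    {"all_violations_identified": True, "revisions_addressed_violations": True,
     "helpful_content_preserved": False, "no_new_issues": True},
    {"all_violations_identified": True, "revisions_addressed_violations": True,
     "helpful_content_preserved": True, "no_new_issues": True},
]
rates = pass_rates(reviews)
print(len(sample), rates["helpful_content_preserved"])
```

A persistently low rate on one check, such as `helpful_content_preserved`, points at which prompt to revisit: here it would suggest the revision prompt is over-correcting.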

  7. Step 7: Integrate into Production Pipelines or Training Data Collection

    Once your critique-revision loop is producing consistent, high-quality results, integrate it into your workflow. There are two main deployment paths:

    Runtime pipeline: Run the critique-revision loop on every model output before it reaches the end user. This adds latency (each iteration is an additional API call) but provides real-time quality assurance. Best for high-stakes applications where every output matters.

    Training data generation: Use the critique-revision loop to generate large volumes of (initial_response, revised_response) pairs. In Anthropic's Constitutional AI research, the model is first fine-tuned with supervised learning on the revised responses; the RLAIF stage that follows trains on AI-generated preference comparisons between candidate responses. The pairs therefore serve as supervised training data and set up the later work of generating reinforcement learning from AI feedback.

    For many practitioners, a hybrid approach works well: use the runtime pipeline during development and early deployment, collect the data it generates, then fine-tune a model on that data so it produces better initial responses that require fewer revision passes over time.
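For the data-collection path, a minimal sketch of exporting pairs as JSON Lines is shown below. The record keys mirror the (initial_response, revised_response) naming used above; the `write_revision_pairs` helper, the extra `prompt` key, and the choice to skip unchanged responses are assumptions made for this example rather than part of any published pipeline.

```python
import io
import json


def write_revision_pairs(records, stream):
    """Write one JSON object per line for each response that was revised."""
    written = 0
    for record in records:
        if record["revised_response"] == record["initial_response"]:
            continue  # assumption: unchanged responses are not exported
        stream.write(json.dumps({
            "prompt": record["prompt"],
            "initial_response": record["initial_response"],
            "revised_response": record["revised_response"],
        }) + "\n")
        written += 1
    return written


records = [
    {"prompt": "How do I treat a headache?",
     "initial_response": "Ibuprofen will definitely cure it.",
     "revised_response": "Ibuprofen may help relieve it. See a doctor if it persists."},
    {"prompt": "What is 2 + 2?",
     "initial_response": "4",
     "revised_response": "4"},
]

buffer = io.StringIO()  # stands in for a file opened with open(path, "w")
count = write_revision_pairs(records, buffer)
print(count)
print(buffer.getvalue(), end="")
```

One object per line keeps the export appendable, so a runtime pipeline can add pairs as they are produced.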

Examples

Example: Self-Critique Loop for a Health Information Response

A user asks an AI assistant: 'What's the best way to treat a persistent headache?' The initial response recommends specific medications with dosages and says 'this will definitely cure your headache.' You're implementing a critique-revision loop with principles around medical accuracy, appropriate disclaimers, and avoiding definitive medical claims.

Initial response: The model generates a detailed answer recommending ibuprofen 400mg every 6 hours, states it will 'definitely cure' the headache, and doesn't mention consulting a doctor.

Critique pass 1: The critique prompt includes three principles: (1) Does the response avoid making definitive medical claims? (2) Does the response recommend professional consultation where appropriate? (3) Is the dosage information accurate and appropriately caveated?

The critique identifies: The phrase 'this will definitely cure your headache' violates Principle 1 by making a definitive medical claim. The absence of any recommendation to see a doctor violates Principle 2. The dosage recommendation partially violates Principle 3 because it does not note that it is a general guideline and that individual needs vary.

Revision pass 1: The revised response changes 'definitely cure' to 'may help relieve,' adds a paragraph recommending consulting a healthcare provider especially for persistent headaches, and frames the dosage as 'commonly recommended for adults, but check with your pharmacist or doctor for your specific situation.'

Critique pass 2: The second critique finds no violations. The response is helpful, appropriately caveated, and encourages professional consultation. The loop terminates after two iterations with a significantly improved output that remains genuinely useful.

Example: Scaling Self-Critique for SEO Content Generation

You're using Claude prompt workflows to generate SEO product comparison articles at scale. Your constitutional principles include: no misleading claims about products, balanced presentation of pros and cons, disclosure of limitations in AI-generated reviews, and factual accuracy.

Setup: You build a pipeline that generates 50 product comparison drafts per day. Each draft runs through a 2-iteration critique-revision loop before publication.

Critique prompt design: You include four principles as evaluable questions: (1) Does the comparison make claims about product performance that aren't substantiated? (2) Does it present both advantages and disadvantages for each product? (3) Does it acknowledge that this is AI-generated content and may not reflect hands-on testing? (4) Are all stated specifications and features accurate based on publicly available information?

Iteration results: Over the first week, you find that 78% of initial drafts trigger at least one critique on Principle 2 (missing disadvantages—the model tends to write positively about all products). 23% trigger Principle 1 critiques for unsubstantiated performance claims. After revision, 95% of outputs pass all four principles on the second critique pass.

Optimization: Based on this data, you update the initial generation prompt to explicitly instruct the model to include pros and cons for each product, reducing first-pass violations of Principle 2 from 78% to 31%. This demonstrates how the critique-revision data feeds back into improving the whole pipeline.
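Per-principle percentages like those in this example can be tallied from logged critique results. The sketch below uses made-up data for five drafts, not the figures from the example; `violation_rates` is a hypothetical helper, and each draft is represented simply as the list of principle numbers its critique flagged.

```python
from collections import Counter


def violation_rates(drafts, principle_count):
    """Share of drafts whose critique flagged each principle (1-indexed)."""
    if not drafts:
        return {n: 0.0 for n in range(1, principle_count + 1)}
    # set() ensures a principle flagged twice in one draft counts once.
    counts = Counter(n for flagged in drafts for n in set(flagged))
    return {
        n: counts[n] / len(drafts) for n in range(1, principle_count + 1)
    }


# Made-up first-pass results: each entry lists the violated principles.
first_pass = [[2], [1, 2], [], [2], [2, 4]]
rates = violation_rates(first_pass, principle_count=4)
print(rates)
```

Comparing these rates before and after a prompt change is what lets you confirm that an update to the generation prompt actually reduced a specific kind of violation.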

Best Practices

  • Ground every critique in specific textual evidence from the response—require the model to quote problematic passages rather than offering vague assessments, which dramatically improves revision accuracy.

  • Separate safety-critical principles from quality principles into distinct critique passes to ensure the model gives full attention to each category rather than spreading its analysis thin.

  • Always include an explicit instruction in the revision prompt to preserve helpful and accurate content, preventing the common failure mode of over-cautious responses that refuse to engage with the topic.

  • Set a maximum iteration count (typically 3) to prevent infinite loops where the model oscillates between two competing revision strategies without converging.

  • Version-control your constitutional principles and critique prompts alongside your code—these are the most impactful parameters in your system and small changes can significantly alter behavior.

  • When building Claude prompt workflows for SEO content, test your critique loop against a diverse set of prompt categories to ensure principles generalize rather than being tuned to a narrow distribution of inputs.

Common Mistakes

Writing critique prompts that say 'identify any issues' without referencing specific constitutional principles, leading to unfocused, inconsistent critiques that vary wildly between runs.

Correction

Always enumerate your constitutional principles explicitly in the critique prompt and require the model to evaluate the response against each one individually. Structure the critique output to mirror the principle list.

Running only a single critique-revision pass and assuming the output is clean, when revisions frequently introduce new subtle issues that weren't present in the original.

Correction

Run at least two iterations by default. The second critique pass on the revised output catches issues introduced during revision and ensures the fix didn't break something else.

Making constitutional principles too abstract (e.g., 'be ethical') so the model cannot operationalize them into specific critiques, resulting in vague and unhelpful evaluations.

Correction

Convert each principle into one or two concrete, evaluable questions. 'Be ethical' becomes 'Does this response encourage illegal activity?' and 'Does this response respect user autonomy and informed consent?'

Omitting the 'preserve helpful content' instruction in revision prompts, causing the model to default to overly cautious non-answers that technically violate nothing but also help no one.

Correction

Explicitly instruct the model to keep all accurate, helpful, and appropriate content intact. Frame the revision as surgical correction, not wholesale rewriting. Monitor the helpfulness of revised outputs alongside their safety.

Treating the self-critique loop as a fire-and-forget solution without monitoring or validation, missing systematic blind spots that the model consistently fails to catch.

Correction

Implement ongoing spot-check validation on a sample of outputs. Track which principles are most frequently violated and which violations are most frequently missed, then update your principles and prompts accordingly.

Frequently Asked Questions

How many critique-revision iterations should I run for best results?

Most implementations see strong results with 2-3 iterations. The first pass catches obvious issues, and the second catches subtleties introduced during revision. Beyond 3 iterations, you typically see diminishing returns and risk the model over-correcting into blandness. Monitor your specific use case to find the sweet spot.

Can I use the same model for both generation and self-critique?

Yes—this is the standard approach in Constitutional AI. The same model that generates the response evaluates it. Research shows that models are often better at identifying problems in text than avoiding them during generation, making self-critique effective even without a separate evaluator model.

How does self-critique differ from using system instructions in Claude prompts for safety?

System instructions set upfront constraints but can't catch every edge case during generation. Self-critique operates post-generation, evaluating the actual output against principles. The combination is more robust: system instructions reduce the frequency of issues, and self-critique catches what slips through.

What if the self-critique loop introduces new errors during revision?

This is common and exactly why iterative passes matter. The second critique pass is specifically designed to catch issues introduced during revision. Additionally, include 'preserve accurate content' instructions in your revision prompt to minimize unnecessary changes that could introduce errors.

How does implementing self-critique connect to RLAIF training in Constitutional AI?

In Anthropic's Constitutional AI method, the revised responses produced by this process are used for supervised fine-tuning, which helps the model internalize the constitutional principles. RLAIF is the next stage: the fine-tuned model's outputs are compared by an AI evaluator against the constitution, and those preferences drive reinforcement learning. See our guide on generating reinforcement learning from AI feedback for the full training pipeline.

Can I apply self-critique to existing AI content that wasn't generated through Constitutional AI?

Absolutely. Self-critique works on any text regardless of how it was generated. You can retroactively evaluate and revise existing AI-generated content, human-written content, or outputs from any model by feeding them through your critique-revision loop with your chosen constitutional principles.