title: Manage playbook experiments
description: Run playbook A/B experiments end-to-end: plan a hypothesis, configure variants, preflight, ramp traffic safely, monitor health, decide a winner, and archive history.
Run a playbook experiment end to end
Use experiments to safely compare playbook variants on real traffic, then roll out the winner with confidence.
An experiment routes conversations between playbooks in a single workspace and tracks performance over time. You control:
- Which playbooks participate as variants
- How traffic splits across variants using weights
- When the experiment starts and ends
- Lifecycle actions such as Preflight, Activate, Pause, Complete, and Archive
Experiments work best when you change one thing at a time (for example, prompt wording) while keeping workflow, targeting, and metrics consistent across variants.
Main experiment workflow
Follow this workflow to run a safe playbook experiment from idea to archive.
Plan your experiment
Start by clarifying what you want to learn and how you will measure success.
- Choose a single workflow to test, such as support triage or account onboarding.
- Decide your primary metric, for example conversion rate, successful resolution rate, or handoff rate.
- Identify a control playbook and one or more variant playbooks you want to compare.
You will configure these choices on the Experiments page.
Create and configure the experiment
Open the Experiments page in the AI Config section and create a new experiment.
Fill in the key fields:
- Name – A clear, human-readable title like "Support triage prompt test".
- Workspace – The workspace whose conversations you want to route through this experiment.
- Description – A short hypothesis, for example "Shorter prompt should reduce handoff rate while keeping CSAT stable."
Then configure variants in the Variants table:
- Add at least two variants.
- For each row, set:
- Name – How you want to label the variant.
- Playbook – The playbook to run for that variant.
- Weight – A non-negative integer for its share of traffic.
- Control – Mark exactly one variant as the control.
Run preflight to validate configuration
From the experiment detail view, run Preflight.
Preflight checks:
- That you have at least two variants.
- That exactly one variant is marked as control.
- That all weights are non-negative integers.
- That variants are eligible to receive traffic in the selected workspace.
Preflight generates a report with errors (which block activation) and warnings (which do not block but may affect quality). Fix all errors before activation.
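The variant rules that Preflight enforces can be sketched as a small validation function. This is an illustrative sketch only; the field names (`name`, `weight`, `control`) mirror the Variants table but are not the product's actual API.

```python
def preflight(variants):
    """Check the variant rules described above (illustrative sketch).

    Returns a list of blocking errors; an empty list means the
    variant configuration is eligible to activate.
    """
    errors = []
    # Rule 1: at least two variants must participate.
    if len(variants) < 2:
        errors.append("At least two variants are required.")
    # Rule 2: exactly one variant must be marked as control.
    controls = [v for v in variants if v.get("control")]
    if len(controls) != 1:
        errors.append("Exactly one variant must be marked as control.")
    # Rule 3: every weight must be a non-negative integer.
    for v in variants:
        w = v.get("weight")
        if not isinstance(w, int) or isinstance(w, bool) or w < 0:
            errors.append(
                f"Variant {v.get('name')!r} needs a non-negative integer weight."
            )
    return errors
```

A configuration that passes these checks returns an empty error list and would be eligible for activation, pending the workspace-eligibility checks that Preflight also runs.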
Start the experiment (activate or schedule)
When preflight passes, start routing traffic:
- Use Activate/Schedule from Draft to either:
- Activate now, or
- Schedule a future start time.
- After activation, the status moves to Active, and the health snapshot begins collecting data, such as exposures and conversion rate, reported over a trailing 30-day window.
Use the audit timeline to confirm that activation or scheduling was applied at the expected time.
Ramp traffic safely using weights
As the experiment runs, you can adjust weights in the Variants table to gradually shift traffic.
A common safe ramp pattern:
```json
[
  {
    "name": "Control",
    "playbookId": "pbk_support_triage_v1",
    "weight": 90,
    "control": true
  },
  {
    "name": "Short prompt",
    "playbookId": "pbk_support_triage_v2",
    "weight": 10,
    "control": false
  }
]
```
After monitoring results, you might move to a more even split:
```json
[
  {
    "name": "Control",
    "playbookId": "pbk_support_triage_v1",
    "weight": 50,
    "control": true
  },
  {
    "name": "Short prompt",
    "playbookId": "pbk_support_triage_v2",
    "weight": 50,
    "control": false
  }
]
```
Higher weights send proportionally more traffic to a variant. Keep weights as integers; the routing logic uses their relative values, so a variant's traffic share is its weight divided by the sum of all weights.
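Because weights are relative, the expected split can be computed by normalizing each weight against the total. A minimal sketch (the structure is illustrative, not the product's internals):

```python
def traffic_shares(variants):
    """Convert integer weights into fractional traffic shares.

    Shares are relative: weights of 90/10 and 9/1 produce the
    same 90%/10% split.
    """
    total = sum(v["weight"] for v in variants)
    if total == 0:
        raise ValueError("At least one variant needs a positive weight.")
    return {v["name"]: v["weight"] / total for v in variants}
```

For the ramp example above, weights of 90 and 10 yield shares of 0.9 and 0.1; moving both weights to 50 yields an even 0.5/0.5 split.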
Monitor health and behavior
While the experiment is Active, use the health snapshot and audit timeline in the detail view to monitor behavior.
- The health snapshot shows the last 30 days of experiment performance, including exposures, conversion rate, and overall health.
- The audit timeline records key actions, such as Preflight runs, weight changes, and lifecycle events like Activate, Pause, Stop, Complete, and Archive.
Watch for:
- Sudden drops in conversion or quality.
- Variants that appear under-exposed relative to their weights.
- Long-running experiments with stable results that may be ready for a decision.
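One way to spot under-exposed variants is to compare each variant's observed share of exposures against the share implied by its weight. This is a hypothetical health check you could run against exported data, not a built-in product metric:

```python
def underexposed(variants, exposures, tolerance=0.2):
    """Flag variants whose observed exposure share falls well below
    the share implied by their weight.

    `exposures` maps variant name -> exposure count; `tolerance` is
    the allowed relative shortfall (0.2 = 20 percent below expected).
    """
    total_weight = sum(v["weight"] for v in variants)
    total_seen = sum(exposures.values())
    flagged = []
    for v in variants:
        expected = v["weight"] / total_weight
        observed = exposures.get(v["name"], 0) / total_seen if total_seen else 0.0
        # Flag when the observed share is more than `tolerance` below expected.
        if expected > 0 and observed < expected * (1 - tolerance):
            flagged.append(v["name"])
    return flagged
```

A variant flagged by a check like this is worth investigating via the audit timeline and Preflight, as described in the troubleshooting section below.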
Decide: pause, stop, or complete
When you have enough evidence to act, use lifecycle actions to close out or pause the experiment:
- Pause – Temporarily stop routing new traffic while keeping configuration. Use this if you need time to investigate results or incidents.
- Stop – End the experiment when conditions have changed or the test is no longer valid, without declaring a winner.
- Complete – Mark the experiment as finished with a decided outcome once you have chosen a winning variant.
Each action updates the experiment status and adds an entry to the audit timeline so you can see who changed what and when.
Archive finished experiments
After you complete or stop the experiment and apply your learnings (for example, by updating your default playbook configuration), Archive the experiment.
Archiving:
- Keeps the experiment available as a historical record, including configuration, health snapshot, and timeline.
- Moves the status to Archived and hides the experiment from the default list unless you enable the Include archived filter.
Use archiving to keep the experiment list focused on current and upcoming tests.
Avoid ending experiments too early based on a small number of exposures or a single day's performance. Let the experiment run long enough to see stable patterns for your primary metrics.
Work with the experiments page
The Experiments page has two main areas: the list view and the detail view.
Filter and find experiments
Use filters at the top of the experiment list to stay focused:
- Workspace – Show only experiments in a specific workspace.
- Status – Filter by Draft, Active, Paused, Completed, or Archived.
- Include archived – Toggle whether archived experiments appear in the list.
Search and filters help you quickly find the experiment you want to inspect or edit.
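The list-view filters behave like straightforward list filtering. The sketch below is illustrative; the field names (`workspace`, `status`) are assumptions, and archived experiments are hidden unless the toggle is on:

```python
def filter_experiments(experiments, workspace=None, status=None, include_archived=False):
    """Apply the list-view filters: workspace, status, and the
    Include archived toggle (archived rows are hidden by default)."""
    results = []
    for e in experiments:
        # Archived experiments only appear when the toggle is enabled.
        if e["status"] == "Archived" and not include_archived:
            continue
        if workspace is not None and e["workspace"] != workspace:
            continue
        if status is not None and e["status"] != status:
            continue
        results.append(e)
    return results
```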
Configure variants safely
The Variants table in the experiment detail view defines which playbooks participate and how traffic splits between them.
Add and edit variants
In the Variants table:
- Add a row for each playbook you want to test.
- Enter a clear Name for each variant.
- Select the associated Playbook.
- Set an integer Weight for the traffic share.
- Mark one variant as Control with the control checkbox.
You can edit names, playbooks, and weights while the experiment is in Draft or Paused status.
Respect validation rules
Before activation, variants must satisfy:
- At least two variants.
- Exactly one control variant.
- All weights are non-negative integers.
If any of these rules are violated, Preflight will fail, and you will not be able to activate the experiment.
Use descriptive variant names that capture the change, such as "Prompt v1 long intro" and "Prompt v2 concise intro". This makes audit timelines and reports easier to interpret later.
Key fields you will edit
These are the fields you will interact with most often while setting up and operating experiments.
- Name – Human-readable experiment name used in the list and detail views.
- Description – Short explanation of the hypothesis, primary metric, or rollout strategy.
- Start time – Actual or scheduled start time. Set when you activate or schedule the experiment.
- End time – Planned or actual end time. Often set when you complete or stop the experiment.
- Weight – Traffic allocation weight for a variant. Higher weights receive proportionally more traffic.
- Control – Marks the control variant. Exactly one variant must be control for Preflight and activation to succeed.
Run preflight and lifecycle actions
Use lifecycle actions from the experiment detail panel to validate and control traffic.
Validate with Preflight
Click Preflight to generate a validation report.
Preflight evaluates:
- Variant rules: at least two variants, exactly one control, valid weights.
- Eligibility: whether variants can receive traffic in the chosen workspace.
- Configuration issues that would prevent safe routing.
If Preflight shows errors, fix them and rerun until the report indicates the experiment is eligible to activate. You can also review warnings to understand potential quality risks.
Start with Activate or Schedule
When Preflight passes:
- Use Activate/Schedule for new experiments to:
- Activate immediately, or
- Schedule a future start time.
- Use Activate on paused experiments to resume traffic now.
- Use Schedule if you want the experiment to begin automatically later.
The audit timeline records activation and scheduling actions with timestamps.
Pause, stop, or complete
Control an ongoing experiment based on what you observe:
- Pause – Temporarily stop routing new traffic while preserving configuration. Use this during investigations or incident response.
- Stop – End the experiment if it is no longer valid (for example, traffic mix changed). This does not declare a winner.
- Complete – Finish the experiment and mark it as decided when you have chosen a winning variant and no longer want to split traffic.
After you complete or stop an experiment, you can archive it once you have applied the outcome.
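The lifecycle actions described in these sections imply a small set of status transitions, sketched below. This is illustrative only; the real product may permit additional transitions, and the status that follows Stop is not modeled here:

```python
def apply_action(status, action):
    """Return the next status for a lifecycle action, or raise if the
    transition is not among those described above."""
    transitions = {
        ("Draft", "activate"): "Active",      # start routing traffic now
        ("Active", "pause"): "Paused",        # stop new traffic, keep config
        ("Paused", "activate"): "Active",     # resume traffic
        ("Active", "complete"): "Completed",  # finish with a decided winner
        ("Completed", "archive"): "Archived", # keep as historical record
    }
    try:
        return transitions[(status, action)]
    except KeyError:
        raise ValueError(f"Cannot {action} from {status}")
```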
Archive experiments
Use Archive to move finished experiments out of your active list.
- Archived experiments retain full detail: configuration, audit timeline, and health snapshot.
- To view them later, enable the include archived filter in the experiment list.
Archiving is reversible; update the status if you later need to revisit or duplicate an experiment setup.
Run Preflight after any significant change to variants, weights, or scheduling before you re-activate or schedule the experiment. This helps catch misconfigurations before they affect production traffic.
Troubleshooting experiments
Use these patterns when something does not look right.
Preflight keeps failing
If Preflight reports blocking errors:
- Check that you have at least two variants in the Variants table.
- Confirm that exactly one variant has the control checkbox selected.
- Verify that all weights are non-negative integers (no decimals, blanks, or negative values).
- Review any additional Preflight error messages for workspace or eligibility issues.
After making corrections, rerun Preflight until errors clear.
A variant is not receiving traffic
If a variant shows low or zero exposures despite a non-zero weight:
- Confirm the variant's weight is greater than 0 and saved.
- Make sure the variant's playbook is valid and active for the selected workspace.
- Check the health snapshot and audit timeline for recent changes to weights or status.
- Run Preflight again to detect any new configuration issues.
If necessary, temporarily increase the variant's weight to confirm routing behavior, then adjust back once you verify traffic is flowing.
Experiment is stuck in Draft or Paused
If the experiment does not move to Active:
- Ensure you have run Preflight and resolved all blocking errors.
- Use Activate/Schedule from Draft, or Activate from Paused.
- If you scheduled a start time, check that the time is in the future and in the correct timezone.
Look at the audit timeline to confirm whether an activation or scheduling action succeeded or if it failed due to validation issues.
Metrics look risky or unstable
If the health snapshot shows concerning trends:
- Pause the experiment to stop new traffic while you investigate.
- Review recent changes in the audit timeline, such as weight edits or playbook updates.
- Check conversations linked to the experiment's playbooks to understand what users are experiencing.
Once you understand the impact, decide whether to Stop the experiment, Complete it with a winner, or adjust weights and Activate again.
When in doubt, use Pause first. Pausing preserves configuration and history while giving you time to inspect health, metrics, and recent changes before you decide on next steps.