Configure AI safety

Use the AI Safety page to set blocked and serious mode keywords, test changes safely, and tune policies that keep AI responses aligned with your standards.

AI safety overview

AI safety settings let you decide what your AI should never say and which phrases should trigger more careful, serious handling of a conversation.

System-level safety always runs first. Your own keyword lists sit on top of that baseline so you can enforce business-specific policies, legal constraints, and tone guidelines.

System safety and your custom safety rules both apply. When they disagree, the stricter rule wins. If the system would block or escalate a response, your configuration cannot weaken that decision.

What you will edit on the AI Safety page

The AI Safety page exposes three tenant-editable lists. Each list is a textarea where you enter one keyword or phrase per line.

Behind the scenes, each list is merged with a read-only system list. The merged effective list is what the platform enforces.
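
As a mental model, the merge behaves roughly like the Python sketch below. The function and list names are illustrative, and the platform's real normalization and matching rules may differ:

# A minimal sketch of deriving an effective list, assuming the merge
# simply layers tenant entries on top of the read-only system entries.
def effective_list(system_entries, tenant_entries):
    merged, seen = [], set()
    for phrase in list(system_entries) + list(tenant_entries):
        normalized = " ".join(phrase.lower().split())  # trim, collapse whitespace
        if normalized and normalized not in seen:
            seen.add(normalized)
            merged.append(normalized)
    return merged

system_blocked = ["threaten physical harm"]            # read-only baseline
tenant_blocked = ["Admit legal liability", "threaten  physical harm"]

print(effective_list(system_blocked, tenant_blocked))
# ['threaten physical harm', 'admit legal liability']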

You do not need to duplicate obvious harmful or illegal terms. The system lists already cover a wide range of baseline safety and compliance topics.

Lists you control

blockedKeywords (textarea)

Blocked keywords define phrases your AI should never say. When an AI response would contain one of these phrases (or a system-blocked phrase), the system blocks that response and falls back to your routing or default behavior instead.
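
Conceptually, the blocking decision looks something like this sketch, where fallback_response() is a hypothetical stand-in for your configured routing or default behavior:

# A sketch of the blocking decision; real matching semantics are
# platform-defined.
def deliver(candidate_response, effective_blocked):
    text = candidate_response.lower()
    for phrase in effective_blocked:
        if phrase in text:
            return fallback_response()   # blocked: never sent to the customer
    return candidate_response            # no match: send as-is

def fallback_response():
    # Stand-in for your configured routing or default behavior.
    return "Let me connect you with a teammate who can help with this."

print(deliver("We can guarantee investment returns.",
              ["guarantee investment returns"]))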

seriousHighConfidenceKeywords (textarea)

Serious mode keywords (high confidence) define phrases that always trigger serious handling when they appear in an AI response. Use these for clearly risky or high-impact topics.

seriousMediumConfidenceKeywords (textarea)

Serious mode keywords (medium confidence) define softer signals of risk or sensitivity. These go through an AI verification step before the system switches to serious handling.
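
A rough sketch of the two tiers, assuming a hypothetical verify_risk_with_ai() stand-in for the verification pass (the platform's actual check is internal):

def should_enter_serious_mode(candidate_response, high_conf, medium_conf):
    # Checks run on the candidate AI response, not the customer message.
    text = candidate_response.lower()
    if any(phrase in text for phrase in high_conf):
        return True                       # high confidence: always serious
    if any(phrase in text for phrase in medium_conf):
        return verify_risk_with_ai(text)  # medium confidence: verify first
    return False

def verify_risk_with_ai(text):
    # Placeholder: the real step asks a model whether the risk is genuine.
    return True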

Blocked and serious mode keywords are checked against candidate AI responses, not customer messages. To react to what customers say (for example, messages containing "cancel my account"), use routing rules or playbooks that trigger on customer text.

Design an initial AI safety policy

Start by deciding what you want the AI never to say and what should be treated as serious in your environment.

Identify your high-risk topics

Work with legal, compliance, and operations to list topics that are unacceptable or require careful handling.

  • Legal and regulatory: investigations, fines, lawsuits, admissions of liability
  • Financial risk: refunds over policy limits, investment or tax advice, guarantees
  • Safety and wellbeing: threats, self-harm, harassment, discrimination
  • Brand and tone: profanity, hate speech, or phrases your brand will not use

Group these topics into:

  • Never say (block entirely)
  • Serious but allowed (respond carefully, often escalate)
  • Soft signals of risk (watch and sometimes escalate)

Decide what belongs in blocked vs serious mode

Map each topic group to one of the three lists:

  • Put phrases you never want the AI to output into blocked keywords
  • Put phrases that always require careful handling into serious mode high confidence
  • Put softer, early-warning phrases into serious mode medium confidence

Use blocked keywords sparingly for clear violations. Prefer serious mode for anything that still needs a response (for example, "harassment complaint" might be serious mode, while "threatening violence" might be blocked).

Draft example phrases

Translate topics into realistic phrases a customer or AI might use.

Good blocked keyword candidates:

admit legal liability
guarantee investment returns
share full credit card number
threaten physical harm
racial slur

Good serious mode (high confidence) candidates:

lawsuit
harassment complaint
regulatory violation
fraud investigation
data breach

Good serious mode (medium confidence) candidates:

very upset
thinking about leaving
feeling unsafe
really disappointed
need to complain

These examples are formatted as one phrase per line, matching how you should enter them in the UI.

Avoid overly broad or generic keywords such as "account", "problem", or "refund". These can match most conversations and either over-block responses or force serious mode on nearly every interaction.
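
One way to sanity-check a candidate keyword before adding it is to count how often it appears in a sample of recent, known-safe responses. The sample and threshold below are illustrative:

# A rough pre-check for overly broad keywords: any candidate that
# matches a meaningful share of safe responses is probably too generic.
safe_sample = [
    "i can help you reset your account password",
    "your refund was processed yesterday",
    "let me know if there is any other problem",
]

candidates = ["account", "guarantee investment returns"]

for keyword in candidates:
    hits = sum(keyword in text for text in safe_sample)
    rate = hits / len(safe_sample)
    if rate > 0.1:
        print(f"'{keyword}' matches {rate:.0%} of safe responses - too broad?")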

Add and update keywords safely

Once you have an initial policy, use the AI Safety page to enter keywords in a controlled way rather than adding everything at once.

Open AI Safety and locate the three lists

From the dashboard, go to AI Config → AI Safety.

Locate the three editable areas:

  • Blocked keywords textarea
  • Serious mode keywords – high confidence textarea
  • Serious mode keywords – medium confidence textarea

You may also see system lists and effective lists that show what is already enforced. System entries are read-only and automatically included.

Enter phrases one per line

Paste or type phrases into each textarea using one keyword or phrase per line. Keep entries as short, distinct phrases, not full paragraphs.

Safer choices:

admit legal liability
share full credit card number
guarantee investment returns

Risky choices (too broad or noisy):

account
problem
help me
refund
email

Click Save to apply your changes. The system merges your entries with the system list to form the effective list that is actually enforced.
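
If you prepare lists outside the dashboard, a small cleanup pass like the sketch below (trim, de-duplicate, flag very short entries) can catch problems before you paste. The length threshold is an arbitrary illustration:

# A sketch of cleaning pasted textarea content: one phrase per line,
# trimmed, de-duplicated, with suspiciously short entries flagged.
def clean_keyword_lines(raw_textarea):
    cleaned, seen = [], set()
    for line in raw_textarea.splitlines():
        phrase = " ".join(line.split()).lower()
        if not phrase or phrase in seen:
            continue
        if len(phrase) < 5:
            print(f"warning: '{phrase}' may be too short and generic")
        seen.add(phrase)
        cleaned.append(phrase)
    return cleaned

print(clean_keyword_lines("admit legal liability\n\nHelp\nadmit legal liability"))
# warns about 'help'; returns ['admit legal liability', 'help']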

Scope changes if possible

If your environment allows configuring different assistants or channels separately, apply new keywords:

  • First to a test assistant or internal channel
  • Then to a subset of production traffic
  • Finally, to all assistants and channels

Coordinate with owners of routing rules and playbooks so they understand which phrases might now trigger different handling.

Start with a small, targeted set of blocked and serious mode keywords. Capture the obvious, high-risk phrases first, then expand based on real conversations and missed cases.

Test how your AI safety rules behave

Before broad rollout, confirm that your keywords behave as expected in realistic conversations.

Use a playground or test assistant

Open a testing environment where you can safely send messages to your assistant:

  • A staging or internal assistant
  • A live playground in the dashboard, if available
  • A test channel (for example, an internal Slack or email address)

Make sure this assistant uses the same AI Safety configuration you just edited.

Trigger blocked keyword behavior

Craft prompts that would cause the AI response (not the customer message) to include one of your blocked phrases.

For example, if you blocked "share full credit card number", ask the assistant a question that might lead it to respond with that phrase. On success:

  • The AI's response that contains the blocked phrase is not sent to the customer
  • You see your fallback behavior (such as a generic safe response or escalation)

If responses still appear despite obviously blocked phrases, double-check that the phrase you entered matches the output text closely enough.
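
Exact matching semantics are platform-defined, but near-misses often come down to case, punctuation, or an extra word. A quick local check like this sketch can show whether your entry plausibly matches the output:

import re

def normalize(text):
    text = re.sub(r"[^\w\s]", " ", text.lower())  # drop punctuation
    return " ".join(text.split())                 # collapse whitespace

entry = "share full credit card number"
print(normalize(entry) in normalize("I will not share full credit-card numbers."))
# True: case and punctuation differences are normalized away
print(normalize(entry) in normalize("I can't share your full credit card number."))
# False: the extra word "your" breaks the phrase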

Trigger serious mode behavior

Send prompts that should cause the AI response to include one of your serious mode phrases.

For high confidence phrases, you should see immediate changes like:

  • More careful language or disclaimers
  • A different template or tone
  • Routing or escalation triggered through playbooks or routing rules

For medium confidence phrases, expect serious mode to trigger less often because an AI verification step checks whether the risk is truly present.

Review conversation transcripts

Use conversation detail views to inspect:

  • Which messages were blocked and why
  • When serious mode activated
  • How routing and playbooks responded

Confirm that high-risk conversations are handled more carefully, while normal conversations still flow smoothly.

If you rely on routing or playbooks for escalation, verify they are wired to respond to serious mode events or safety signals, not just keywords in customer messages.
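
The exact mechanism depends on your platform, but conceptually an escalation hook should key off safety events rather than re-matching keywords itself. The event names and helpers in this sketch are hypothetical:

# A sketch of reacting to safety signals instead of raw keywords.
def handle_event(event):
    if event["type"] == "serious_mode_activated":
        escalate(event["conversation_id"])       # hand off to a human
    elif event["type"] == "response_blocked":
        log_for_review(event["conversation_id"])

def escalate(conversation_id):
    print(f"escalating conversation {conversation_id} to an agent")

def log_for_review(conversation_id):
    print(f"flagging conversation {conversation_id} for review")

handle_event({"type": "serious_mode_activated", "conversation_id": "c-123"})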

Monitor and tune over time

Treat AI safety as an ongoing practice, not a one-time configuration.

Monitor blocked responses and serious mode volume

Use analytics and conversation reports to track:

  • How often responses are blocked by safety rules
  • How often serious mode triggers
  • Which keywords show up most frequently in safety events

A sudden spike in blocked responses or serious mode events may indicate that a newly added keyword is too broad.
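
A simple way to watch for this is to compare today's blocked-response count against a trailing baseline. The counts and the 3x threshold below are illustrative:

# A sketch of spike detection over daily blocked-response counts.
daily_blocked_counts = [4, 6, 5, 7, 5, 6, 41]   # last value is today

today = daily_blocked_counts[-1]
baseline = sum(daily_blocked_counts[:-1]) / len(daily_blocked_counts[:-1])

if today > 3 * baseline:
    print(f"blocked responses spiked: {today} vs ~{baseline:.0f}/day baseline")
    print("review recently added keywords - one may be too broad")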

Refine or remove noisy keywords

Review conversations where safety triggered and ask:

  • Did the keyword correctly identify a risky situation?
  • Did blocking or serious mode improve the outcome?
  • Is the keyword matching too many safe conversations?

Based on this review, you might (see the sketch after this list):

  • Move a phrase from blocked to serious mode high confidence
  • Move a phrase from high to medium confidence
  • Narrow a phrase to be more specific
  • Remove a phrase that adds noise without improving safety
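
If you track, per keyword, how often it triggered and how often the trigger was genuinely risky, a rough triage rule like this sketch can guide those decisions. The thresholds and counts are illustrative:

# A sketch of keyword triage based on precision (true positives / triggers).
def suggest_action(triggers, true_positives):
    if triggers == 0:
        return "keep (no data yet)"
    precision = true_positives / triggers
    if precision > 0.8:
        return "keep as-is"
    if precision > 0.4:
        return "consider a lower tier or a narrower phrase"
    return "narrow or remove - mostly matching safe conversations"

stats = {"lawsuit": (20, 18), "refund": (300, 12)}
for keyword, (triggers, hits) in stats.items():
    print(keyword, "->", suggest_action(triggers, hits))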

Add new phrases from real conversations

When you encounter risky conversations that were not caught by safety rules:

  • Highlight the phrase that indicates risk
  • Decide whether it belongs in blocked or serious mode
  • Add it as a new line to the appropriate list

Over time, your lists will reflect your real-world risk signals instead of only theoretical ones.

Revisit strategy with stakeholders

Regularly meet with stakeholders (support, legal, compliance, product) to:

  • Share metrics on safety events and escalations
  • Decide which areas need stronger or lighter controls
  • Align safety configuration with evolving policies and regulations

Update the AI Safety page after these discussions and repeat the testing steps.

If you significantly expand blocked or serious mode keywords, expect changes in routing and agent workload. More blocked responses and serious events can increase handoffs to humans and may affect response times.

Troubleshooting common issues

Use these checks when behavior does not match your expectations.