Zendesk QA prompt-based AI insights leverage the latest AI models, allowing you to customize AI-powered prompts using natural language for quality autoscoring and risk detection.
In addition to using or editing prompts from the AI insights prompt library, you can create your own custom prompt categories and spotlights.
By following these guidelines, evaluators can effectively leverage generative AI to assess customer support agent performance, ensuring clarity, consistency, and a strong focus on service quality.
This article contains the following topics:
- Compliance suggestions for using Zendesk QA AI prompts
- Writing prompts for AI insights
- Scoring prompt-based AI insights

Compliance suggestions for using Zendesk QA AI prompts
Zendesk AI is built on our foundational principles of privacy, security, accuracy, transparency and customer control. See AI Trust at Zendesk.
Zendesk’s compliance and configuration suggestions are not legal advice. You, as the user, remain solely responsible for ensuring that your interactions with the system are fair, respectful, free from discriminatory or derogatory language, and appropriate for your purposes — including when using prompts from Zendesk’s prompts library.
We encourage you to maintain a polite tone in all communications, consider fair usage when creating prompts and implementing outputs, and always verify that the prompt is suitable for your specific use case.
Custom prompts and any other Zendesk QA AI prompts should not be used to make automated decisions, especially those related to employment or other high-risk situations as defined by the EU AI Act. Please be aware that Zendesk does not assume any responsibility for the consequences of misuse of the system.
Writing prompts for AI insights
We recommend keeping your prompts simple and focused on a single category and spotlight at a time. For example, avoid combining topics such as empathy and grammar in the same prompt. Instead, create separate prompts for each category. This approach helps the model evaluate each prompt more accurately, as it can be challenging to determine whether a rating applies to empathy, grammar, or both.
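For example, a combined prompt such as "Did the agent show empathy and use correct grammar?" could be split into two focused prompts (hypothetical examples):
- Empathy: Did the agent acknowledge the customer's concern and apologize for any inconvenience caused?
- Grammar: Did the agent write without spelling or grammatical errors?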
The goal of these prompts is to evaluate the performance of customer support agents based on service quality using generative AI. Therefore, ensure that responses can be generated without requiring validation from third-party applications or internal documentation, as these sources are not accessible to the AI model.
Write category and spotlight descriptions objectively, avoiding subjective language and phrasing. Subjective descriptions can result in inconsistent and non-measurable evaluations.
Below are examples of subjective expressions alongside their objective alternatives:
| Subjective | Objective alternative |
| --- | --- |
| friendly | "demonstrated courtesy", "used polite language" |
| attentive | "responded to customer inquiries", "addressed customer needs" |
| helpful | "provided relevant information", "resolved the issue presented" |
| professional | "maintained a formal tone" |
| confident | "provided clear explanations" |
| polite | "used polite language", "acknowledged the customer appropriately" |
Evaluations should also be based solely on the conversation text. Ensure that you clearly define the rating criteria for each evaluation:
- Use specific criteria. Focus on specific behaviors or actions taken by the agent rather than general feelings or impressions.
- Instead of: Was the agent friendly?
- Use: Did the agent use polite language, maintain a formal tone, and acknowledge the customer appropriately?
- Define expectations clearly. Outline what constitutes satisfactory performance for each criterion to minimize subjectivity.
- Instead of: Did the agent communicate well?
- Use: Did the agent use polite language and avoid derogatory words and slang? Rate the agent negatively if they failed all three criteria. Rate positively if they avoided derogatory words and slang and otherwise used polite language.
- Use consistent terminology. Maintain uniform language throughout all rating descriptions.
Use the term "agent" instead of alternatives like "colleague," "employee," "representative," "advocate," or "associate." Similarly, use "customer" instead of terms like "member," "caller," "guest," or "subscriber."
- Do not use acronyms or abbreviations.
- Instead of: Did the agent confirm the customer’s DOB?
- Use: Did the agent confirm the customer’s date of birth?
- Avoid double quotes unless necessary. Use double quotes only when referencing exact words spoken by the agent or customer. This approach allows for a broader evaluation of intent or sentiment without restricting assessments to specific phrasing.
- For example, instead of: Did the agent say "Have a nice day"?
- Use: The agent wished the customer a nice day.
Also, try to:
- Provide examples. Include examples of acceptable and unacceptable responses to guide evaluators in their assessments.
When questions require knowledge of specific business terminology, explicitly define those terms in the instructions. For example, if the agent must mention a department name in their greeting, provide a list of acceptable department names (see the worked example after this list).
- Avoid subjective and vague intensifiers that are not objectively measurable.
Do not use terms like very, a bit, incredibly, totally, or absolutely, or other emphatic wording that cannot be measured objectively.
- Prioritize clarity over stylistic variation. Don't be afraid of repetitive or "dull" language, and don't try to be stylistically creative, for example by using synonyms. Keep the terminology consistent and fixed.
- Be clear about your rating conditions. Clearly specify whether all conditions must be met or if meeting some is sufficient for a given rating. This clarity improves consistency and reliability in scoring.
For example, in "Did the agent confirm the customer's booking number and name?", both the booking number and the name have to be confirmed by the agent to receive a positive rating. In "Did the agent confirm the customer's booking number or name?", only one of the two needs to be confirmed. You can also make this explicit: "Both have to be confirmed" or "Only one (name or booking number) has to be confirmed." The more explicit and clear your prompt, the more consistent the output.
- Write your rating criteria in affirmative language rather than negative. This positive framing can lead to clearer and more effective evaluations.
For example, use "The agent used polite language" or "The agent avoided derogatory words" instead of "The agent didn't use derogatory words."
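As a worked example that applies several of these guidelines at once (the department names are hypothetical; substitute your own): Did the agent greet the customer and state the department name? Acceptable department names are Billing, Technical Support, and Returns. Both conditions have to be met for a positive rating: the agent greeted the customer, and the agent stated one of the acceptable department names.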
Scoring prompt-based AI insights
After establishing your prompt, the next step is to define how evaluations are applied. This involves specifying what constitutes a positive or negative outcome and selecting clear terms or phrases to represent these outcomes. Examples include yes/no, helpful/unhelpful, or polite/impolite.
Assigning the correct outcomes based on your rating criteria is essential to ensure accurate evaluations.
Below are examples illustrating how to structure these evaluations:
- Politeness of language:
  - Question: Did the agent use polite language?
  - Positive outcome: Yes
  - Negative outcome: No
- Use of derogatory words:
  - Question: Did the agent use derogatory words?
  - Positive outcome: No
  - Negative outcome: Yes
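You can also use descriptive outcome labels instead of yes/no. For example (a hypothetical category based on the objective phrasing suggested earlier):
- Helpfulness of the response:
  - Question: Did the agent provide relevant information and resolve the issue presented?
  - Positive outcome: Helpful
  - Negative outcome: Unhelpful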
By clearly defining these parameters, you ensure that evaluations are consistent, aligned with your established rating criteria, and accurately reflected in your AQS scores.