
Configure and Monitor your SLOs

With Squadcast, you can define and monitor Service Level Objectives (SLOs) for your services.


Last updated 1 year ago


Note:

This feature is available as part of the Product Trial and the Premium and Enterprise Plans.

Before configuring your SLOs, we recommend you read our SLO Basics documentation.

Before you define your SLO, decide what percentage of your SLI (availability, latency, etc.) must be met to pass the SLO. For example, you may want your service to be available 99.99% of the time. In this case, 99.99% is the Target SLO and “availability” is your SLI.

Creating New SLO

Navigate to SLO from the left sidebar. To create a new SLO, click the +Create New SLO button in the top right.

Define SLO

You begin by defining your SLO.

  1. Enter an SLO Name of your choice (it must be unique across SLOs).

  2. Enter an SLO Description detailing the specifics of the SLO.

  3. Enter Tags (key-value pairs) specifying information such as the Owner, Environment and Type of SLO.

You can additionally add your own tags by clicking the +Add Tag button.

Once done, click on Next to Configure SLO.

Configure SLO

  1. Under the Services Associated with this SLO tab, select one or more Services to link to the SLO. Only incidents from these linked Services can be mapped to the SLO.

  2. Enter the SLIs that affect the SLO. One or more SLIs, such as availability or response time, can map to this SLO.

  3. Enter the Target SLO as a percentage (%). This sets the target for compliance.

  4. The Error Budget is auto-calculated from the Target SLO and the Duration. It is expressed in minutes and cannot be edited.

  5. Enter the Duration for this SLO by choosing either a Rolling Period or a Fixed Duration (Calendar Duration):

  • Under Rolling, select the period in days. Use this option when you want the SLO calculated continuously over a defined number of days (for example, over a rolling 7-day period). The maximum is 90 days.

  • Under Fixed Duration, select the start and end dates from the drop-down. Use this option when the SLO has to be calculated over a fixed period, for example one quarter at a time. The maximum fixed duration is one year.

Once done, click Next to configure the Error Budget Policy.
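The auto-calculated Error Budget can be pictured as the share of the duration not covered by the Target SLO, converted to minutes. The following is a minimal sketch only. It assumes the budget is (1 - target/100) × duration in minutes, which is consistent with the error budget FAQ below. It also assumes the documented limits (90 days for Rolling, one year for Fixed) are simple range checks and that a Fixed Duration's end date is exclusive. The function names are hypothetical and are not Squadcast's API.

```python
from datetime import date
from typing import Optional

MAX_ROLLING_DAYS = 90   # documented maximum for a Rolling period
MAX_FIXED_DAYS = 366    # "maximum of one year"; 366 allows for leap years (assumption)
MINUTES_PER_DAY = 24 * 60


def slo_duration_days(window: str, rolling_days: int = 0,
                      start: Optional[date] = None,
                      end: Optional[date] = None) -> int:
    """Return the SLO duration in days, enforcing the documented limits."""
    if window == "rolling":
        if not 1 <= rolling_days <= MAX_ROLLING_DAYS:
            raise ValueError("a Rolling period must be between 1 and 90 days")
        return rolling_days
    if window == "fixed":
        if start is None or end is None or end <= start:
            raise ValueError("a Fixed Duration needs a start date before its end date")
        days = (end - start).days  # end date treated as exclusive (assumption)
        if days > MAX_FIXED_DAYS:
            raise ValueError("a Fixed Duration can be at most one year")
        return days
    raise ValueError(f"unknown window type: {window!r}")


def error_budget_minutes(target_pct: float, days: int) -> float:
    """Minutes of failure allowed over `days` for a Target SLO such as 99.9."""
    return (1 - target_pct / 100) * days * MINUTES_PER_DAY


# A 99.9% target over a rolling 30-day period.
print(round(error_budget_minutes(99.9, slo_duration_days("rolling", 30)), 2))  # 43.2
# A 99.99% target over Q1 2024 (91 days, fixed duration).
q1 = slo_duration_days("fixed", start=date(2024, 1, 1), end=date(2024, 4, 1))
print(round(error_budget_minutes(99.99, q1), 3))  # 13.104
```

Because the budget is derived entirely from the target and the duration, it makes sense that the field is read-only. Changing either input changes the budget.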

Important: When an SLO reaches the end of its configured duration, it transitions into an Inactive state. Once inactive, the SLO can no longer be edited, signaling that the period it was created to track is complete.

Error Budget Policy

The Error Budget Policy defines the conditions that, when breached, notify one or more Users or Squads or create an incident.

Choose the conditions you want to be alerted for, out of the following options:

  • Alert when there is a breach of allocated Error Budget

  • Alert when there is an Unhealthy Burn Rate. A burn rate is unhealthy when the error budget is being consumed faster than expected. For example, for a 99.99% SLO over a year, the error budget works out to about 52.6 minutes, or roughly 4.4 minutes per month. If the error budget is being consumed faster than that, the burn rate is considered unhealthy.

  • Alert when the number of False Positives exceeds the set limit

  • Alert when the Error Budget decreases below the set limit
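The four conditions above can be sketched as simple checks over the SLO's current state. This is a minimal, hypothetical sketch, not Squadcast's implementation. The class names, field names, and threshold names are illustrative, and the Unhealthy Burn Rate check assumes the budget is allocated evenly per day.

```python
from dataclasses import dataclass


@dataclass
class BudgetStatus:
    total_budget_min: float   # error budget for the whole SLO duration
    consumed_min: float       # error budget consumed so far
    elapsed_days: float       # days since the SLO period started
    duration_days: float      # configured SLO duration in days
    false_positives: int      # incidents marked as False Positive


@dataclass
class PolicyLimits:           # hypothetical thresholds set in the policy
    max_false_positives: int
    min_remaining_budget_min: float


def triggered_conditions(s: BudgetStatus, p: PolicyLimits) -> list:
    """Return the names of the policy conditions that are currently breached."""
    remaining = s.total_budget_min - s.consumed_min
    # Budget allocated so far, assuming a uniform daily allocation.
    expected = s.total_budget_min * s.elapsed_days / s.duration_days
    alerts = []
    if s.consumed_min > s.total_budget_min:
        alerts.append("error_budget_breached")
    if s.consumed_min > expected:
        alerts.append("unhealthy_burn_rate")
    if s.false_positives > p.max_false_positives:
        alerts.append("false_positives_exceeded")
    if remaining < p.min_remaining_budget_min:
        alerts.append("error_budget_below_limit")
    return alerts


# 43.2 min budget over 30 days; 30 min consumed after 10 days (14.4 min allocated).
status = BudgetStatus(43.2, 30, 10, 30, false_positives=1)
print(triggered_conditions(status, PolicyLimits(5, 20)))
# ['unhealthy_burn_rate', 'error_budget_below_limit']
```

Each breached condition would then be delivered through whichever mode you choose next (email or incident).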

Choose how the alerts are delivered:

  • Email Alerts, wherein an email notification is sent to the Users or Squads you specify

  • Incident Alerts, wherein an alert is created for the specified Service

Once done, click on Create, and your SLO is created!

Monitor Your SLOs

Once created, you can access all your SLOs for the current Team from the SLO dashboard.

The SLO list view shows the following information for each SLO:

  • Target SLO (%): The target percentage or ratio for compliance.

  • Current SLO (%): The current historical compliance with the SLO.

  • SLO Health: The health of the SLO, either Healthy or Needs Attention.

  • Service: The services related to the SLO.

  • Status: Whether the SLO is Active or Inactive.

  • Incidents Reported: The number of incidents reported, plus the number of false positives, for this SLO.

  • Time Window: The type of time window configured, either Rolling or Fixed.

  • Duration (Days): The duration (in days) for which the SLO is configured.

  • Updated On: The date the SLO was last updated.

  • Tags: The tags associated with the SLO.

SLO Detail Page

To view the details of a particular SLO, select it from the SLO list on the SLO dashboard.

The SLO details view shows the following information:

  • Target SLO (%): The percentage value set as the target for performance compliance.

  • Total Error Budget (mins): The total time, in minutes, that the service can fail without violating the SLO.

  • Time Window: The type of period over which the SLO is measured, either Rolling (continuous) or Fixed (for example, one quarter).

  • Duration: The duration (in days) for which the SLO is configured.

The following fields depend on the selected time range:

  • Current SLO (%): The historical compliance with the SLO in the selected time range.

  • Error Budget Consumed: The error budget consumed in the selected time range.

  • MTTA (mins): The mean time taken to acknowledge SLO-violating incidents in the selected time range.

  • MTTR (mins): The mean time taken to resolve SLO-violating incidents in the selected time range.

  • Error Budget Consumed by SLIs: How error budget consumption is split across SLIs, including the number of incidents affecting each SLI.
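The time-range-sensitive fields can be illustrated with a small aggregation over incidents. This is a hypothetical sketch under stated assumptions. The `Incident` record and `detail_metrics` function are illustrative names. MTTA and MTTR are assumed to be measured from the trigger time. Each incident is assumed to consume error budget from trigger until resolution, and an incident is counted in a range if it was triggered within that range.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass
class Incident:
    sli: str               # SLI the incident affected, e.g. "availability"
    triggered: datetime
    acknowledged: datetime
    resolved: datetime


def minutes(a: datetime, b: datetime) -> float:
    return (b - a).total_seconds() / 60


def detail_metrics(incidents: list, start: datetime, end: datetime) -> dict:
    """Compute time-range-sensitive fields for incidents triggered in [start, end)."""
    in_range = [i for i in incidents if start <= i.triggered < end]
    if not in_range:
        return {"mtta_min": 0.0, "mttr_min": 0.0, "by_sli": {}}
    by_sli = defaultdict(lambda: {"incidents": 0, "budget_min": 0.0})
    for i in in_range:
        by_sli[i.sli]["incidents"] += 1
        # Assumption: an incident consumes budget from trigger until resolve.
        by_sli[i.sli]["budget_min"] += minutes(i.triggered, i.resolved)
    return {
        "mtta_min": mean([minutes(i.triggered, i.acknowledged) for i in in_range]),
        "mttr_min": mean([minutes(i.triggered, i.resolved) for i in in_range]),
        "by_sli": dict(by_sli),
    }


d = datetime
incidents = [
    Incident("availability", d(2024, 1, 1, 10, 0), d(2024, 1, 1, 10, 4), d(2024, 1, 1, 10, 30)),
    Incident("latency", d(2024, 1, 1, 12, 0), d(2024, 1, 1, 12, 2), d(2024, 1, 1, 12, 10)),
]
print(detail_metrics(incidents, d(2024, 1, 1), d(2024, 1, 2)))
```

Narrowing the time range changes which incidents are included, which is why these fields update as you change it.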

Marking an Incident as False Positive

Use this when an incident was previously marked as affecting an SLO and it was subsequently determined that it does not. This acts like an “undo” button.

On the SLO Details page, select the incident(s) and click the Mark as False Positive button.

Note:

If you marked an incident as False Positive by mistake, you can undo this. On the SLO Details page, open the False Positives tab, select the incident, and click SLO Affected.

Deleting an SLO

To delete an SLO, click the icon to the right of the SLO in the SLO list, then click the Delete icon, as shown in the image below.

FAQs

The Frequently Asked Questions below may help you fix issues or answer your queries.

1. Can I delete services associated with an SLO?

Yes. You can delete a service associated with an SLO; the SLO and its incidents remain intact.

2. How is the Error Budget calculated?

An error budget is 1 minus the SLO of the service. A 99.9% SLO service has a 0.1% error budget. If our service receives 1,000,000 requests in four weeks, a 99.9% availability SLO gives us a budget of 1,000 errors over that period.
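The same "1 minus the SLO" rule applies whether the unit is requests or minutes. The sketch below uses Python's `decimal` module so that the percentages stay exact. The function names are illustrative only.

```python
from decimal import Decimal


def error_budget_fraction(slo_pct: str) -> Decimal:
    """Error budget = 1 - SLO, e.g. "99.9" -> 0.001 (0.1%)."""
    return 1 - Decimal(slo_pct) / 100


def allowed_failures(slo_pct: str, total_units: int) -> Decimal:
    """Failed requests (or minutes) permitted out of `total_units`."""
    return total_units * error_budget_fraction(slo_pct)


print(error_budget_fraction("99.9"))             # 0.001
print(allowed_failures("99.9", 1_000_000))       # 1000.000 errors out of 1,000,000 requests
print(allowed_failures("99.99", 365 * 24 * 60))  # 52.5600 minutes in a 365-day year
```

Squadcast expresses the budget in minutes (see Configure SLO above), which corresponds to the last line, where the units are minutes of the SLO duration.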

3. How is the Burn Rate calculated?

  • First, we calculate the error budget allocated for a day. For example, if the SLO is 99.99% over a year, the yearly error budget is about 52.6 minutes, so the daily error budget is 52.6 min / 365 ≈ 0.144 min (about 8.6 seconds).

  • Then, we fetch the total error budget consumed up to today’s date.

  • Finally, we count the days since the SLO started and check whether more error budget has been consumed than was allocated for those days, which gives the burn rate.
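The steps above can be sketched as a ratio of actual to allocated consumption. This is a minimal sketch that assumes burn rate = consumed ÷ (daily budget × days elapsed). The documentation describes the comparison but not the exact formula, so a value above 1.0 here simply means the budget is burning faster than allocated.

```python
MINUTES_PER_DAY = 24 * 60


def burn_rate(target_pct: float, duration_days: float,
              days_elapsed: float, consumed_min: float) -> float:
    """Ratio of error budget actually consumed to budget allocated so far.

    Assumption: burn rate = consumed / (daily budget * days elapsed).
    """
    if days_elapsed <= 0:
        raise ValueError("the SLO must have been running for some time")
    total_budget = (1 - target_pct / 100) * duration_days * MINUTES_PER_DAY
    daily_budget = total_budget / duration_days       # step 1: budget per day
    allocated_so_far = daily_budget * days_elapsed    # step 3: days since start
    return consumed_min / allocated_so_far            # compare with step 2


# 99.99% over a year: about 0.144 min per day, so 14.4 min over 100 days.
rate = burn_rate(99.99, 365, days_elapsed=100, consumed_min=7.2)
print(f"{rate:.2f}")  # 0.50, i.e. within budget
```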

4. What determines a Healthy or Unhealthy SLO?

Whether an SLO is Healthy or Unhealthy depends on how quickly its error budget is being depleted. The SLO burn rate indicates how fast the error budget is being consumed relative to the SLO’s duration.

For example, if you have a 99.9% target over a 30-day month, your error budget is 43.2 minutes of downtime, which means you can consume 1.44 minutes (43.2/30) of error budget per day. If you consume a total of 30 minutes of error budget within the first 10 days of the SLO duration (more than the 14.4 minutes allocated for those days), the SLO turns to Needs Attention (unhealthy).

Conversely, an SLO is Healthy when it has consumed less error budget than was allocated for the number of days since it started.
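The arithmetic behind this example, assuming a 30-day month, is:

$$
\begin{aligned}
E_{\text{month}} &= (1 - 0.999) \times 30 \times 1440 = 43.2 \text{ min} \\
E_{\text{day}} &= 43.2 / 30 = 1.44 \text{ min} \\
E_{\text{allocated, 10 days}} &= 10 \times 1.44 = 14.4 \text{ min} \\
30 \text{ min} &> 14.4 \text{ min} \;\Rightarrow\; \text{Needs Attention}
\end{aligned}
$$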

5. Can I automatically associate incidents with an SLO?

We’re working on something that can help you do this in the near future.

6. Are there any rate limits for SLOs?

Yes. There is a rate limit on the number of SLO promotions, whether manual or via Workflows: no more than 50 incidents per hour can be promoted to SLO violations in a Team.
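A per-Team limit of 50 promotions per hour can be modeled with a sliding-window counter. This is a hypothetical sketch only. The class name is illustrative, and the sketch assumes a rolling one-hour window. The documentation does not say whether Squadcast uses a rolling window or fixed clock hours.

```python
from collections import deque


class PromotionLimiter:
    """Hypothetical per-Team limiter: at most `limit` promotions per rolling window."""

    def __init__(self, limit: int = 50, window_s: float = 3600.0):
        self.limit = limit
        self.window_s = window_s
        self._history = {}  # team -> deque of promotion timestamps (seconds)

    def try_promote(self, team: str, now: float) -> bool:
        """Record a promotion for `team` at time `now` if the limit allows it."""
        q = self._history.setdefault(team, deque())
        # Forget promotions that have fallen out of the one-hour window.
        while q and now - q[0] >= self.window_s:
            q.popleft()
        if len(q) >= self.limit:
            return False  # the 51st promotion within the hour is rejected
        q.append(now)
        return True


limiter = PromotionLimiter()
results = [limiter.try_promote("team-a", t) for t in range(51)]
print(results.count(True), results[-1])  # 50 False
```

Because the limit applies per Team, promotions in one Team do not use up another Team's allowance.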

7. How many incidents are considered for error budget calculation when they are overlapping in a given time window?

When SLO-violating incidents overlap in time, the system calculates the cumulative error budget consumed across the multiple "open" incidents. For this calculation, it considers the current incident and the 49 preceding incidents, a total of 50 incidents.
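One way to picture this is to take the current incident plus the 49 before it and merge their overlapping time intervals. This is a hypothetical sketch. It assumes that overlapping downtime is counted only once (merged), because the documentation does not say exactly how overlapping periods are combined. It also assumes incidents are represented as (start, end) minute pairs ordered by trigger time.

```python
def budget_consumed_min(incidents: list, current: int,
                        max_considered: int = 50) -> float:
    """Minutes of error budget consumed by overlapping SLO-violating incidents.

    `incidents` holds (start_min, end_min) pairs ordered by trigger time, and
    `current` is the index of the current incident. Only the current incident and
    the ones before it, up to `max_considered` in total, are considered.
    Assumption: overlapping downtime is counted once (intervals are merged).
    """
    window = incidents[max(0, current - max_considered + 1): current + 1]
    total = 0.0
    run_start = run_end = None
    for start, end in sorted(window):
        if run_end is None or start > run_end:
            if run_end is not None:
                total += run_end - run_start  # close the previous merged run
            run_start, run_end = start, end
        else:
            run_end = max(run_end, end)       # extend the run over the overlap
    if run_end is not None:
        total += run_end - run_start
    return total


# (0, 10) and (5, 20) overlap into 20 minutes; (30, 40) adds 10 more.
print(budget_consumed_min([(0, 10), (5, 20), (30, 40)], current=2))  # 30.0
```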

On the SLO Detail Page, expand the accordion to view a further breakdown of the error budget consumed by each SLI, and a list of services associated with the SLO.

Have any questions? Ask the community.