Delib | Developer Docs

Disaster Recovery

Below is an outline of the Disaster Recovery Process followed by Delib. This process is in place 24/7, 365 days of the year.

Here's a summary of the stages, and the target timescales. Each stage is detailed later in this article.

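| Stage | When? |
|---|---|
| 1. Detection & definition | When notified by our monitoring systems or a customer |
| 2. On-call team alerted | As soon as possible after detection of a critical issue |
| 3. Initial investigation & assessment | Within 1.5 hours of detection |
| 4. Customer notification | Within 2 hours of detection, as specified in our Service Level Agreement |
| 5. Resolution | Depends on the complexity of the problem; our target resolution times for product and infrastructure issues are set out in our Service Level Agreement |
| 6. Report & document | Within 1 working day of resolution |
| 7. Review & retrospective | Within 3 working days of resolution |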
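The fixed timescales above are measured from two moments: detection (stages 3 and 4) and resolution (stages 6 and 7). The sketch below turns them into concrete target times. It is an illustration only: the function names are hypothetical, the 2-hour notification target is taken from the Service Level Agreement's initial response time, and it assumes a "working day" is Monday to Friday with public holidays ignored.

```python
from datetime import datetime, timedelta


def add_working_days(start: datetime, days: int) -> datetime:
    """Return the point `days` working days after `start`.

    Assumption: working days are Monday to Friday; public holidays are ignored.
    """
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0 = Monday ... 4 = Friday
            remaining -= 1
    return current


def target_deadlines(detected: datetime, resolved: datetime) -> dict[str, datetime]:
    """Target times for the stages that have fixed timescales (hypothetical helper)."""
    return {
        "initial_assessment": detected + timedelta(hours=1.5),
        "customer_notification": detected + timedelta(hours=2),
        "report": add_working_days(resolved, 1),
        "retrospective": add_working_days(resolved, 3),
    }
```

For example, an issue detected at 22:00 on Thursday 1 February 2024 and resolved at 01:30 on Friday 2 February would have a report target of 01:30 on Monday 5 February and a retrospective target of 01:30 on Wednesday 7 February, because the weekend is skipped.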

1. Detection & definition

Detection: we'll be made aware of a critical issue by one of the following methods:

  • (a) Automated alert from our monitoring systems

  • (b) Internal detection (e.g. from investigations arising from third-party security announcements)

  • (c) Customer or end-user report

Definition: we ask the following questions to determine whether an issue is critical:

  1. Has a customer site been unavailable to the general public for more than 10 minutes?

  2. Is there a reproducible issue which prevents a user from entering or submitting data?

  3. Is there a reproducible issue which causes unavoidable or unexpected data loss?

  4. Is there a bug or security vulnerability that constitutes a realistic threat to privacy?
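The four questions can be read as a simple classification rule. The sketch below assumes that a "yes" to any single question makes an issue critical; `IssueReport` and `is_critical` are hypothetical names used for illustration, not part of any Delib system.

```python
from dataclasses import dataclass


@dataclass
class IssueReport:
    """Hypothetical summary of an issue, one field per definition question."""
    minutes_unavailable: float  # how long the site has been unavailable to the public
    blocks_data_entry: bool     # reproducible issue preventing entering or submitting data
    causes_data_loss: bool      # reproducible issue causing unavoidable or unexpected data loss
    privacy_threat: bool        # bug or vulnerability that is a realistic threat to privacy


def is_critical(issue: IssueReport) -> bool:
    # Assumption: answering "yes" to any one of the four questions is enough.
    return (
        issue.minutes_unavailable > 10  # "more than 10 minutes"
        or issue.blocks_data_entry
        or issue.causes_data_loss
        or issue.privacy_threat
    )
```

Under this reading, a site that has been down for 12 minutes is critical even if no data has been lost, while a 5-minute blip with no other symptoms is not.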

2. On-call team alerted

  1. If a critical error has been picked up by one of our monitoring systems, the team will be alerted by email and text message. Unavailability lasting ten minutes or longer is automatically reported.

  2. The on-call team will include at least one technical team member.

When? As soon as possible after detection of a critical issue
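Stage 2 states that unavailability lasting ten minutes or longer is reported automatically by email and text message. Delib's monitoring tooling isn't described here, so the sketch below is only one way such a rule could work: `OutageTracker`, the idea of feeding it periodic availability checks, and the channel names are all hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

ALERT_AFTER = timedelta(minutes=10)  # "ten minutes or longer"
CHANNELS = ("email", "sms")          # email and text message


class OutageTracker:
    """Hypothetical tracker fed by periodic availability checks for one site."""

    def __init__(self) -> None:
        # Time of the first failed check in the current outage (None while up).
        self.down_since: Optional[datetime] = None
        self.alerted = False

    def record(self, checked_at: datetime, site_up: bool) -> list:
        """Record one check; return the channels to alert on (empty if none)."""
        if site_up:
            # Site is back: reset so the next outage is tracked from scratch.
            self.down_since = None
            self.alerted = False
            return []
        if self.down_since is None:
            self.down_since = checked_at
        if not self.alerted and checked_at - self.down_since >= ALERT_AFTER:
            self.alerted = True  # alert once per outage, not on every failed check
            return list(CHANNELS)
        return []
```

Because the outage is timed from the first failed check, the measured duration is a lower bound; the real interruption may have started slightly earlier, depending on how often checks run.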

3. Initial investigation and assessment

The technical lead aims to establish the cause of the issue and to assess the severity and likely duration of the service interruption.

Ideally, this will include:

  1. Identification of the root cause

  2. An assessment of the severity and scale of the problem, including which customers are affected

  3. An estimated time to resolution

When? Within 1.5 hours of detection

4. Customer notification

Affected customers will be contacted to inform them of the service interruption and to confirm that Delib are actively investigating the problem.

This communication will most likely be by email, depending on the severity of the incident. Wider-reaching issues may first be posted as a homepage announcement on delib.zendesk.com, ahead of any direct communication.

When? Our Service Level Agreement gives a maximum initial response time for critical errors of 2 hours.

5. Resolution

Once the technical on-call lead has assessed the problem, they will report back to the on-call customer success manager as follows:

  1. If the problem can be easily solved, it will be fixed. The technical lead will report back to the customer success manager and document the problem and solution.

    or

  2. If the issue is more complex, a resolution plan is put in place to address the service outage. This may require more technical team members to be contacted, or the investigation to be continued in office hours. An interim report summarising the expected cause and the steps to resolution will be provided to the customer success manager.

In both cases, any information we have will be communicated to affected customers by the on-call customer success manager, who will continue to keep all affected customers updated on progress until the issue is resolved.

When? This depends on the complexity of the problem, but our target resolution times are detailed in our Service Level Agreement.

6. Reporting, documentation and tidying up

Once the problem has been resolved, the customer success manager will provide a written account for affected customers and for Delib's own reference. This will include:

  • How the problem was detected

  • The scope of the problem and how it may have affected end-user interaction

  • The root cause

  • Steps to resolution, including any measures put in place to mitigate the risk of repeat occurrence

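  • Total downtime

  • Any service credit or other compensation offered by Delib, should the error have caused us to miss our Service Level Agreement targets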

When? All of this should happen within 1 working day of the resolution of the issue.
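The written account has a fixed set of contents, so it maps naturally onto a small record type. The sketch below is a hypothetical structure, not a Delib template: the field names are illustrative, and it assumes a single continuous outage so that total downtime is simply end time minus start time.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class IncidentReport:
    """Hypothetical structure for the written account produced in stage 6."""
    how_detected: str          # e.g. automated alert, internal detection, customer report
    scope: str                 # scope of the problem and its effect on end-user interaction
    root_cause: str
    outage_start: datetime
    outage_end: datetime
    resolution_steps: list = field(default_factory=list)  # incl. measures against recurrence
    service_credit: Optional[str] = None  # only if SLA targets were missed

    @property
    def total_downtime(self) -> timedelta:
        # Assumption: one continuous outage; several separate outages would need summing.
        return self.outage_end - self.outage_start
```

Deriving total downtime from the recorded start and end times, rather than typing it in by hand, keeps the figure in the report consistent with the incident timeline.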

7. Review and retrospective

Once the error has been resolved, Delib will hold a retrospective to identify any long-term countermeasures that can be put in place to prevent a recurrence of the issue.

This disaster recovery process is also reviewed to identify any improvements that can be made.

When? Within 3 working days of resolution

Other information

Would we ever take sites offline?

This decision would be made by Delib's Managing Director, after a full briefing on the situation from the on-call team. We will ask ourselves some specific questions to determine whether this may be necessary:

  • If the site stays online, could users submit data that gets lost without them knowing?

  • If the site stays online, could any existing data loss or corruption be made worse?

  • If the site stays online, is there a possibility of the loss or exposure of any personal information?

  • Conversely, if the site is taken offline, could any existing data loss be exacerbated?

This is a last resort for us, and we would never take sites offline unless leaving them online would pose more of a risk to the customer(s) or their respondents.

Policy reviewed February 2024


Last updated 1 year ago
