# Data centre exit — risk register

Twelve risks we have seen sink or stall mid-market DC-exit programs against a
hard landlord deadline. Each row: the **signal you can detect early**, the
**mitigation that worked**, and the **wave to act on it in**.

This is a starting register, not the complete one for your program — every
estate adds 5–10 of its own. Track these in your program tooling (Jira /
Asana / linear), not just in this file.

| # | Risk | Likelihood | Impact | Early detection signal | Mitigation | Act on by |
|---|------|-----------|--------|------------------------|------------|-----------|
| 1 | **Hidden dependency on an un-migrated workload** | High | High | Static analysis of network flows shows traffic from a "migrated" workload back to on-prem after cutover | Run a 7-day dependency-flow analysis before every wave; freeze any wave that has un-migrated dependencies outside its scope | Discovery, then re-confirm 2 weeks before each wave |
| 2 | **Replication lag undetected because both monitors live on the same side** | High | High | Replication lag dashboard reads from target only; source-side lag never alarms | Mandate a lag monitor on **both** source and target with a hard go/no-go gate of < 60s for 30 min before cutover | Wave 1 (set the pattern) |
| 3 | **Cutover team split across timezones with no handoff protocol** | Medium | High | On-call rota shows different teams on consecutive shifts during a cutover window | Single command-room model for waves with criticality ≥ medium; explicit handoff every 2 hours with a written status note in chat | Wave 2 onwards |
| 4 | **Decommission-of-source happens too early, blocking rollback** | Medium | Critical | Decom date in plan is < 30 days post-cutover for a critical-class workload | Decom dates ≥ 30 days for medium, ≥ 60 days for high, ≥ 90 days for critical. Decom is a *separate* go/no-go gate, not automatic | Wave plan v1 |
| 5 | **Identity/SSO cutover not staged before web tier** | High | Critical | Web tier wave scheduled before SSO wave | Re-order waves so identity is operational and tested 2 weeks before any customer-facing tier moves | Wave plan v1 |
| 6 | **AWS account / OU structure not ready for production workloads** | Medium | High | First production workload is scheduled into a Sandbox-class OU | Landing zone v1 must include Production OU, SCPs, network baseline, and central logging *before* Wave 1 cutover, not Wave 1 prep | Pre-Wave 1 |
| 7 | **Network DX/VPN capacity hits ceiling during dual-run** | Medium | High | Direct Connect utilisation > 70% during dual-run of any wave | Pre-size DX for 2.5× peak of the largest wave's dual-run replication + steady traffic; add a second virtual interface before Wave 3 | Pre-Wave 3 |
| 8 | **Database replication tool (DMS / qlik / native) hits silent throttle** | Medium | High | CDC lag rises slowly while CloudWatch CPU on DMS replication instance is < 30% — replication instance is undersized for the source change rate | Run a 24-hour load test of CDC at full source-write rate before cutover; right-size replication instance to handle 3× steady-state | 4 weeks before any stateful wave |
| 9 | **Trading-partner / B2B endpoint cutover causes EDI bounces** | Medium | High | Partner change-control returns < 50% confirmation 2 weeks before cutover | 6-week partner-notification lead time on EDI workloads; staged DNS so old endpoint redirects with a meaningful error for 30 days | 6 weeks before B2B wave |
| 10 | **Vendor-managed application (ERP / MES) without a tested AWS-supported pattern** | Medium | Critical | Vendor support matrix lists "AWS" but with no published reference configuration | Force vendor onto a signed reference architecture, in writing, before scheduling the wave; vendor on-site for the cutover | Pre-Wave 7 |
| 11 | **Team burnout in waves 4–6** (programme momentum dies) | High | High | Standup chat sentiment turns; PTO requests cluster after waves 3–4 | Built-in 2-week pause between Wave 5 and Wave 6; rotate cutover ICs every two waves; track rota hours like an oncall budget | After Wave 3 |
| 12 | **Landlord lease end date is not the hard constraint we thought** (early-termination penalty) | Low | High | Legal review finds a notice clause requiring N months written notice before contract end | Hand legal the lease and the program plan in month 1; surface notice clauses on the master timeline; treat them as gates | Month 1 |

---

## How to use this register

1. Copy it into your program tooling and **add owners and dates** to every row.
2. Each row gets a status: `monitoring` / `triggered` / `mitigated` / `closed`.
3. Surface the register at every fortnightly steering committee — not just risks
   that have triggered. The goal is to **see** approaching risks before they
   become incidents.
4. Add 5–10 risks specific to your estate. The ones above are the cross-program
   constants. Yours will have a specific replication tool, a specific vendor,
   a specific network topology.
5. **Risks 2, 4, 5, and 11 are the most common program-killers.** Read them
   first. Mitigate them before Wave 1, not after.
