Role: Sr. Site Reliability Engineer (SRE) – Unified Observability & AIOps
Location: Austin, TX / Fort Mill, SC (Hybrid)
Job Type: Full Time
Role Summary
We are seeking a Senior SRE with strong expertise in Unified Observability, proactive detection, AIOps, and GenAI-driven operations to support complex, distributed financial services platforms. The role requires hands-on experience designing SLI/SLO-driven monitoring, dynamic thresholds, intelligent alerting, and AI/ML-based anomaly detection across multi-stream architectures.
Key Responsibilities
Observability & Reliability Engineering
Proactive Detection & AIOps
Distributed Systems & Dependency Analysis
Tooling & Platforms
GenAI & LLM Enablement
Required Skills & Experience
✅ 15+ years in SRE / Production Engineering
✅ Strong Unified Observability background (beyond infrastructure-only monitoring)
✅ Hands-on Dynatrace experience (metrics, traces, logs, Davis AI)
✅ SLI/SLO engineering experience in production systems
✅ Experience implementing dynamic thresholds and anomaly detection
✅ Knowledge of AI/ML concepts applied to operations (AIOps)
✅ Distributed systems troubleshooting expertise
✅ Experience with Kafka or streaming data platforms
Differentiators (Highly Valued)