Job Description
Senior Site Reliability Engineer – Compute Infrastructure
Location: Boston, MA (Hybrid – Tues–Fri Onsite | Mondays Remote)
Compensation: $134,250 – $214,800 + Bonus + Equity + Full Benefits
We are representing a cutting-edge technology company that is seeking a Senior Site Reliability Engineer (SRE) to join its global infrastructure team. In this role, you'll play a critical part in scaling and optimizing the organization's cloud-native Kubernetes platform, which is the backbone for internal engineering teams delivering high-impact applications and services.
This role is ideal for an SRE who thrives in complex distributed environments, is passionate about developer enablement, and enjoys building robust systems that balance performance, reliability, and scalability.
Why You Should Apply:
- You'll work on global, mission-critical systems running on modern cloud infrastructure
- High autonomy in a fast-paced, high-impact engineering environment
- Opportunity to shape SRE best practices across the org
- Hybrid work culture that values face-to-face collaboration and innovation
What You'll Do:
- Architect and scale cloud-native Kubernetes infrastructure to support internal engineering workflows
- Develop tools and platforms that empower product and infrastructure teams to deploy and manage services rapidly and securely
- Write clean, efficient, and maintainable code in languages such as Python, Go, C#, or Java
- Use Infrastructure as Code (IaC) tools like Terraform or Pulumi to provision and manage cloud resources
- Enhance observability and alerting systems using APM, metrics, and log aggregation tools
- Partner with developers to optimize CI/CD pipelines and ensure smooth software delivery lifecycles
- Provide strong documentation to promote self-service and onboarding across engineering
- Continually assess and improve platform reliability, operability, and cost-efficiency
- Contribute to system design reviews and mentor junior engineers on cloud-native best practices
What You Bring:
- 7+ years of experience in Platform Engineering or Site Reliability Engineering
- Proven experience managing Kubernetes platforms at scale (e.g., AKS, EKS, or GKE)
- Strong programming experience in Python, Go, C#, Java, or similar languages
- Deep understanding of cloud platforms like AWS or Azure
- Experience with ArgoCD, GitHub Actions, or similar CI/CD tools
- Proficiency with observability tooling (Datadog, Prometheus, Grafana, etc.)
- Expertise in networking, security protocols, and container orchestration
- Familiarity with serial communication protocols such as SPI, UART, and RS485, and with security standards such as TLS and X.509
- Experience building testable, scalable IaC modules and managing multi-environment deployments
- Strong collaboration and documentation habits in cross-functional teams
- Empathy for internal users and a customer-focused mindset
Benefits:
- Competitive base salary: $134,250 – $214,800 (based on experience & location)
- Bonus + equity opportunities
- Discretionary time off (DTO) policy
- Paid parental leave for all caregivers
- Medical, dental, and vision coverage
- Fitness and wellness reimbursements
- Mental health & professional development support
- Hybrid workplace with in-office perks (snacks, events, and team-building activities)
Note: Compensation and benefits may vary depending on experience level and geographic market.