Sr. SRE, Compute Infrastructure

NxT Level

Boston, MA, USA

Published: 6/14/2022

Technology

Full Time

Job Description

Senior Site Reliability Engineer – Compute Infrastructure
Location: Boston, MA (Hybrid – Tues–Fri Onsite | Mondays Remote)
Compensation: $134,250 – $214,800 + Bonus + Equity + Full Benefits

We are representing a cutting-edge technology company that is seeking a Senior Site Reliability Engineer (SRE) to join its global infrastructure team. In this role, you'll play a critical part in scaling and optimizing the organization's cloud-native Kubernetes platform, the backbone for internal engineering teams delivering high-impact applications and services.

This role is ideal for an SRE who thrives in complex distributed environments, is passionate about developer enablement, and enjoys building robust systems that balance performance, reliability, and scalability.

Why You Should Apply:

You'll work on global, mission-critical systems running on modern cloud infrastructure
High autonomy in a fast-paced, high-impact engineering environment
Opportunity to shape SRE best practices across the org
Hybrid work culture that values face-to-face collaboration and innovation

What You'll Do:

Architect and scale cloud-native Kubernetes infrastructure to support internal engineering workflows
Develop tools and platforms that empower product and infrastructure teams to deploy and manage services rapidly and securely
Write clean, efficient, and maintainable code in languages such as Python, Go, C#, or Java
Use Infrastructure as Code (IaC) tools like Terraform or Pulumi to provision and manage cloud resources
Enhance observability and alerting systems using APM, metrics, and log aggregation tools
Partner with developers to optimize CI/CD pipelines and ensure smooth software delivery lifecycles
Provide strong documentation to promote self-service and onboarding across engineering
Continually assess and improve platform reliability, operability, and cost-efficiency
Contribute to system design reviews and mentor junior engineers on cloud-native best practices

What You Bring:

7+ years of experience in Platform Engineering or Site Reliability Engineering
Proven experience managing Kubernetes platforms at scale (e.g., AKS, EKS, or GKE)
Strong programming experience in Python, Go, C#, Java, or similar languages
Deep understanding of cloud platforms like AWS or Azure
Experience with Argo CD, GitHub Actions, or similar CI/CD tools
Proficiency with observability tooling (Datadog, Prometheus, Grafana, etc.)
Expertise in networking, security protocols, and container orchestration
Familiarity with serial communication protocols such as SPI, UART, and RS-485, and with security standards such as TLS and X.509
Experience building testable, scalable IaC modules and managing multi-environment deployments
Strong collaboration and documentation habits in cross-functional teams
Empathy for internal users and a customer-focused mindset

Benefits:

Competitive base salary: $134,250 – $214,800 (based on experience & location)
Bonus + equity opportunities
Discretionary time off (DTO) policy
Paid parental leave for all caregivers
Medical, dental, and vision coverage
Fitness and wellness reimbursements
Mental health & professional development support
Hybrid workplace with in-office perks (snacks, events, and team-building activities)

Note: Compensation and benefits may vary depending on experience level and geographic market.