Observability Developer

R5460

Location

Toronto

Career Track

Technology

This role is eligible for our hybrid work model: Two days in-office.

This job posting is for an existing, currently vacant position.

Observability Developer (SRE / O11y Platform)

Our Technology team is the backbone of our company: constantly creating, testing, learning and iterating to better meet the needs of our customers. If you thrive in a fast-paced, ideas-led environment, you’re in the right place.

Why this job’s a big deal:

As Priceline continues to scale globally, reliable production visibility is critical to delivering seamless customer experiences. We are investing in strengthening our observability foundations to improve detection, diagnosis, and overall system reliability.

This role plays a key part in maturing our observability capabilities—standardizing instrumentation, improving telemetry quality, and enabling faster root cause analysis that directly impacts MTTR and MTTD.

In this role you will get to:

  • Support and evolve end-to-end observability solutions for collecting, shipping, storing, and querying OpenTelemetry signals (metrics, logs, and traces) across infrastructure, containers, and Kubernetes environments.

  • Administer and operate core observability platforms (Splunk, New Relic, ClickHouse, Grafana, Lightrun), including service onboarding, access management, configuration, upgrades, and ongoing platform health.

  • Contribute to building and advancing a modern OpenTelemetry-based observability ecosystem that supports multiple telemetry types at scale.

  • Improve and standardize instrumentation practices across services, driving consistent logging, metrics, and distributed tracing implementation.

  • Partner with product and platform engineering teams to enhance production visibility and support SLO-driven reliability practices.

  • Optimize telemetry pipelines for performance, data quality, scalability, and cost efficiency.

  • Help define and support governance standards for observability, ensuring consistency, reliability, and scalability across teams.

  • Contribute to evolving our observability platform toward intelligent and AI-enabled capabilities, exploring opportunities to integrate AI or MCP-based solutions to improve signal quality, incident triage, and operational efficiency.

  • Ensure observability platform reliability, security, and performance meet defined SLAs and operational standards.

Who You Are:

  • Bachelor’s degree in Computer Science or equivalent practical experience.

  • 3+ years of experience in Observability, SRE, DevOps, or platform engineering roles supporting production systems.

  • Strong understanding of APM and SRE fundamentals, including MELT (Metrics, Events, Logs, Traces), latency analysis, error rate monitoring, service dependency mapping, SLIs/SLOs, alert tuning, and root cause analysis.

  • Hands-on experience administering at least one modern observability/APM platform (e.g., Splunk, New Relic, Grafana), with practical exposure to metrics, logs, distributed tracing, and platform configuration. Experience supporting full-stack observability coverage across the infrastructure, application, and browser monitoring layers.

  • Experience building dashboards and actionable alerts, including configuring alert workflows and integrations with incident management tools such as PagerDuty. Experience implementing or supporting OpenTelemetry-based instrumentation and improving telemetry quality across services.

  • Familiarity with Kubernetes and cloud-native environments, including an understanding of how applications are deployed, monitored, and scaled.

  • Experience managing telemetry pipelines and agents (e.g., collectors, forwarders, sidecars), including onboarding services and troubleshooting ingestion issues.

  • Working knowledge of scripting or automation (e.g., Shell, Python) and CI/CD concepts. Experience or familiarity with infrastructure-as-code tools such as Terraform for managing platform configurations and integrations is a plus. 

  • Comfortable collaborating with engineering teams to improve monitoring standards, instrumentation quality, and overall production visibility.

  • Relevant certifications such as New Relic APM Practitioner, Reliability Engineer – Professional, Splunk Admin, or GCP Associate Cloud Engineer are a plus.

  • Demonstrated history of living the values important to Priceline: Customer, Innovation, Team, Accountability and Trust.

  • The Right Results, the Right Way is not just a motto at Priceline; it’s a way of life. It’s therefore essential that you also meet our high standard of ethics, honesty, transparency and compliance.

There are a variety of factors that go into determining a salary range, including but not limited to external market benchmark data, geographic location, and years of experience sought/required. In addition to a competitive base salary, certain roles may be eligible for an annual bonus and/or equity grant.

The salary range for this position is $110,000 to $130,000 CAD.
 
