Site Reliability Engineer – Core Platform Engineering (m/f/d) – Gigafactory Berlin-Brandenburg

Tesla

Tesla is accelerating the world’s transition to sustainable
energy. Revolutionary strategies and products were developed within a few
years and successfully launched on a large scale. This is only possible through
extraordinary speed, innovation and efficiency.

The Role 

Tesla’s Platform Engineering is looking for a Site
Reliability Engineer to join our team. As a member of the team, you will be
building and maintaining Kubernetes clusters using infrastructure-as-code tools
like Ansible, Terraform, ArgoCD and Helm and helping the application teams to
be successful on our platform. The underlying infrastructure is a mix of
on-premises VMs, bare metal hosts and public clouds such as AWS located all
around the globe, which presents unique challenges and opportunities to work with
different types of infrastructure technologies. A successful candidate will be
expected to possess expert knowledge of Linux fundamentals, architecture and
performance tuning, as well as software development skills to match. Experience
running Kubernetes in production will be a strong plus; we prefer Golang or
Python for any automation or tools we have to build along the way. We are the
team that runs production-critical workloads for every aspect of the business
at Tesla and sets the standards for other teams. We are a group of well-rounded
generalists who not only solve the hardest problems in the industry but also
push other engineering teams at large to be better. Join us to get a chance to
work with some of the best engineers in the industry for one of the most
transformative companies in the history of both the automotive and energy
industries.

Responsibilities: 

  • Manage our Kubernetes clusters on-prem and in the cloud to
    support our growing workloads.
  • Participate in the architecture design process and in troubleshooting
    live applications with the product teams.
  • Participate in a 24×7 on-call rotation (one 12-hour day shift
    per week on a weekday and one weekend shift every 6-8 weeks).
  • Influence architectural decisions with a focus on security,
    scalability and high performance.
  • Set up and maintain monitoring, metrics and reporting
    systems for fine-grained observability and actionable alerting.
  • Author technical documentation for
    workflows, processes and best practices.

Requirements:


  • 5+ years of experience managing web-scale infrastructure in a
    production *nix environment.
  • Ability to prioritize tasks and work independently.
  • Advanced or expert-level Linux administration and
    performance tuning skills.
  • Track record of practical problem solving under pressure.
  • Excellent communication and documentation skills.
  • BS or MS degree in Computer Science or Engineering, or
    equivalent experience.
  • Advanced experience with configuration management and
    infrastructure-as-code tools such as Ansible, Terraform or Puppet.
  • Demonstrable knowledge of Linux operating system internals, the
    networking stack, filesystems, resource scheduling and process management.
  • Experience with AWS, or other cloud infrastructure
    providers.
  • Experience managing container-based workloads in production, using
    Kubernetes or other orchestration software and related tooling (ArgoCD, Helm).
  • Proficiency in a high-level language such as Python, Go, Ruby
    and/or Java.
  • Self-driven, with an analytical mind and a bias for action.
What we offer


You will be working in our state-of-the-art Gigafactory,
where you’ll solve the world’s most interesting problems with the best and
brightest people who share a passion to change the world. Tesla’s compensation
package includes a competitive salary and Tesla shares or bonuses. Typical
benefits include a pension program, 30 vacation days, employee
insurance, and relocation and commuting support.

Applications for this role must be submitted in English.