Sr Manager, Datacenter Engineering

Tesla

PLACEHOLDER: Will update with the new JD when I get one

The Role:

As a Datacenter Engineer you will be responsible
for participating in the day-to-day operations of the Tesla datacenter engineering
team. The team performs all the on-premises datacenter work that supports the
production and engineering work that makes Tesla a world leader in self-driving
EV, energy storage, and solar power technology. Continuous deployment,
monitoring, maintenance, improvement, and rapid turnaround on service requests
from all over the organization are imperative to drive a successful production
environment in the datacenter.

You’ll be a core member of a closely integrated, cross-functional,
and versatile team that performs most racking, stacking, and wiring, and
that designs, implements, and maintains all Tesla datacenter
resources. With the ever-growing need for more data, compute, storage,
and networking, both locally and in remote locations, datacenter operations need to
follow suit and scale through more automated processes for deployment,
monitoring, and alerting. You will be responsible for improving the
processes for precise deployment of production systems by leveraging the
combined resources the team provides.

Responsibilities:

  • Daily racking, stacking, and maintenance of computer,
    storage, and network equipment
  • Plan, spec, and pull wires to connect infrastructure
    equipment as needed while maintaining datacenter standards
  • Help maintain an inventory of components in the
    datacenter, and keep the datacenter clean and organized
  • Leverage and improve upon existing datacenter
    deployments to ensure continuous operation
  • Work with engineering teams to identify useful
    metrics to collect, and implement the corresponding monitoring and alerting with
    existing monitoring solutions at the datacenter level.
  • Organize and document implemented solutions for
    long-term information retention in our internal ticketing and documentation
    system.
  • Work closely with involved parties to develop automated workflows
    that can be easily implemented by remote hands with little or no
    understanding of internal systems.
  • As part of the team, respond to and document submitted
    support tickets relating to the functionality of the various systems present
    in the datacenter.
  • Help develop automated tools to collect information
    that can be directly used to assist users in creating root cause analyses for
    reported issues.

Requirements:

  • BS in Computer Science, or 3 years of relevant work
    experience
  • 3+ years experience with:
    • Computer deployment and operations (CPU / GPU) – Rack
      and Stack
    • Linux operating system flavors (CentOS/RHEL, Ubuntu)
    • Storage systems (On-prem and/or in-cloud)
  • Excellent time management and communication skills are
    absolute musts
  • Ability to step up and take ownership to bring complex
    tasks to completion
  • Ability to travel between sites in the Bay Area

Nice to have:

  • Experience with multi-site, hybrid on-prem and in-cloud
    software and hardware deployments
  • Previous experience with large-scale
    datacenter and remote systems management