Backend Software Engineer, Autopilot AI (Dojo)

Tesla

The Role: 

Tesla’s Autopilot Machine Learning team is seeking a software engineer to design and build a database and associated RESTful API for command and control of a custom ML training accelerator (DOJO). This engineer will work with hardware designers, embedded software engineers, and ML engineers to design, prototype, and implement a web service responsible for interfacing with existing and future custom distributed systems for ML training. The candidate must be comfortable with from-scratch API design and specification of the associated back-end software stack. Expertise in designing web APIs and the associated back-end database, for eventual integration with a custom web application, is essential for this position.

Responsibilities:  

  • Set technical direction for, architect, implement, and maintain the backend API stack for Machine Learning coordination of a highly distributed system.
  • Work on the platform of tools and infrastructure that the Machine Learning team needs to be effective. The scope spans from the machine firmware interface to the frontend user interface.
  • Automate and coordinate required hardware resources with the team managing the cluster hardware to maintain high availability.
  • Interface with both Machine Learning and hardware design teams to understand current and future requirements and priorities.

Requirements:

  • Strong experience building APIs in Python, GoLang, Node.js, or Java.
  • Experience working in multithreaded or highly distributed Linux environments.   
  • BS/MS in Computer Science, or equivalent experience with evidence of exceptional ability.
  • Minimum 5 years’ experience building web services.   
  • Interest in Machine Learning, Computer Vision or Neural Networks.  

Extra but not required:   

  • Experience building modern web applications using React or similar component-based libraries.  
  • Experience working with large computer clusters.  
  • Experience working with distributed networks and IoT coordination.   
  • Experience working with workload management software such as Slurm or LSF.