Staff Lead Software Engineer, Cortex Platform

  • Experience

    2-4 years
  • Location

    San Francisco, CA, USA
  • Job Function

    Staff Lead Software Engineer
  • Industry

    Information Technology
  • Qualification

    Degree in Computer Engineering, Degree in Computer Science, Degree in Data Science, Degree in Machine Learning, Degree in Mathematics, Degree in Statistics

Key Skills

Python Programming, Scala Programming, C++, Java Programming, Machine learning techniques, Data science techniques, SQL, Apache Hadoop, MapReduce, TensorFlow, PyTorch

Job Description

Who We Are

Cortex empowers internal teams to efficiently leverage ML by providing a platform and by unifying, educating, and advancing the state of the art in ML technologies within Twitter. We win when our customers win by helping our users stay informed, share and discuss what matters; by serving the public conversation. We're building an AI-first company and every major initiative is increasingly dependent on the successful application of machine learning. Cortex is at the nexus of this evolution.

Our team of ML software engineers is constructing one of the strongest machine learning platforms in the world by marrying the latest ML industry practices with engineering excellence and the need to perform at Twitter scale. Our customers are all the ML engineers at Twitter, and our goal is to provide a unified tooling ecosystem that allows these engineers to focus on what they are good at, building ML models with novel approaches, and that abstracts away the complexities of bringing these models into a production environment.

We care deeply about:

  • Engineering excellence such as good design abstractions, API stability, unit testing, leading best practices for other engineers to follow, and solid documentation.
  • Staying abreast of, and compatible with, a quickly shifting technology landscape for ML platform components and related open source solutions.
  • Creating the best ML Platform environment for Twitter that provides an exceptional developer experience for our engineering customers.
  • Encouraging engineering creativity and innovative solutions

Our current projects include:

  • Establishing Kubeflow as a managed offering at Twitter
  • Enabling and sustaining GCP Infra/Platform components for broader use in the Cortex platform, e.g. AI Platform, Dataflow, Dataproc, etc.
  • Improving Operations of essential ML Platform services

    • Hosted notebooks
    • Centralized ML Metastore
    • Centralized ML Dashboards

If this sounds like a team you want to be part of, great! We are looking for engineers who are passionate about writing code, have a desire to learn new technologies, love working in collaborative teams, and are committed to serving their customers.

Your responsibilities include:

  • Informing and accelerating GCP Infrastructure adoption best practices (sustaining and improving User Onboarding, IAM, Image Management, Twitter Systems Integrations, Security, etc.)
  • Absorbing existing SRE/Operational support scopes (GPU Cluster Management, OS/Kernel Upgrades, RPM/Python Dependency Management, Bare Metal Host Management/Puppet Manifests, etc)
  • Partnering and supporting existing Cortex Platform teams with Operational guidance and expertise on various project initiatives
  • Creating tools and automation for Operational support and management for DS/ML use cases
  • Supporting various users and developers with operational issues (e.g. "I'm having trouble scheduling GPU jobs with Persistent Volumes")
  • Capacity Planning
  • Maintaining version updates of TensorFlow, PyTorch, etc.
  • Partnering with Twitter’s Platform and Data Platform orgs to improve, enhance and influence direction and integration opportunities
  • Partnering with teams to improve, enhance and integrate with the company’s GCP Adoption & Management strategy


Who You Are

  • 6+ years of experience handling services in a large-scale distributed systems environment, preferably services on GCP (e.g. BigQuery).
  • Expert knowledge of Linux operating system internals, filesystems, disk/storage technologies, storage protocols, and the networking stack.
  • Expert knowledge of systems programming (bash and shell tools) and practical, proven knowledge of at least one higher-level language (Python, Go or Scala).
  • Comfortable working with on-prem and cloud-based infrastructure (AWS, GCP) in terms of deployment, support, monitoring, administration and troubleshooting.
  • Experience using containerization software such as Kubernetes, Docker, and Mesos.
  • Track record of practical problem solving, excellent communication, and documentation skills.
  • Proven understanding of systems and application design, including the operational trade-offs of various designs.
  • Ability to lead and mentor technical teams through design and implementation across an organization.
  • Work well with and be able to influence a myriad of personalities at all levels.
  • Be adaptable and able to focus on the simplest, most efficient & reliable solutions.
  • Solid understanding of algorithms, distributed systems design and the software development lifecycle.

Company Info.


At Twitter, we serve billions of ad impressions and generate millions of dollars in revenue per day. For every ad shown on Twitter, our machine learning systems evaluate, in real-time, millions of ad candidates behind the scenes to find the best one. We are looking for talented individuals to further develop this state-of-the-art system, working as part of the machine learning engineering and data science team.

