ML Platform Team Leader

Match Group

Job Description

[AI Lab]

Hyperconnect AI Lab innovates the user experience of services that connect people by discovering and solving problems that are difficult to approach with existing technologies but can be solved with machine learning. To this end, we develop numerous models across domains including video, voice, natural language, and recommendation, and we tackle the problems encountered in serving them reliably on mobile devices and cloud servers, so that the technology created by AI Lab contributes to the growth of real services. With this goal, Hyperconnect AI Lab has for several years been developing machine learning technologies that power Hyperconnect's products, including Azar.

[Introducing the ML Software Engineering Team]

The ML Software Engineering team under the AI Lab aims to apply Hyperconnect's AI technologies to products to create business impact, and to build sustainable systems and platforms that accelerate the adoption of those technologies. To achieve this goal, the team works on the following areas (Interview).

[Machine Learning-Based Backend Application Design and Implementation]

We develop a variety of machine learning-based backend services that improve the quality of the services operated by Hyperconnect and Match Group (related tech blog), focusing mainly on personalized recommendation systems and search systems. These services are designed with performance as a first-class concern so that they can operate in real time at global scale, and the microservices the team runs handle some of the highest traffic levels in the company.

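As a rough illustration only, the sketch below shows the general shape of such a model-backed recommendation endpoint. The framework choice (FastAPI), endpoint path, and the feature-store, candidate-generation, and scoring helpers are all hypothetical stand-ins, not Hyperconnect's actual design.

```python
# Hypothetical sketch of a model-backed recommendation endpoint.
# The feature store, candidate source, and scoring call are stand-ins.
from typing import Dict, List

from fastapi import FastAPI

app = FastAPI()


def fetch_user_features(user_id: str) -> Dict[str, float]:
    """Stand-in for a feature-store lookup (e.g. recent activity counters)."""
    return {"recent_matches_1h": 3.0, "session_length_avg": 42.0}


def fetch_candidates(user_id: str, limit: int) -> List[str]:
    """Stand-in for candidate generation (e.g. nearby or currently active users)."""
    return [f"candidate_{i}" for i in range(limit * 5)]


def score(user_features: Dict[str, float], candidate_id: str) -> float:
    """Stand-in for a call to a model-serving endpoint."""
    return hash((tuple(sorted(user_features.items())), candidate_id)) % 1000 / 1000


@app.get("/v1/users/{user_id}/recommendations")
def recommend(user_id: str, limit: int = 20) -> dict:
    """Rank candidates for one user and return the top results."""
    features = fetch_user_features(user_id)
    candidates = fetch_candidates(user_id, limit)
    ranked = sorted(candidates, key=lambda c: score(features, c), reverse=True)
    return {"user_id": user_id, "items": ranked[:limit]}
```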

[Developing a real-time data pipeline for model inference]

We develop pipelines (Apache Flink, KSQL) that process real-time events and feed them into model inference. We design systems (e.g., streaming applications, feature stores) that collect, process, and serve features quickly and reliably, and in the course of building these pipelines we sometimes proactively discover features that improve model performance. For more detail, see the material below; a minimal sketch of this kind of streaming feature aggregation follows the list.

  • Operating an event-based live streaming recommendation system
  • Data storage technology for machine learning applications
  • Deview 2023 - Feature Store Implementation for Real-Time Recommendation Systems
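
The sketch below shows the kind of windowed feature aggregation such a pipeline performs, written in plain Python rather than Flink/KSQL to keep it self-contained; the event schema, window size, and in-memory "feature store" are hypothetical stand-ins.

```python
# Illustrative sketch of streaming feature aggregation for model inference.
# A real pipeline would do this in Flink/KSQL and write to a feature store.
import time
from collections import defaultdict, deque
from typing import Deque, Dict

WINDOW_SECONDS = 600  # 10-minute sliding window (assumed for the example)

# Feature store stand-in: user_id -> feature name -> value
feature_store: Dict[str, Dict[str, float]] = defaultdict(dict)
# Per-user event timestamps retained for the sliding window
recent_events: Dict[str, Deque[float]] = defaultdict(deque)


def process_event(event: Dict) -> None:
    """Update windowed features for one event and publish them to the store."""
    user_id = event["user_id"]
    ts = event["timestamp"]

    window = recent_events[user_id]
    window.append(ts)
    # Evict events that fell out of the window.
    while window and window[0] < ts - WINDOW_SECONDS:
        window.popleft()

    feature_store[user_id]["match_requests_10m"] = float(len(window))


if __name__ == "__main__":
    now = time.time()
    sample_events = [
        {"user_id": "u1", "timestamp": now - 30, "type": "match_request"},
        {"user_id": "u1", "timestamp": now - 5, "type": "match_request"},
        {"user_id": "u2", "timestamp": now - 700, "type": "match_request"},
        {"user_id": "u2", "timestamp": now - 1, "type": "match_request"},
    ]
    for e in sample_events:
        process_event(e)
    print(dict(feature_store))  # features ready to serve for model inference
```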

[Development of a model serving platform]

We provide a unified serving platform built on custom Kubernetes operators and NVIDIA Triton. It lets us quickly deploy ML models trained with various deep learning frameworks (TensorFlow, PyTorch) across many domains to production. We optimize inference speed and throughput through software and hardware improvements, and control costs through continuous monitoring and high-efficiency computing resources such as AWS Neuron. We currently operate more than 50 models in production and solve the complex technical challenges that arise at this scale. For more detail, see the material below; a client-side sketch of calling such a platform follows the list.

  • Reducing machine learning model serving costs to 1/4
  • Case studies and tips from applying AWS machine learning inference accelerators at Hyperconnect to reduce model serving costs
  • Deview 2021 - How to deploy more models faster?
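
As an illustration, the sketch below sends one inference request over the open inference (KServe v2) HTTP protocol that NVIDIA Triton exposes. The host, model name, and tensor names/shapes are invented for the example and do not describe an actual Hyperconnect deployment.

```python
# Hypothetical client-side sketch of calling a model on a Triton-backed
# serving platform via the KServe v2 HTTP inference protocol.
import requests

TRITON_URL = "http://model-serving.internal:8000"   # hypothetical endpoint
MODEL_NAME = "profile_image_classifier"             # hypothetical model name


def infer(pixels: list) -> list:
    """Send one image tensor to Triton and return the raw output values."""
    payload = {
        "inputs": [
            {
                "name": "input__0",          # must match the model configuration
                "shape": [1, 3, 224, 224],
                "datatype": "FP32",
                "data": pixels,              # flattened row-major values
            }
        ]
    }
    resp = requests.post(
        f"{TRITON_URL}/v2/models/{MODEL_NAME}/infer", json=payload, timeout=1.0
    )
    resp.raise_for_status()
    return resp.json()["outputs"][0]["data"]


if __name__ == "__main__":
    dummy_image = [0.0] * (3 * 224 * 224)
    print(infer(dummy_image)[:5])
```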

[ML Ops Infrastructure Construction and Tool Development]

We build on-premise GPU clusters (NVIDIA DGX systems) and high-speed distributed storage, and manage and operate them with Ansible to reduce the human and material costs of research (Reference: Building an Ultra-High-Performance Deep Learning Cluster Part 1). We also develop developer portals, CLI tools, and other interfaces for controlling and using the MLOps components and serving platforms provided by the ML Platform team, and we run PoCs of rapidly evolving MLOps technologies and bring them to production when appropriate.
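
Purely as an illustration of the CLI tooling mentioned above, the sketch below wraps a hypothetical internal platform API in a small command-line tool; the commands, flags, and endpoint are invented and do not describe Hyperconnect's actual tooling.

```python
# Hypothetical "mlctl" CLI sketch: a thin wrapper around an internal ML
# platform API. Endpoint, commands, and payloads are invented for illustration.
import argparse
import json

import requests

PLATFORM_API = "http://ml-platform.internal/api/v1"  # hypothetical endpoint


def deploy(args: argparse.Namespace) -> None:
    """Ask the platform to deploy a registered model version."""
    resp = requests.post(
        f"{PLATFORM_API}/deployments",
        json={"model": args.model, "version": args.version, "replicas": args.replicas},
        timeout=5.0,
    )
    resp.raise_for_status()
    print(json.dumps(resp.json(), indent=2))


def list_models(args: argparse.Namespace) -> None:
    """List models currently registered on the platform."""
    resp = requests.get(f"{PLATFORM_API}/models", timeout=5.0)
    resp.raise_for_status()
    for model in resp.json():
        print(model["name"], model.get("latest_version", "-"))


def main() -> None:
    parser = argparse.ArgumentParser(prog="mlctl", description="ML platform CLI (sketch)")
    sub = parser.add_subparsers(dest="command", required=True)

    p_deploy = sub.add_parser("deploy", help="deploy a model version")
    p_deploy.add_argument("model")
    p_deploy.add_argument("version")
    p_deploy.add_argument("--replicas", type=int, default=2)
    p_deploy.set_defaults(func=deploy)

    p_list = sub.add_parser("list", help="list registered models")
    p_list.set_defaults(func=list_models)

    args = parser.parse_args()
    args.func(args)


if __name__ == "__main__":
    main()
```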

[Building a Continuous Learning Pipeline]

We build an automated virtuous cycle (AI Flywheel) that uses data collected from products to retrain, evaluate, and deploy models, which in turn improve the products again. We provide MLOps components for each stage of the ML pipeline (ML data processing, ML model training, ML model deployment) so that researchers can easily build pipelines by combining them. On top of the MLOps workflow tool, we are also developing a data pipeline that collects data from various domains, a data platform that seamlessly connects cloud storage and training environments, and the cloud infrastructure and tools needed to configure automated training workflows, while exploring new areas so that these can be used in both experiments and pipelines.
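
A minimal sketch of one turn of such a flywheel is shown below, with stand-in step functions and a simple metric gate on deployment; in practice each step would run as a workflow-tool task rather than a local function call.

```python
# Hypothetical sketch of one iteration of an "AI Flywheel": process fresh
# product data, retrain, evaluate against the current model, and deploy only
# if quality improves. Every step here is a stand-in for illustration.
from dataclasses import dataclass


@dataclass
class Model:
    version: str
    auc: float  # offline evaluation metric used as the deployment gate


def process_data(snapshot: str) -> str:
    """Stand-in: turn raw product events into a training dataset."""
    return f"dataset-from-{snapshot}"


def train(dataset: str, previous: Model) -> Model:
    """Stand-in: retrain on the new dataset, producing a candidate model."""
    return Model(version=f"{previous.version}+1", auc=previous.auc + 0.004)


def evaluate(candidate: Model, dataset: str) -> float:
    """Stand-in: offline evaluation on held-out data."""
    return candidate.auc


def deploy(model: Model) -> None:
    """Stand-in: hand the model to the serving platform."""
    print(f"deploying {model.version} (auc={model.auc:.3f})")


def flywheel_step(current: Model, snapshot: str, min_gain: float = 0.001) -> Model:
    """Run one cycle and return whichever model should serve traffic next."""
    dataset = process_data(snapshot)
    candidate = train(dataset, current)
    if evaluate(candidate, dataset) - current.auc >= min_gain:
        deploy(candidate)
        return candidate   # the new model feeds the next cycle
    return current         # keep the current model if there is no clear gain


if __name__ == "__main__":
    model = Model(version="v1", auc=0.780)
    model = flywheel_step(model, snapshot="2024-01-01")
```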

[Development of an inference engine that operates on mobile devices]

We research and develop an inference engine SDK that runs Hyperconnect's on-device models using TFLite, PyTorch Mobile, and similar runtimes. Together with the AI organization, we handle mobile model conversion, quantization, SIMD optimization, and development environment setup.
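
As a rough sketch of the conversion and quantization step, the snippet below converts a SavedModel to TFLite with dynamic-range quantization and sanity-checks it with the Python interpreter; the model path and the quantization choice are assumptions for illustration, and an on-device SDK would load the resulting .tflite file via the TFLite runtime on Android/iOS.

```python
# Minimal sketch of mobile-model conversion and quantization with the
# standard TensorFlow Lite converter. Paths and settings are assumptions.
import tensorflow as tf

SAVED_MODEL_DIR = "exported/face_filter_model"  # hypothetical export path

# Convert a SavedModel to TFLite with dynamic-range quantization.
converter = tf.lite.TFLiteConverter.from_saved_model(SAVED_MODEL_DIR)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("face_filter_model.tflite", "wb") as f:
    f.write(tflite_model)

# Quick sanity check with the Python TFLite interpreter before shipping to mobile.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
print(interpreter.get_input_details()[0]["shape"],
      interpreter.get_output_details()[0]["dtype"])
```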

Responsibilities

  • Lead the ML Software Engineering Team in line with the direction of the AI Lab and the company, and guide the team to high performance.
  • Prioritize projects, allocate resources efficiently, coach and mentor team members to maximize their capabilities, and foster a team culture of collaboration and a growth mindset.
  • Design and implement a high-efficiency, low-cost ML platform that innovates global user experiences through ML system development, improves the productivity of ML engineers, and drives organizational growth.
  • Work closely with ML Engineers and ML Research Scientists to put the latest ML technologies into production, and build ML systems that have never existed before based on a deep understanding of CS/ML technologies.
  • Even when there are few services to benchmark against, analyze the technical issues that arise in large-scale systems, make sound trade-off decisions based on strong CS fundamentals, and deliver effective solutions.
  • Identify the latest ML system technologies and trends, introduce them to the organization, and drive innovation.
  • Support the successful execution of projects through smooth communication with various stakeholders.

Required Qualifications

  • Someone who can proactively motivate and coach team members to improve team performance with extreme ownership.
  • Someone with more than 5 years of experience designing, implementing, and operating software (backend, data pipelines, DevOps, ML serving systems, etc.).
  • Someone with solid CS fundamentals (operating systems, networks, computer architecture, data structures and algorithms, etc.).
  • Someone who has no difficulty picking up new languages, based on a deep understanding of two or more of Java, Python, JavaScript, Go, and Rust.
  • Someone who can engineer flexibly around ML models, treating these black boxes as unreliable components.
  • Someone with strong communication skills who can collaborate with stakeholders across multiple functions to deploy ML models into services.
  • Someone with experience leading an engineering team and the leadership and management skills to help the ML Software Engineering team grow.
  • Anyone who can communicate fluently in Korean is welcome, regardless of degree or nationality.

Company Info.

Match Group

Match Group is an American internet and technology company headquartered in Dallas, Texas. It owns and operates the largest global portfolio of popular online dating services, including Tinder, Match.com, Meetic, OkCupid, Hinge, PlentyOfFish and OurTime, among a total of more than 45 global dating companies. The company was owned by IAC until July 2020, when Match Group was spun off as a separate, public company.

  • Industry
    Social media company
  • No. of Employees
    1,880
  • Location
    Dallas, TX, USA


Match Group is currently hiring for Machine Learning Platform Engineer jobs in Seoul, South Korea, with an average base salary of ₩55,000 - ₩90,000 / month.
