AWS, GitHub, Google Cloud Platform (GCP), Machine learning techniques
Roboflow is scaling rapidly. We now manage over 100 million images for hundreds of thousands of users. Having secure and reliable cloud infrastructure to support our growth is of paramount importance.
The Roboflow product spans the entire end-to-end machine vision pipeline, so the infrastructure naturally presents a wide range of challenges: driving efficiencies in GPU batch computing, shaving milliseconds off the latencies of our hosted machine learning inference APIs, and supporting hundreds of thousands of users worldwide with best-in-class site reliability and data protection.
Our infrastructure runs across AWS and GCP. Our core web-app runs on Firebase (Firestore, Functions, Storage, Hosting). We heavily utilize serverless compute products where possible, but also run clusters of GPU-powered machines on AWS Batch and in managed instance groups fed by pub-sub queues when necessary. We are increasingly using Kubernetes internally, and are working on a self-hosted version of our platform.
The Role
The focus of this role is on improving, scaling, and maintaining the infrastructure that powers our core app, including our cloud architecture, databases, file storage, search cluster, micro-services, and machine learning pipelines.
You'll work alongside our existing infrastructure team and take on cross-team work spanning product, operations, and customer-facing projects. You should be able to context switch across a wide range of infrastructure, security, and systems engineering work in a fast-paced startup environment.
Specific Skillset
The following would be helpful:
Roboflow is a computer vision platform that simplifies the process of building and deploying computer vision models. It provides a range of tools and features that make it easy to create, manage, and optimize computer vision workflows. Roboflow also supports a range of popular deep learning frameworks, including TensorFlow, PyTorch, and Keras, so you can easily train and deploy your computer vision models on a variety of platforms.