Infrastructure Engineer
About Stability:
Stability.ai is a community- and mission-driven, open-source artificial intelligence company that cares deeply about real-world implications and applications. Our most significant advances grow from our diversity in working across multiple teams and disciplines. We are unafraid to go against established norms and explore creativity. We are motivated to generate breakthrough ideas and convert them into tangible solutions. Our vibrant communities consist of experts, leaders, and partners across the globe who are developing cutting-edge open AI models for Image, Language, Audio, Video, 3D, and Biology.
About the role:
We are looking for an Infrastructure Engineer to develop and scale distributed back-end systems for the Stability AI platform. The ideal candidate will have experience provisioning large compute clusters for machine learning workflows and a strong track record of helping teams establish best practices for reliability and scalability.
Responsibilities:
- Manage large compute clusters for ML inference and development
- Create tooling and infrastructure that abstract compute and storage in ML workflows
- Build automation and CI/CD pipelines for deploying new machine learning models
- Design and implement new features in the Stability inference platform across multiple cloud environments
- Help to drive continual performance improvements across the platform
Qualifications:
- 5+ years of experience in a DevOps or Infrastructure Engineer role building machine learning infrastructure and working with large GPU clusters
- Knowledge of cloud providers such as AWS and GCP, infrastructure-as-code frameworks, and observability tools such as Grafana
- Interest in and experience with supporting engineering teams in creating robust processes for automation, reliability, and instrumentation
- Strong communication, collaboration, and documentation skills
- Strong programming knowledge in Python and/or Go
- Experience with distributed systems for high-performance computing
- Well-versed in data structures, data modeling, and database management systems, as well as object and file storage systems
- Experience with Git, containers, networking, deployment, and automation
Equal Employment Opportunity:
We are an equal opportunity employer and do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, disability, or any other legally protected status.