The Software Assurance Group at Oracle is seeking a talented Senior Software Developer to join our machine learning engineering team, which works at the forefront of enabling secure, scalable, and highly available AI-powered solutions. In this role, you will engage closely with fellow engineers and stakeholders, including ML engineers, architects, and product managers, to design, develop, and deliver innovative tools and cloud-native applications that ensure operational excellence for large-scale, global AI systems.
Our team develops products that support the secure transport and processing of ML artifacts, scalable telemetry pipelines for real-time monitoring, and orchestration frameworks to efficiently manage cloud infrastructure resources for dynamic, multi-tenant AI workloads. You will work hands-on across the software engineering lifecycle, driving solutions from design through implementation, testing, deployment, and ongoing operation.
Key Responsibilities:
- Design and deliver robust, scalable, and secure cloud-native features with end-to-end ownership, including development, testing, operational excellence, and continuous improvement.
- Resolve complex technical issues and influence architectural decisions for distributed, multi-platform solutions.
- Collaborate cross-functionally with technical leads, engineering management, product managers, and architects to ensure timely, high-quality delivery of features.
- Proactively identify and mitigate project risks and blockers.
- Support integration efforts for external application teams and guide them on best practices.
- Stay up to date with the latest Oracle Cloud technologies and continuously evolve our provisioning and enablement processes.
- Mentor and support junior team members, fostering technical growth and a culture of excellence.
Required Qualifications:
- BS in Computer Science or a related technical field.
- 4+ years of software engineering experience, including direct exposure to at least one major cloud service provider (OCI, AWS, Azure, or GCP).
- Proficiency in Python and at least one other modern programming language (Go, Java, Kotlin, or C/C++).
- Deep understanding of distributed systems architecture, with a focus on fault tolerance and high availability.
- Hands-on experience designing and building microservices and cloud-native applications.
- Experience with containers and orchestration frameworks (Docker, Kubernetes).
- Excellent problem-solving skills, strong communication capabilities, and a detail-oriented approach.
- Working knowledge of observability and monitoring tools (Prometheus, Grafana), CI/CD pipelines (Jenkins, GitLab), and build tools (Gradle, Maven, or similar).
- Understanding of core machine learning concepts and workflows to support ML engineering initiatives.
- Demonstrated ability to work both independently and collaboratively in a fast-paced environment with minimal supervision.
Preferred Qualifications:
- MS in Computer Science or a related technical field.
- Familiarity with architectural patterns for high availability, scale-out, disaster recovery, and security in cloud environments.
- Experience designing or maintaining telemetry and metrics systems, and visualization dashboards using modern tools.
- Prior experience with high-throughput distributed systems or data pipelines.
Qualifications
Career Level - IC3