Job Information
Saildrone Staff Site Reliability Engineer in Alameda, California
About Us
Saildrone is an oceanographic survey and maritime defense company creating a paradigm shift in how navies, civil governments, and commercial organizations obtain the real-time, accurate data required to monitor and protect our oceans. Saildrone's fleet of uncrewed surface vehicles (USVs) carry purpose-built payloads supporting border protection, critical infrastructure security, hydrographic survey, offshore energy, and metocean monitoring. Powered by renewable wind and solar energy, Saildrone USVs provide long-duration operations measured in months, not days. Proprietary software applications and machine learning technology transform collected data into actionable insights and intelligence.

We are based in Alameda, CA, with offices in Washington, DC and St. Petersburg, FL, and operate missions worldwide. We are backed by top-tier investors in the frontier tech and sustainability sectors, including Social Capital, Capricorn, Lux Capital, BOND Capital, and Emerson Collective.

This is an exciting opportunity with a fast-growing team at the cutting-edge intersection of big data services and autonomous hardware. You will be part of a high-performing, multidisciplinary team that delivers high impact for humanity and future generations.

The Role
We are seeking a talented Staff Site Reliability Engineer with a strong focus on observability and mentorship to join our dynamic team. In this role, you will act as a team tech lead, guiding engineering efforts to ensure the reliability, scalability, and performance of our systems while fostering a culture of continuous learning and improvement across the Software group. Your expertise in observability tools and practices will play a crucial role in scaling up Saildrone's Site Reliability Engineering team, helping to ensure the quality of service that our customers have come to expect.

Responsibilities
* Monitoring Architecture: Design and implement robust monitoring frameworks to track the health and performance of applications and infrastructure.
* Observability Practices: Establish observability best practices, leveraging tools such as Datadog, Prometheus, Grafana, or similar to provide actionable insights.
* Alerting Strategies: Develop and maintain effective alerting strategies to ensure prompt incident response while minimizing noise.
* Incident Management: Lead incident response efforts, conducting thorough postmortems and root cause analyses to prevent future occurrences.
* Performance Optimization: Analyze system performance metrics and logs to identify bottlenecks and implement solutions for optimization.
* Collaboration: Work closely with development, operations, and product teams to integrate observability into the development lifecycle and improve system reliability.
* Documentation: Create and maintain comprehensive documentation of monitoring setups, incident responses, and SRE best practices.
* Capacity Planning: Collaborate on capacity planning efforts to ensure the infrastructure can scale to meet growing demands.
* Tooling and Automation: Identify opportunities for automation in monitoring and alerting processes to improve efficiency and reliability.
* Mentorship: Provide guidance and mentorship to our new SRE team and to the Software group as a whole, sharing expertise in monitoring, observability, and incident management.

Minimum Experience
* 8+ years of SRE experience. BA/BS in a related field or equivalent experience.

Required Skills
* Strong knowledge of AWS services and managing cloud-based infrastructure at scale.
* Strong experience with monitoring and observability tools (e.g., Datadog, Grafana, Prometheus).
* Strong proficiency with log management and analysis tools (e.g., Datadog Logs, ELK Stack, Splunk).
* Skills in scripting languages (e.g., Python, Bash) for automation and custom monitoring solutions.
* Strong experience with Infrastructure as Code (IaC) tools like Terraform or CloudFormation.
* Strong proficiency with Kubernetes, Helm Charts, and Helm deployment patterns.
* Understanding of key performance metrics and monitoring aspects (e.g., CPU usage, memory consumption, latency, error rates).
* Expertise in setting up alerts, handling incidents, and performing root cause analysis.
* High attention to detail for accurate monitoring, alert configuration, and performance tuning.
* Experience with monitoring databases (e.g., MySQL, PostgreSQL, MongoDB) and understanding related performance metrics.
* Effective communication skills to collaborate with cross-functional teams and report on system health and incidents.
* Excellent problem-solving skills and a proactive mindset.

Desired Skills and Experience
* AWS certifications (e.g., AWS Certified Solutions Architect, AWS Certified DevOps Engineer).
* Experience with other cloud platforms (Azure, Google Cloud Platform).
* Knowledge of networking fundamentals, including DNS, load balancing, and content delivery networks (CDNs).
* Ability to anticipate potential issues and implement proactive monitoring strategies.

Physical Requirements
* Work is performed on a computer and requires the ability to operate a keyboard and other peripheral devices.

Location: This is a hybrid position in Alameda, CA. Our waterfront office offers beautiful views of San Francisco Bay in always-sunny Alameda. Even our walls have good karma: our offices mix software development with a hardware production line in the former airplane hangar used to film 'The Matrix'.

Benefits:
* Paid time off, including vacation, bereavement, jury duty, sick time, and parental leave
* Comprehensive and competitive medical, dental, and vision plans, and HSA with employer matching
* Company-sponsored life insurance
* Stock options
* Annual stipend for continued learning and development
* Quarterly company BBQs at our Alameda HQ (bring your friends and family!)
* Free Bay Area public transportation via AlamedaTMA with the BayPass Clipper Card
* Plenty of snacks in our 3 office locations
* Dog-friendly work environment

A reasonable estimate of the current range is $149,400 to $198,000 annually.

Catch up on the latest news about us:
* TIME 100 Most Influential Companies 2024: Saildrone
* The Tiny Craft Mapping Superstorms at Sea - The New York Times
* An Underwater Mountain was Newly Discovered off California Coast - San Francisco Chronicle
* The Navy Is Using Robot Ships to Deter Human Smuggling out of Haiti - Defense One
* How US Navy Experiments Could Get Drones Beyond Spying and Into Battle - Defense News
* USVs Could Deter IUU Fishing - USNI Proceedings
* Mullen, Former Joint Chiefs Chairman, to Lead Board for Unmanned Tech Firm Saildrone - Breaking Defense
* Saildrone's First Aluminum Surveyor Autonomous Vessel Splashes Down for Navy Testing - TechCrunch
* Saildrone Featured Videos Playlist

We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.

At Saildrone, we value diversity and are committed to creating an inclusive workplace that welcomes people from all backgrounds, experiences, and perspectives. We believe that a diverse and inclusive team leads to innovation and better problem-solving. We encourage applications from candidates of all genders, ethnicities, races, sexual orientations, disabilities, and backgrounds.

Individual compensation packages are based on geographic location, scope of the role, relevant experience, and the ability to deal with complexity and problem solve within our organization, among other factors.

All employees are required to provide proof of authorization to work in the U.S. within their first 3 days of work.
Please note that the Company does not sponsor employees for work visas or permanent resident cards to work in the U.S. If you need sponsorship for a work visa or green card, you will not be qualified for employment with Saildrone. Any unsolicited resumes/candidate profiles submitted through our website or to personal email accounts of employees of Saildrone are considered property of Saildrone and are not subject to payment of agency fees.