Site Reliability Engineer
Who you are:
You are a Site Reliability Engineer, or simply an engineer with experience delivering highly available infrastructure and ensuring that our customers and end-users have a great experience with our internal applications and product-centric services. As an SRE, you are a seasoned professional with a deep understanding of both software engineering and systems administration. You thrive in dynamic environments where you can apply your problem-solving skills to ensure the reliability, scalability, and performance of our systems. You are passionate about automating processes and implementing best practices to maintain high availability and efficiency. Collaboration comes naturally to you, and you excel in cross-functional teams, communicating effectively to drive projects forward.
Location: Candidates must be based in the Czech Republic.
What you’ll do:
- System Monitoring and Maintenance: Implement and maintain monitoring solutions to ensure the reliability and performance of our systems. Proactively identify and resolve issues to minimize downtime and optimize system performance.
- Automation and Tooling: Develop automation scripts and tools to streamline processes, automate repetitive tasks, and improve operational efficiency. Continuously evaluate and implement new technologies to enhance our infrastructure.
- Incident Response and Troubleshooting: Respond to and resolve incidents in a timely manner, identifying root causes and implementing preventive measures to minimize recurrence. Collaborate with cross-functional teams to address system-wide issues and implement long-term solutions.
- Capacity Planning and Scalability: Conduct capacity planning assessments to anticipate future resource needs and ensure scalability. Work closely with development teams to optimize application performance and resource utilization.
- Infrastructure Architecture: Design, implement, and maintain scalable and resilient infrastructure solutions. Evaluate and recommend infrastructure technologies and configurations to support the evolving needs of the organization.
- Continuous Improvement: Drive initiatives to improve reliability, scalability, and performance through process enhancements, automation, and best practices adoption. Participate in post-incident reviews and contribute to the development of preventive measures and action plans.
- Documentation and Knowledge Sharing: Create and maintain comprehensive documentation of system configurations, procedures, and troubleshooting guides. Share knowledge and best practices with team members to promote collaboration and skill development.
What you need:
- Technical Skills: Proficiency in scripting languages (e.g., Python, Shell) and configuration management and orchestration tools (e.g., Ansible, Terraform). Strong understanding of Windows and UNIX/Linux systems administration and networking concepts. Experience with cloud platforms (e.g., AWS, Azure, GCP) and containerization technologies (e.g., Docker, Kubernetes).
- Monitoring and Alerting: Hands-on experience with monitoring tools (e.g., Prometheus, Grafana, New Relic, Datadog) and logging frameworks (e.g., ELK stack). Ability to configure and customize monitoring solutions to meet specific requirements.
- Troubleshooting Expertise: Proven track record of diagnosing and resolving complex system issues in production environments. Familiarity with incident management processes and tools (e.g., PagerDuty, Opsgenie, Jira).
- Automation Mindset: Demonstrated ability to automate manual tasks using scripting languages and infrastructure-as-code techniques. Experience with version control systems (e.g., Git) and CI/CD pipelines.
- Collaborative Spirit: Strong communication and interpersonal skills, with the ability to work effectively in cross-functional teams. Willingness to mentor junior team members and share knowledge with peers.
- Problem-Solving Aptitude: Analytical mindset with a proactive approach to problem-solving. Ability to prioritize tasks and manage multiple projects in a fast-paced environment.
- Continuous Learning: Commitment to staying updated on industry trends and emerging technologies. Willingness to pursue relevant certifications and participate in professional development opportunities.
- Bachelor’s Degree: Bachelor’s degree in Computer Science, Information Technology, or a related field. Equivalent work experience may be considered in lieu of a degree.
What is a plus:
- Experience with AWS: EC2, EKS, ECS, ELB, RDS, ElastiCache, CloudFront, WAF, Route 53, Auto Scaling, Lambda, IAM, and S3.
- Experience with Cloudflare.
- Solid knowledge of building systems and processes.
- Solid knowledge of Distributed Systems and Microservices.
- Bias for process automation and orchestration.
- Strong understanding of data structures, algorithms, and relational and NoSQL databases.
- Understanding of how commodity servers, operating systems, and network devices function, perform, and scale.
- Understanding of Continuous Integration / Continuous Delivery (CI/CD) and Agile software engineering practices.
- Knowledge of professional software engineering best practices for the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations.
- Recognition and adoption of best practices in documentation, testing, security, operational support at scale, and efficient use of resources.
- Clear written and verbal communication skills.
Why Invicti?
Your Health & Wellness Matters:
Employee Assistance Program: Emotional support counseling services, available 24/7. Life Coaching, Dependent Care, Elder Care, Financial & Legal Support, Wellness Coaching, New Parent Support, and more.
Family Leave: Paid leave as per Czech law for birthing parent recovery. 4 weeks of paid leave for the non-birthing/bonding parent.
We Value Work/Life Balance:
Pension Insurance: We offer a comprehensive pension insurance plan to safeguard your financial future. We cover a contribution equivalent to 3% of your monthly gross salary, up to a maximum of CZK 50,000 per calendar year.
Remote Working: Work from home or join us in our Brno office, whichever works best for you! We also offer a remote working stipend.
Quarterly Thrive-Wellness Days: One extra vacation day per quarter where the entire company takes a break from normal, daily activities to refresh and rejuvenate.
Volunteerism Time Off: 5 days of paid time off each year to participate in the volunteer activities of your choice.
Paid Birthday Off: Take your birthday off to celebrate you!
Mobile Allowance: We provide an allowance to support your work-related communication and tasks.
We Value You:
Employee Recognition: Ongoing recognition and rewards, and a culture that emphasizes personal and professional growth.
At Invicti, we embrace diversity and individuality in all forms. Discrimination has no place here - regardless of race, religion, gender, age, ability, sexual orientation, or any other aspect that makes you unique. We're all about creating a space where everyone feels valued and included. So come as you are and join us in shaping the future of our industry.