Site Reliability Engineer - Unmetric Chennai - Falcon.io

Chennai, India | Engineering | Full Time
Cision Ltd is a leading global provider of innovative earned media software and services to public relations and marketing communications professionals. Cision’s Communications Cloud platform allows users to identify key influencers, craft and distribute strategic content, and measure meaningful impact of PR and Communications Teams’ campaigns. Cision has over 4,500 employees with offices in 23 countries throughout the Americas, EMEA, and APAC. For more information about our products and services, visit www.cision.com and follow us on Twitter @Cision.
 
Cision employs the brightest, most passionate people in the tech industry. We’d love for you to join that group! We invest in our people through training and professional development and we empower them to grow their careers - because the most important measure of our success is yours.

We are seeking a Site Reliability Engineer with 3+ years of experience in similar roles to join our development team in Chennai and maintain our fast-growing web infrastructure. The successful candidate will work primarily with Amazon Web Services and will be in charge of developing a scalable, maintainable and secure infrastructure to support our growing base of web-based applications.

What you'll do:

  • Automate the creation and management of servers for the web environment as well as internal applications.
  • Establish regular monitoring and security reviews of our infrastructure of more than 100 virtual servers across AWS and Linode.
  • Design and implement tools to improve the performance and scalability of our web and application servers.
  • Create and manage processes for adhering to our compliance requirements.
  • Create documentation to ensure shared knowledge about the deployment, maintenance, and security of our web infrastructure.
  • Interact with cross-functional teams on projects regarding the deployment of new servers according to use cases.
  • Manage Linux servers that run services such as Apache and MySQL, with the ability to report application-layer issues to the development team.
  • Research and recommend innovative and, where possible, automated approaches to system administration tasks. Identify approaches that leverage our resources and provide economies of scale.
  • Perform daily system monitoring: verify the integrity and availability of all server resources, systems, and key processes, and review system and application logs for production issues.
  • Provide regular after-hours support for critical systems.
  • Perform regular security audits to detect possible intrusions and identify risks.
  • Perform daily backup operations, ensuring all required file systems and system data are backed up.
  • Repair and recover from hardware or software failures; coordinate and communicate with impacted users and departments.
  • Perform system administration of web platform components (Apache, Solr, MySQL, WordPress) in both live and development environments.
  • Contribute to and maintain organizational system standards.
  • Source code management and release management skills are a plus.

What you have:

  • Strong background in Linux/Unix administration.
  • Excellent scripting abilities - Bash, Python, Perl.
  • Experience with the core services of cloud service providers, especially AWS.
  • Basic knowledge of log management - experience with the Elastic Stack is a plus.
  • Good understanding of SCM, CI/CD and automation procedures - SVN, Jenkins, GitLab, GitHub.
  • Basic knowledge of Docker and an orchestration tool such as Kubernetes (K8s) or Cattle by Rancher.
  • Excellent understanding of web and application server configuration for optimizing performance - Apache Tomcat and Apache httpd.
  • Exposure to infrastructure monitoring and alerting systems - Zabbix, Nagios, Twilio, PagerDuty, New Relic.
  • Good understanding of MySQL configuration and administration.
https://jobs.lever.co/cision/b170f327-77f3-4b25-8b48-0811d5431ebd