Senior DevOps Engineer
We are looking for an experienced System Operations Manager to lead our production operations and manage our hybrid web services and web applications. You and your team will be responsible for the stability, performance, and automation of a large-scale platform built on open-source technologies. If you are tired of repeating yourself and love DevOps principles and system automation, this is a great opportunity to join us and build an awesome analytics platform.
Your responsibilities will include:
Directly managing the operations team to ensure 24x7 online services and timely delivery of products to customers.
Providing platform support including troubleshooting, root cause analysis, and problem resolution.
Understanding the product's strategic goals and priorities.
Managing production performance, including forecasting, proactive risk assessment, backlog, SLAs, and other key performance metrics.
Managing the staging/CI environment with state-of-the-art practices to ensure continuous delivery.
Ensuring cost-effective service delivery by automating critical processes, including server deployment, configuration, monitoring, and problem resolution.
Ensuring systems are secure and compliant with industry best practices.
Leading capacity planning and scalability efforts to ensure systems are optimized for continued growth.
Designing and implementing system automation architecture, infrastructure, and processes using tools such as Salt, Puppet, or Chef.
Working very closely with the engineering team to ensure new products and features meet operational requirements.
Establishing a metrics-
You should have solid experience in a system operations-related field and be ready to take responsibility for managing this important part of App Annie.
Experience leading an operations team of 4+ members in web operations or large-scale backend system operations.
Fluent written and spoken English.
3+ years of experience in a network and system engineering position for web operations or large-scale backend systems.
Strong Linux (Ubuntu, Debian) system administration skills, including nginx/Apache/HAProxy.
Strong scripting skills in Shell/Python/Ruby.
Hands-on experience with configuration management tools such as Salt, Chef, or Puppet.
Hands-on experience with Nagios, Munin, and other open-source monitoring solutions.
Hands-on experience testing and deploying large-scale server software.
Strong problem solving, analytical and troubleshooting skills.
Hands-on experience with AWS cloud services (EC2, EMR, SQS, RDS, S3).
A good understanding of TCP/IP and network administration, including proxies and VPNs, is a big plus.
Good database operations experience with PSQL scripting and Postgres replication or load balancing is a big plus.
Good experience operating, tuning, and monitoring large-scale Hadoop clusters (HDFS, MapReduce) is a big plus.
Strong RDBMS operations experience (PostgreSQL, MySQL). Experience with scale-out solutions such as database partitioning or read scaling is a plus.
Experience operating NoSQL databases such as MongoDB and HBase is a big plus.
Experience in Windows administration is a plus.