Tuesday, 18 February 2014

Principal Site Reliability Engineer, Cloud

Posted by Unknown at 22:41
Job Location: Home Based, Mobile


As a Site Reliability Engineer, you will be building, evolving, and operating the infrastructure automation platform used to power our Cloud services. You will be responsible for ensuring that our production environment operates and performs optimally and efficiently, and that software is released and deployed in an efficient, streamlined manner from development all the way to production.



This is a hands-on operational role with a balanced amount of tool and infrastructure development, including advanced scripting and automation. You will support our internal infrastructure, provide managed services support, contribute to product development, and support the entire stack for a cloud-based service offering.



Success in this role requires very strong system administration skills, an aptitude for distributed systems, and close attention to detail.


You need exemplary network, systems, and code-level troubleshooting abilities, and you will be expected to analyze complex system behavior and performance problems and to trace issues across multiple systems. This position serves as a first responder and is ultimately responsible for ensuring that our cloud infrastructure services are up and running.



Responsibilities:


  • Operate and deploy cloud services and related projects from development to production


  • Develop automation, processes, and tools designed to make this process simpler and more robust, and act as a bridge between Engineering and core shared operations services



  • Participate in troubleshooting, capacity planning and analysis, and performance analysis activities



Requirements:



  • BA/BS in Computer Science preferred, or equivalent experience


  • 8+ years of experience in a highly complex technical operations environment



  • Demonstrated success maintaining an environment where key production components are built from source code and deployed via automation tools

  • Up to 50% travel


  • Hands-on operational experience in a high-volume or critical production service environment, including distributed systems, capacity planning, and continuous deployment



MUST HAVE:


  • 3+ years of Linux experience including internals/troubleshooting ability



  • Network – understanding & troubleshooting from the OS perspective, load balancing/firewall concepts – service oriented


  • Scripting language – Perl, Python, Bash

NICE TO HAVE:


  • Configuration Management Tools – Puppet, Chef, CFEngine

  • Programming Language – Java, C/C++


  • Experience at scale


  • ATMOS experience

  • vSphere or OpenStack experience


  • UCS, storage (VNX, VMAX, etc.), and Cisco networking configuration/management experience


  • Expertise in IP networking, including familiarity with the functionality, operation, and failure modes of the network (iptables, haproxy, VPN, TCP/IP, HTTP)


  • Proven technical troubleshooting and performance tuning experience, especially in a virtual (VMware) environment


  • Ability to handle periodic on-call duty as well as spider-sense awareness of services’ health

EMC is an Equal Employment Opportunity employer that values the strength diversity brings to the workplace.

EMC does not accept unsolicited Agency Resumes. EMC will not pay fees to any third party agency or firm that does not have a signed "EMC Agency Fee Agreement".

PLEASE NOTE: This position can work remotely from multiple locations.

Job ID: 111542BR
