Principal Site Reliability Engineer Cloud

Job Location: Home Based, Mobile

As a Site Reliability Engineer, you will build, evolve, and operate the infrastructure automation platform used to power our Cloud services. You will ensure that our production environment operates and performs optimally and efficiently, and that software is released and deployed in an efficient and streamlined manner, from development all the way to production.

This is a hands-on operational role with a balanced amount of tool and infrastructure development, including advanced scripting and automation. You will support our internal infrastructure, provide managed services support, contribute to product development, and support the entire stack for a cloud-based service offering.

Success in this role requires very strong system administration skills, an aptitude for distributed systems, and close attention to detail.

You need to have exemplary network, systems, and code-level troubleshooting abilities, and will be expected to analyze complex system behaviors or performance problems as well as trace issues across multiple systems. This position works as a first responder and is ultimately responsible for ensuring our cloud infrastructure services are up and running.

Responsibilities:

Operate and deploy cloud services and related projects from development to production

Develop automation, processes, and tools designed to make this process simpler and more robust. Bridge Engineering and core shared operations services

Participate in troubleshooting, capacity planning and analysis, and performance analysis activities

Requirements:

BA/BS in Computer Science preferred, or equivalent experience
8+ years of experience in a highly complex technical operations environment
Demonstrated success maintaining an environment where key production components are built from source code and deployed via automation tools
Up to 50% travel

Hands-on operational experience in a high-volume or critical production service environment: distributed systems, capacity planning, continuous deployment

MUST HAVE:

3+ years of Linux experience including internals/troubleshooting ability

Network – understanding & troubleshooting from the OS perspective, load balancing/firewall concepts – service oriented
Scripting language – Perl, Python, Bash

NICE TO HAVE:

Configuration Management Tools – Puppet, Chef, CFEngine
Programming Language – Java, C/C++
Experience at Scale
ATMOS experience
vSphere or OpenStack experience
UCS/storage (VNX, VMAX, etc.)/Cisco networking config/management experience

Expertise in IP networking, including familiarity with the functionality, operation, and failure modes of the network (iptables, HAProxy, VPN, TCP/IP, HTTP)
Proven technical troubleshooting and performance tuning experience, especially in a virtual (VMware) environment
Ability to handle periodic on-call duty as well as spider-sense awareness of services’ health

EMC is an Equal Employment Opportunity employer that values the strength diversity brings to the workplace.

EMC does not accept unsolicited Agency Resumes. EMC will not pay fees to any third party agency or firm that does not have a signed "EMC Agency Fee Agreement".

PLEASE NOTE: This position can work remotely from multiple locations.

Job ID: 111542BR

Tuesday, 18 February 2014