Job Location: Home Based, Mobile
As a Site Reliability Engineer, you will be building, evolving, and
operating the infrastructure automation platform used to power our Cloud
services. You will ensure that our production environment is operating
and performing optimally and efficiently, and that software is released
and deployed in an efficient and streamlined manner, from development
all the way to production.
This is a hands-on operational role with a balanced amount of tool and
infrastructure development, including advanced scripting and automation.
You will be supporting our internal infrastructure, as well as providing
managed services support and product development, and supporting the
entire stack for a cloud-based service offering.
Success in this role requires very strong system administration skills,
an aptitude for distributed systems, and attention to minute details.
You need exemplary network, systems, and code-level troubleshooting
abilities and will be expected to analyze complex system behaviors or
performance problems, as well as trace issues across multiple systems.
This position works as a first responder and is ultimately responsible
for ensuring our cloud infrastructure services are up and running.
Responsibilities:
- Operate and deploy cloud services and related projects from development to production
- Develop automation, processes, and tools designed to make this process simpler and more robust. Bridge Engineering and core shared operations services
- Participate in troubleshooting, capacity planning and analysis, and performance analysis activities
Requirements:
- BA/BS in Computer Science preferred, or equivalent experience
- 8+ years of experience in a highly complex technical operations environment
- Demonstrated success maintaining an environment where key production components are built from source code and deployed via automation tools
- Up to 50% travel
- Hands-on operational experience in a high-volume or critical production service environment, including distributed systems, capacity planning, and continuous deployment
MUST HAVE:
- 3+ years of Linux experience including internals/troubleshooting ability
- Network – understanding & troubleshooting from the OS perspective, load balancing/firewall concepts – service oriented
- Scripting language – Perl, Python, Bash
NICE TO HAVE:
- Configuration Management Tools – Puppet, Chef, CFEngine
- Programming Language – Java, C/C++
- Experience at Scale
- ATMOS experience
- vSphere or OpenStack experience
- UCS, storage (VNX, VMAX, etc.), or Cisco networking configuration/management experience
- Expertise in IP networking, including familiarity with the functionality, operation, and failure modes of the network (iptables, haproxy, vpn, tcp/ip, http)
- Proven technical troubleshooting and performance tuning experience, especially in a virtual (VMware) environment
- Ability to handle periodic on-call duty as well as spider-sense awareness of services’ health
EMC is an Equal Employment Opportunity employer that values the strength diversity brings to the workplace.
EMC does not accept unsolicited Agency Resumes. EMC will not pay fees to
any third party agency or firm that does not have a signed "EMC Agency
Fee Agreement".
PLEASE NOTE: This position can work remotely from multiple locations.
Job ID: 111542BR