- Location – Noida, Gurgaon, Pune, Bangalore, Chennai, Hyderabad
- Work Mode – Currently WFH. The candidate may occasionally be asked to come to the base location office for a client or leadership visit, or for a specific project/account requirement.
Operations – Mainframe
- Manage production operating system resource availability
- Perform production operating system IPL / reboot
- Monitor the production operating system and devices
- Perform root cause analysis, resolution and / or escalation for production systems (this does not include business applications)
- Execute recovery procedures for production operating systems and devices
- Execute and reply to console commands
- Ensure procedures are implemented and followed
- Take the appropriate, predefined recovery actions for the various operational events
- Restart failing components after an outage
- Record and route problems to appropriate support groups
- Provide operational status as required
- Perform automated startup and shutdown of the production operating system
- Execute restarts of production subsystem (e.g. IMS, CICS, DB2, IDMS) started tasks
- Monitor subsystems (e.g. IMS, CICS, DB2, IDMS)
- Managing (i.e. owning) the incident through service restoration
- Validating severity classification of the problem
- Determining the scope of the problem
- Assessing whether Problem Solver has determined what the problem is and whether a recovery plan has been mapped out
- Assembling a SWAT team of technical support people (other levels of support, across platforms as required), if the Problem Solver is unable to determine what the problem is
- Facilitating the SWAT/Service Recovery Team meeting
- Escalating as required
- Driving problem determination activities
- Driving restoration plans
- Ensuring that the Location Crisis Manager for Data Center Crisis is notified of exceptional outages (every single customer outage)
- Ensuring that Service Management (account team) has been contacted to confirm that the service has been restored to the customer’s satisfaction (or problems reported by a customer)
- Facilitate and/or make service restoration decisions/recommendations (engage the Account Team as required).
- Ensuring that the progression of the problem restoration and all relevant times are documented
- Contributing to the outage review or RCA process as required
- Ensuring that internal notification and escalation activities are executed.
1. Perform system IPLs
2. Reply to WTORs
3. Perform HMC functions
4. Take action on system alerts/messages
5. Perform checklists
6. Monitor SLA regions
7. Startup/Shutdown of online regions
8. Perform SAD
9. Understand D/R concepts and be able to execute them
10. Follow up on high-severity issues until they are resolved
11. Be familiar with SDSF, ISPF, JCL, TSO, VTAM, CICS, DB2, $AVRS, SAR, SYSVIEW
12. Be able to manage unscheduled outages
Batch Monitoring and Restart – Mainframe
- Should have knowledge of the CA7, Control-M, and TWS tools.
- Monitor scheduled batch jobs
- Resolve batch scheduling conflicts (non-application) (root cause analysis and change management)
- Monitor scheduler-related incidents, and develop and recommend changes to the scheduler database
- Perform required batch setup activities (ad hoc requests)
- Schedule on-request batch jobs that require immediate execution
- Invoke resolution and restart procedures in case of failures in the batch jobs
1. Restart jobs from a step or from the top, with or without step tracking
2. Understand and execute as per job documentation
3. Understand complex batch streams and their downstream impacts
4. Perform actions such as Hold, Cancel, Demand, and Force Complete on jobs
5. Monitor SLA jobs as per checklist
6. Perform notification of critical jobs as defined in the procedure
7. Perform job actions such as submit, add/delete dependencies, hold, or skip a schedule, as requested by the resolver
8. Know JCL and apply necessary abend fixes when required