Job Description
Responsibilities
- Major Fault Support & Review – Respond to Severity 1/2 incidents within 1 hour, log cases to Huawei TAC, lead troubleshooting, and conduct post-event fault reviews.
- Operational Analysis – Deliver monthly reports covering cloud resource capacity, platform health, alarm analysis, and optimization recommendations.
- Risk Check – Perform quarterly in-depth health checks (runtime status, capacity, configuration, warnings, version & license management) and provide rectification guidance.
- Basic & Advanced Cloud Service Version Upgrade – Lead upgrade solution design, pre-upgrade checks, implementation, verification, and rollback for the cloud platform and gPaaS/AI DaaS services.
- Issue & Risk Management – Analyse issue trends, track unresolved escalated issues, and proactively identify and mitigate platform stability risks.
- Resource & Capacity Management – Analyse resource usage and provide scaling or reconstruction recommendations.
- Change Implementation – Implement configuration changes (including after-hours support of up to 8 hours/week as needed), obtain customer authorization, and perform rollback if required.
- Urgent Recovery & Troubleshooting – Assist TAC in emergency fault recovery, common issue resolution, and rapid service restoration.
- Routine PMI & Monitoring – Perform routine product inspections, monitor alarms via ManageOne/eSight, and ensure platform health.
Job Requirements:
- Bachelor's or Master's degree in computer science engineering or a related major, with over 3 years of O&M experience, including hands-on work with cloud platforms (public/private cloud) or general IT (networks, OS, databases, middleware, basic IT components).
- Familiar with HCS deployment and tenant O&M processes (backup management, resource inspection, requirement management, risk management, asset management, expense analysis, monitoring & alarming).
- Strong foundational knowledge of datacom principles; familiar with TCP/IP, standard IP networking, and hybrid cloud networking; able to independently resolve basic datacom issues; proficient in physical network technologies and architectures.
- Strong capability to identify and demarcate cloud problems, lead problem closure, and drive backend improvements.
- Excellent customer service awareness and communication skills; able to work with multinational teams.
- Attention to detail and strong execution capabilities.
Preferred Qualifications:
- Experience maintaining or optimising large-scale data centre network services.
- Experience with SDN delivery and maintenance.
- Familiarity with chaos engineering, fault drills, stress tests, and architecture optimisation.
- Experience in multinational team management (at least 2 years).
Apply Link: https://jobs.persolmalaysia.com/detail/39051?apply_id=39051&utm_source=jobstore&utm_medium=sponsored
Contact Email: jiahui.ang@persolapac.com
Perks & Benefits
- Personal leave
- Open culture
- Personal development opportunities
Job Location
Kuala Lumpur