Role Summary: We are looking for a hands-on engineer responsible for monitoring, troubleshooting, and ensuring the availability of cloud and on-prem infrastructure across Azure, AWS, Windows, and Linux environments.
The role requires proactive monitoring, incident response, and strong root cause analysis skills.
Key Responsibilities
Cloud Monitoring (Azure & AWS):
- Monitor cloud infrastructure using tools like Azure Monitor and Amazon CloudWatch
- Configure alerts, dashboards, and health checks
- Analyze metrics (CPU, memory, disk, network) and respond to anomalies
- Troubleshoot VM, storage, and networking issues in cloud environments
- Track cost-impacting anomalies (overutilization, idle resources)

Identity & Access Monitoring (Azure AD):
- Manage and monitor Microsoft Entra ID (formerly Azure AD)
- Investigate login failures, risky sign-ins, and MFA issues
- Support conditional access policies and identity security
- Handle user access issues, lockouts, and permissions troubleshooting

Server Monitoring (Windows & Linux):
- Monitor health and performance of:
  - Windows Server environments
  - Linux systems
- Troubleshoot:
  - High CPU/memory usage
  - Disk space issues
  - Service failures
- Perform log analysis and root cause identification

Incident Management & Troubleshooting:
- Respond to alerts and incidents within SLA timelines
- Perform root cause analysis (RCA) and document findings
- Coordinate with application, network, and security teams
- Maintain incident reports and resolution documentation

Proactive Monitoring & Optimization:
- Identify recurring issues and implement preventive fixes
- Fine-tune monitoring alerts to reduce noise
- Automate routine checks using scripts (PowerShell/Bash)
Required Skills & Experience
Core Technical Skills
- Hands-on experience with:
  - Microsoft Azure
  - Amazon Web Services
- Strong knowledge of:
  - VM troubleshooting (boot issues, performance, connectivity)
  - Storage (disks, IOPS, latency issues)
  - Networking basics (DNS, routing, firewall concepts)

Monitoring Tools:
- Experience with:
  - Azure Monitor / Log Analytics
  - Amazon CloudWatch
- Understanding of alerts, metrics, logs, and dashboards

Operating Systems:
- Strong troubleshooting in:
  - Windows Server
  - Linux

Scripting & Automation:
- Basic scripting skills:
  - PowerShell (Windows/Azure)
  - Bash (Linux)

Preferred Qualifications
- Certifications like:
  - AZ-104
  - AWS Certified SysOps Administrator
- Experience with ticketing tools (ServiceNow, Jira)
- Exposure to backup, DR, and patch management

Soft Skills:
- Strong troubleshooting mindset (not just monitoring)
- Ability to work under pressure during incidents
- Clear communication and documentation skills
Disclaimer: This job posting has been aggregated from an external source. Role details, content, and availability are subject to change. Applicants are advised to confirm the latest information directly on the company website before applying.
Job Classification
Industry: IT Services & Consulting
Functional Area / Department: IT & Information Security
Role Category: IT Infrastructure Services
Role: IT Infrastructure Services - Other
Employment Type: Full time