Job Description
Role & responsibilities
Skill: Big Data Engineer (Senior Cloud System Engineer)
Experience: 6+ years of relevant experience with Google BigQuery (BQ) and the Google Cloud hyperscaler platform
- Participate in 24x7x365 SAP Environment rotational shift support and operations
- As a team lead, you will be responsible for the day-to-day maintenance of the upstream Big Data environment, through which millions of financial data records flow. The environment consists of PySpark, BigQuery, Dataproc and Google Airflow.
- You will be responsible for streamlining and tuning existing Big Data systems and pipelines and for building new ones. Making sure the systems run efficiently and at minimal cost is a top priority.
- Manage the operations team in your respective shift. You will be making changes to the underlying systems
- This role involves providing day-to-day support, enhancing platform functionality through DevOps practices, and collaborating with application development teams to optimize database operations.
- Architect and optimize data warehouse solutions using BigQuery to ensure efficient data storage and retrieval.
- Install/build/patch/upgrade/configure big data applications
- Manage and configure BigQuery environments, datasets, and tables.
- Ensure data integrity, accessibility, and security in the BigQuery platform.
- Implement and manage partitioning and clustering for efficient data querying.
- Define and enforce access policies for BigQuery datasets.
- Implement query usage caps and alerts to avoid unexpected expenses.
- Be very comfortable troubleshooting issues and failures on Linux-based systems, with a good grasp of the Linux command line.
- Create and maintain dashboards and reports to track key metrics such as cost and performance.
- Integrate BigQuery with other Google Cloud Platform (GCP) services like Dataflow, Pub/Sub, and Cloud Storage.
- Enable access to BigQuery through tools such as Jupyter Notebook, Visual Studio Code, and command-line interfaces (CLIs).
- Implement data quality checks and data validation processes to ensure data integrity.
- Manage and monitor data pipelines using Airflow and CI/CD tools (e.g., Jenkins, Screwdriver) for automation.
- Collaborate with data analysts and data scientists to understand data requirements and translate them into technical solutions.
- Provide consultation and support to application development teams for database design, implementation, and monitoring.
- Proficiency in Unix/Linux OS fundamentals, shell/Perl/Python scripting, and Ansible for automation.
- Disaster Recovery & High Availability
- Expertise in planning and coordinating disaster recovery, including backup/restore operations
- Experience with geo-redundant databases and Red Hat cluster
- Accountable for ensuring that delivery is within the defined SLA and agreed milestones (projects) by following best practices and processes for continuous service improvement.
- Work closely with other Support Organizations (DB, Google, PySpark data engineering and Infrastructure teams)
- Follow Incident Management, Change Management, Release Management and Problem Management processes based on the ITIL framework.
Good-to-Have / Must-Have Skills
Google BQ: Understanding of BigQuery setup, service design, data warehousing, ETL pipelines, and end-to-end environment setup in the Google Cloud ecosystem
Google Cloud developer + PySpark
Exposure to all services of GCP
The source of the data flow is SAP
Very strong in GCP development
Linux is the preferred platform
Exposure to any public cloud, preferably with BigQuery
Work Location: Chennai and Hyderabad
Rounds of interview: 1 internal and 1 client (To be confirmed)
Mode of interview (Virtual/ In-person): Virtual
Work timing: 11 AM to 8 PM
Work Mode (Remote/ On-site/ Hybrid): Hybrid
Job Classification
Industry: IT Services & Consulting
Functional Area / Department: Engineering - Software & QA
Role Category: Software Development
Role: Big Data Engineer
Employment Type: Full time
Contact Details:
Company: Internal
Location(s): Hyderabad
Keyskills:
Google BigQuery
GCP
PySpark
Big Data Engineer