- Implement data pipelines that meet design requirements and are efficient, scalable, and maintainable
- Follow best practices, including proper use of source control, participation in code reviews, and data validation and testing
- Deliver project work on time
- Act as an advisor/mentor and help junior data engineers with their deliverables
- Undergraduate degree, preferably in engineering
- MS preferred
India: Pune / Bangalore / Chennai
Data Engineer responsible for data pipeline development, data cleansing, data analysis, and visualization
Skills & Experience
Must-have skills:
- At least 4 years of experience in data engineering
- Strong experience designing, implementing, and fine-tuning big data processing pipelines in production environments
- Experience with big data tools such as Hadoop, Spark, Kafka, Hive, and Databricks
- Programming experience in at least one of Python, Java, Scala, or shell scripting
- Experience with relational (SQL) and NoSQL databases such as PostgreSQL, MySQL, and Cassandra
- Experience with at least one data visualization tool (e.g., Plotly, Tableau, Power BI, Google Data Studio, QuickSight)
Good-to-have skills:
- Basic knowledge of CI/CD pipelines
- Experience working with at least one cloud platform (AWS, Azure, or GCP)
- For AWS: experience with services such as EC2, S3, EMR, RDS, Athena, Glue, and Lambda
- For Azure: experience with services such as Azure Blob Storage/Data Lake Storage Gen2, Delta Lake, Databricks, Azure SQL, Azure DevOps, Azure Data Factory, and Power BI
- For GCP: experience with services such as BigQuery, Cloud Storage, Dataproc, Dataflow, Pub/Sub, Cloud Functions, and Data Studio
- Solid familiarity with version control tools (e.g., Git, SVN)
- Experience mentoring students is desirable
- Knowledge of the latest developments in machine learning, deep learning, and optimization in the automotive domain
- An open-minded approach to exploring multiple algorithms to design optimal solutions
- A history of contributing articles, blogs, whitepapers, etc. in analytics
- A history of contributions to open source