We are seeking a highly skilled Databricks Data Engineer to design, build, and optimize scalable data pipelines and lakehouse architectures. The ideal candidate will have strong expertise in Apache Spark, PySpark, Delta Lake, and modern cloud-based data platforms, with the ability to transform complex business requirements into reliable and performant data solutions.
Location: All Brillio Locations
Experience: 4 to 8 years
🔧 Key Responsibilities
1. Data Engineering & Pipeline Development
Design and develop scalable ETL/ELT pipelines using PySpark and Spark SQL
Build and maintain Databricks workflows (Jobs) for orchestration
Implement Delta Live Tables (DLT) for declarative pipeline development
Develop and manage batch and streaming data pipelines
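For illustration, a minimal PySpark sketch of the kind of batch ETL this role covers, assuming a Databricks runtime where Delta Lake is available; the paths, table, and column names below are placeholders, not project specifics:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders-etl").getOrCreate()

# Extract: read raw JSON landed by an upstream process (path is hypothetical).
raw = spark.read.json("/mnt/raw/orders/")

# Transform: drop rows missing the business key, normalize the timestamp,
# and de-duplicate on the key.
cleaned = (
    raw.filter(F.col("order_id").isNotNull())
       .withColumn("order_ts", F.to_timestamp("order_ts"))
       .dropDuplicates(["order_id"])
)

# Load: append into a Delta table (Delta is the default format on Databricks).
cleaned.write.format("delta").mode("append").saveAsTable("silver.orders")
```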
2. Lakehouse Architecture Implementation
Design and implement Medallion Architecture (Bronze, Silver, Gold layers)
Build curated datasets for analytics and reporting
Optimize storage using Delta Lake best practices
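A hedged sketch of Medallion layering, with illustrative table names: Bronze holds raw ingests, Silver holds validated and conformed records, and Gold holds business-level aggregates for reporting.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Bronze: raw, append-only ingests.
bronze = spark.read.table("bronze.orders")

# Silver: validated, conformed records.
silver = (
    bronze.filter(F.col("amount") > 0)                  # basic validation rule
          .withColumn("order_date", F.to_date("order_ts"))
)
silver.write.format("delta").mode("overwrite").saveAsTable("silver.orders")

# Gold: curated aggregate for analytics and reporting.
gold = silver.groupBy("order_date").agg(F.sum("amount").alias("daily_revenue"))
gold.write.format("delta").mode("overwrite").saveAsTable("gold.daily_revenue")
```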
3. Data Ingestion & Integration
Ingest data from multiple sources:
Databases (RDBMS, NoSQL)
APIs and streaming platforms (Kafka, Event Hubs)
Files (CSV, JSON, Parquet)
Handle structured and semi-structured data efficiently
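As one example of incremental file ingestion, Databricks Auto Loader (the cloudFiles source) discovers new files as they land; this is a minimal sketch, and every path below is a hypothetical placeholder:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Auto Loader picks up new files incrementally and tracks the inferred schema.
stream = (
    spark.readStream.format("cloudFiles")
         .option("cloudFiles.format", "json")
         .option("cloudFiles.schemaLocation", "/mnt/_schemas/orders/")
         .load("/mnt/landing/orders/")
)

# Write into a Bronze Delta table; availableNow processes the backlog and
# stops, giving batch-style scheduling with streaming semantics.
(
    stream.writeStream
          .option("checkpointLocation", "/mnt/_checkpoints/orders/")
          .trigger(availableNow=True)
          .toTable("bronze.orders")
)
```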
4. Delta Lake & Performance Optimization
Implement:
ACID transactions
Schema enforcement and evolution
Change Data Capture (CDC)
Optimize Spark jobs using:
Partitioning strategies
Caching and broadcast joins
File compaction and indexing (Z-ORDER)
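A sketch combining three of the techniques above on a Databricks runtime: a CDC-style MERGE upsert, a broadcast join, and file compaction with Z-ORDER. Table and column names are illustrative assumptions.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

# CDC-style upsert: merge the latest change feed into the Silver table.
changes = spark.read.table("bronze.orders_changes")    # hypothetical CDC feed
target = DeltaTable.forName(spark, "silver.orders")
(
    target.alias("t")
          .merge(changes.alias("c"), "t.order_id = c.order_id")
          .whenMatchedUpdateAll()
          .whenNotMatchedInsertAll()
          .execute()
)

# Broadcast the small dimension side so the large fact table is not shuffled.
customers = spark.read.table("silver.customers")
enriched = spark.read.table("silver.orders").join(broadcast(customers), "customer_id")

# Compact small files and co-locate rows on a frequently filtered column.
spark.sql("OPTIMIZE silver.orders ZORDER BY (customer_id)")
```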
5. Data Quality & Governance
Implement data validation and quality checks
Ensure compliance with data governance standards
Use Unity Catalog for access control and data lineage
Maintain auditability and data traceability
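Data-quality checks can be expressed declaratively with Delta Live Tables expectations. The sketch below runs only inside a DLT pipeline, and the dataset names are illustrative:

```python
import dlt  # available only inside a Delta Live Tables pipeline

@dlt.table(comment="Validated orders")
@dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")  # drop bad rows
@dlt.expect("positive_amount", "amount > 0")   # keep rows, record the metric
def silver_orders():
    return dlt.read_stream("bronze_orders")
```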
6. Monitoring & Reliability
Build logging, monitoring, and alerting for pipelines
Troubleshoot failures and optimize performance
Ensure high availability and fault tolerance
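One lightweight pattern for pipeline logging is a wrapper that records each step's outcome and re-raises failures so the Databricks job's retry and alert settings can take over; the helper below is a hypothetical sketch, not a prescribed framework:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("orders-pipeline")

def run_step(name, fn):
    """Run one pipeline step, log the outcome, and re-raise on failure."""
    try:
        result = fn()
        log.info("step=%s status=ok", name)
        return result
    except Exception:
        log.exception("step=%s status=failed", name)
        raise  # surface the failure so job-level alerting and retries fire
```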
7. Collaboration & Delivery
Work closely with Data Analysts, Data Scientists, and stakeholders
Translate business requirements into data models and pipelines
Participate in Agile ceremonies (Sprint planning, stand-ups, retrospectives)
✅ Required Qualifications
Bachelor’s degree in Computer Science, Engineering, or related field
4–8 years of experience in Data Engineering
Strong hands-on experience with:
Databricks platform
PySpark and Spark SQL
Delta Lake
Experience with:
ETL/ELT pipeline development
Distributed data processing
Solid understanding of:
Data modeling (Star/Snowflake schemas)
Data warehousing concepts
⭐ Preferred Qualifications
Experience with Delta Live Tables (DLT)
Knowledge of CI/CD pipelines (Azure DevOps, GitHub Actions)
Experience with streaming frameworks (Kafka, Spark Streaming)
Familiarity with cloud platforms:
Azure / AWS / GCP
Experience with MLflow and MLOps workflows
Domain experience (e.g., Healthcare, Finance, Retail)
🛠️ Technical Skills
Languages: Python, SQL
Frameworks: Apache Spark
Tools: Databricks, Delta Lake, MLflow
Data Formats: Parquet, Delta, JSON, Avro
Orchestration: Databricks Workflows / Airflow
Version Control: Git
💡 Soft Skills
Strong problem-solving and analytical thinking
Excellent communication skills
Ability to work in a collaborative environment
Attention to detail and data quality
🚀 Nice-to-Have (Optional Add-ons)
Experience with real-time analytics
Exposure to data governance tools
Certification in Databricks or Cloud platforms
Together, we create the future you've always aspired to. Explore your next career opportunity.