$14,000.00 Fixed
We're seeking an experienced Data Engineer to build robust ETL pipelines, design a scalable data warehouse, and create data infrastructure that enables our analytics and business intelligence initiatives.
Project Overview:
Develop end-to-end data pipelines to extract data from multiple sources, transform and clean it, and load it into a centralized data warehouse. Implement data quality checks, automation, and monitoring for reliable data delivery.
Key Responsibilities:
Design and implement scalable ETL/ELT pipelines
Build and optimize data warehouse architecture (star/snowflake schema)
Extract data from various sources (APIs, databases, files, streaming)
Transform and clean data using Python and SQL
Implement data quality validation and monitoring
Create automated workflows using Apache Airflow or similar tools (see the DAG sketch after this list)
Optimize query performance and database indexing
Set up data governance and documentation
Build real-time data streaming pipelines
Create data models for analytics and reporting
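
For orchestration, the minimal Airflow sketch below shows the general shape we have in mind; the task names, schedule, and sample data are placeholders for illustration, not project requirements.

    # Minimal Airflow DAG sketch: daily extract -> transform -> load.
    # Assumes Airflow 2.4+ (TaskFlow API); passing rows through XCom is for
    # illustration only and would not suit large data volumes.
    from datetime import datetime

    from airflow.decorators import dag, task


    @dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False, tags=["etl"])
    def example_daily_etl():
        @task()
        def extract() -> list[dict]:
            # Placeholder: pull rows from a source API or database here.
            return [{"id": 1, "amount": "19.99"}, {"id": 2, "amount": "5.00"}]

        @task()
        def transform(rows: list[dict]) -> list[dict]:
            # Cast types and drop records that fail a basic validity check.
            return [
                {"id": r["id"], "amount": float(r["amount"])}
                for r in rows
                if r.get("id") is not None
            ]

        @task()
        def load(rows: list[dict]) -> None:
            # Placeholder: COPY/MERGE into the warehouse in a real pipeline.
            print(f"loaded {len(rows)} rows")

        load(transform(extract()))


    example_daily_etl()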
Required Skills:
3+ years of data engineering experience
Strong proficiency in SQL (complex queries, optimization, stored procedures)
Python programming for data processing (Pandas, NumPy); see the cleaning sketch after this list
Experience with ETL tools (Apache Airflow, Luigi, Prefect)
Data warehouse experience (AWS Redshift, Snowflake, BigQuery, Azure Synapse)
Experience with data modeling (dimensional modeling, normalization)
Knowledge of distributed computing (Apache Spark, Hadoop)
Cloud platforms (AWS, GCP, or Azure data services)
Version control with Git and CI/CD practices
Data quality and testing frameworks
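
To illustrate the level of Python/Pandas fluency expected, a short cleaning sketch; the column names and rules are assumptions for illustration only.

    # Hedged pandas cleaning sketch: normalize types, drop duplicates and
    # rows missing required fields. Column names are placeholders.
    import pandas as pd


    def clean_orders(raw: pd.DataFrame) -> pd.DataFrame:
        df = raw.copy()
        df.columns = [c.strip().lower() for c in df.columns]
        df["order_date"] = pd.to_datetime(df["order_date"], errors="coerce")
        df["amount"] = pd.to_numeric(df["amount"], errors="coerce")
        return df.drop_duplicates().dropna(subset=["order_id", "order_date", "amount"])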
Technical Stack:
Languages: Python, SQL
ETL Orchestration: Apache Airflow or Dagster
Data Warehouse: AWS Redshift, Snowflake, or Google BigQuery
Databases: PostgreSQL, MySQL, MongoDB
Big Data: Apache Spark, Apache Kafka (streaming)
Cloud Services: AWS (S3, Lambda, Glue, EMR) or GCP/Azure equivalents
BI Tools: Tableau, Power BI, Looker (integration)
Version Control: Git
Data Sources:
REST APIs and webhooks (see the extraction sketch after this list)
Relational databases (PostgreSQL, MySQL, SQL Server)
NoSQL databases (MongoDB, DynamoDB)
Cloud storage (S3, Google Cloud Storage)
CSV/Excel files and structured data
Real-time streaming data (Kafka, Kinesis)
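
As a reference point for API extraction, a hedged sketch of paginated pulls with basic retries; the /orders endpoint, bearer auth, and page/per_page parameters are assumptions, not a spec.

    # Paginated REST extraction sketch with simple retry/backoff.
    import time

    import requests


    def fetch_all(base_url: str, api_key: str, page_size: int = 100) -> list[dict]:
        records, page = [], 1
        headers = {"Authorization": f"Bearer {api_key}"}
        while True:
            for attempt in range(3):  # retry 5xx responses with backoff
                resp = requests.get(
                    f"{base_url}/orders",
                    params={"page": page, "per_page": page_size},
                    headers=headers,
                    timeout=30,
                )
                if resp.status_code < 500:
                    break
                time.sleep(2 ** attempt)
            resp.raise_for_status()
            batch = resp.json()
            if not batch:
                return records
            records.extend(batch)
            page += 1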
Key Features:
Incremental data loading strategies (see the watermark sketch after this list)
Data deduplication and validation
Error handling and retry mechanisms
Data lineage and metadata tracking
Automated scheduling and monitoring
Data quality checks and alerts
Historical data versioning (SCD Type 2)
Performance optimization and partitioning
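
For incremental loading, a minimal watermark-and-upsert sketch; psycopg2-style connections and Postgres ON CONFLICT syntax are assumed, and table and column names are placeholders.

    # Watermark-based incremental load sketch: pull rows changed since the
    # last run and upsert them into a staging table.
    def load_incremental(src_conn, wh_conn, watermark: str) -> str:
        src_cur = src_conn.cursor()
        src_cur.execute(
            "SELECT id, amount, updated_at FROM orders "
            "WHERE updated_at > %s ORDER BY updated_at",
            (watermark,),
        )
        rows = src_cur.fetchall()
        wh_cur = wh_conn.cursor()
        wh_cur.executemany(
            """
            INSERT INTO stg_orders (id, amount, updated_at)
            VALUES (%s, %s, %s)
            ON CONFLICT (id) DO UPDATE
            SET amount = EXCLUDED.amount, updated_at = EXCLUDED.updated_at
            """,
            rows,
        )
        wh_conn.commit()
        # Advance the watermark to the latest change seen, or keep the old one.
        return rows[-1][2].isoformat() if rows else watermark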
Deliverables:
Fully functional ETL/ELT pipelines with documentation
Optimized data warehouse schema design
Data quality validation framework
Automated workflow orchestration (Airflow DAGs)
Performance tuning and optimization report
Data dictionary and documentation
Monitoring dashboards for pipeline health
Unit tests and integration tests (see the data quality test sketch after this list)
Deployment scripts and infrastructure as code
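
To show the kind of testing deliverable we mean, a hedged pytest-style data quality check; the export path and column names are assumptions, and a framework such as Great Expectations or dbt tests would be equally acceptable.

    # Example data quality check written as a pytest test.
    # The parquet path and columns below are placeholders for illustration.
    import pandas as pd


    def test_orders_are_unique_and_non_negative():
        df = pd.read_parquet("exports/orders.parquet")  # assumed export location
        assert df["order_id"].notna().all(), "order_id contains nulls"
        assert df["order_id"].is_unique, "order_id has duplicates"
        assert (df["amount"] >= 0).all(), "negative order amounts found"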
Budget: $45 - $90/hour (Hourly) or $7,000 - $14,000 (Fixed project)
Timeline: 6-10 weeks
Proposals: 0
Duration: Less than 3 months