Data Engineer Needed for ETL Pipeline and Data Warehouse Development

Oct 29, 2025 - Junior

$14,000.00 Fixed

We're seeking an experienced Data Engineer to build robust ETL pipelines, design a scalable data warehouse, and create data infrastructure that enables our analytics and business intelligence initiatives.

Project Overview:

Develop end-to-end data pipelines to extract data from multiple sources, transform and clean it, and load it into a centralized data warehouse. Implement data quality checks, automation, and monitoring for reliable data delivery.

Key Responsibilities:


Design and implement scalable ETL/ELT pipelines

Build and optimize data warehouse architecture (star/snowflake schema)

Extract data from various sources (APIs, databases, files, streaming)

Transform and clean data using Python and SQL

Implement data quality validation and monitoring

Create automated workflows using Apache Airflow or similar tools (a minimal DAG sketch follows this list)

Optimize query performance and database indexing

Set up data governance and documentation

Build real-time data streaming pipelines

Create data models for analytics and reporting
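
For reference, a minimal sketch of the kind of orchestration described above, assuming Airflow 2.4+ with the TaskFlow API; the file paths, table name, and daily schedule are illustrative placeholders rather than project specifics:

# Minimal extract-transform-load DAG sketch (assumes Airflow 2.4+ and pandas).
# Paths and the "orders" dataset are placeholders for illustration only.
from datetime import datetime

import pandas as pd
from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2025, 1, 1), catchup=False)
def example_etl():
    @task()
    def extract() -> str:
        # Pull raw data from a source system (placeholder CSV path).
        df = pd.read_csv("/tmp/raw/orders.csv")
        path = "/tmp/staging/orders.parquet"
        df.to_parquet(path, index=False)
        return path

    @task()
    def transform(path: str) -> str:
        # Basic cleaning: drop duplicate and null business keys.
        df = pd.read_parquet(path)
        df = df.dropna(subset=["order_id"]).drop_duplicates(subset=["order_id"])
        clean_path = "/tmp/staging/orders_clean.parquet"
        df.to_parquet(clean_path, index=False)
        return clean_path

    @task()
    def load(path: str) -> None:
        # In a real pipeline this would COPY/MERGE into the warehouse.
        print(f"would load {path} into the warehouse")

    load(transform(extract()))


example_etl()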


Required Skills:


3+ years of data engineering experience

Strong proficiency in SQL (complex queries, optimization, stored procedures)

Python programming for data processing (Pandas, NumPy)

Experience with ETL tools (Apache Airflow, Luigi, Prefect)

Data warehouse experience (AWS Redshift, Snowflake, BigQuery, Azure Synapse)

Experience with data modeling (dimensional modeling, normalization)

Knowledge of distributed computing (Apache Spark, Hadoop)

Cloud platforms (AWS, GCP, or Azure data services)

Version control with Git and CI/CD practices

Data quality and testing frameworks (see the validation sketch after this list)
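
To make the data quality expectation concrete, here is a small pandas-based validation sketch; the column names and the 1% null-rate threshold are assumptions for illustration, not actual project rules:

# Minimal sketch of row-level data quality checks with pandas.
import pandas as pd


def validate_orders(df: pd.DataFrame) -> list[str]:
    """Return a list of human-readable data quality failures."""
    failures: list[str] = []

    # Schema check: required columns must be present.
    required = {"order_id", "customer_id", "order_date", "amount"}
    missing = required - set(df.columns)
    if missing:
        failures.append(f"missing columns: {sorted(missing)}")
        return failures

    # Uniqueness check on the business key.
    if df["order_id"].duplicated().any():
        failures.append("duplicate order_id values found")

    # Completeness check: allow at most 1% nulls in amount.
    null_rate = df["amount"].isna().mean()
    if null_rate > 0.01:
        failures.append(f"amount null rate too high: {null_rate:.2%}")

    # Range check: negative amounts are suspect.
    if (df["amount"].dropna() < 0).any():
        failures.append("negative amount values found")

    return failures

A check like this would typically run as a pipeline task and trigger an alert whenever the returned list is non-empty.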


Technical Stack:


Languages: Python, SQL

ETL Orchestration: Apache Airflow or Dagster

Data Warehouse: AWS Redshift, Snowflake, or Google BigQuery

Databases: PostgreSQL, MySQL, MongoDB

Big Data: Apache Spark, Apache Kafka (streaming)

Cloud Services: AWS (S3, Lambda, Glue, EMR) or GCP/Azure equivalents

BI Tools: Tableau, Power BI, Looker (integration)

Version Control: Git


Data Sources:


REST APIs and webhooks (an extraction sketch follows this list)

Relational databases (PostgreSQL, MySQL, SQL Server)

NoSQL databases (MongoDB, DynamoDB)

Cloud storage (S3, Google Cloud Storage)

CSV/Excel files and structured data

Real-time streaming data (Kafka, Kinesis)
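
As an illustration of extracting from one of these source types, a minimal sketch for a paginated REST API; the endpoint, auth header, and pagination scheme are hypothetical placeholders, since the real sources would come from the client:

# Minimal sketch: pull a paginated REST API into a DataFrame and stage it
# as Parquet. BASE_URL, HEADERS, and the page parameters are placeholders.
import pandas as pd
import requests

BASE_URL = "https://api.example.com/v1/orders"   # placeholder endpoint
HEADERS = {"Authorization": "Bearer <API_KEY>"}  # placeholder credential


def extract_orders(page_size: int = 500) -> pd.DataFrame:
    rows, page = [], 1
    while True:
        resp = requests.get(
            BASE_URL,
            headers=HEADERS,
            params={"page": page, "per_page": page_size},
            timeout=30,
        )
        resp.raise_for_status()
        batch = resp.json()
        if not batch:
            break  # no more pages
        rows.extend(batch)
        page += 1
    return pd.DataFrame(rows)


if __name__ == "__main__":
    df = extract_orders()
    df.to_parquet("staging/orders_raw.parquet", index=False)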


Key Features:


Incremental data loading strategies (see the watermark sketch after this list)

Data deduplication and validation

Error handling and retry mechanisms

Data lineage and metadata tracking

Automated scheduling and monitoring

Data quality checks and alerts

Historical data versioning (SCD Type 2)

Performance optimization and partitioning
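
To illustrate the incremental loading pattern, a watermark-based extraction sketch, assuming a PostgreSQL source reached through SQLAlchemy; the connection string and the orders/updated_at names are placeholders:

# Minimal sketch of watermark-driven incremental extraction.
import pandas as pd
from sqlalchemy import create_engine, text

engine = create_engine("postgresql+psycopg2://user:pass@host:5432/source_db")


def extract_increment(last_watermark: str) -> pd.DataFrame:
    """Fetch only rows changed since the previous run's high watermark."""
    query = text(
        "SELECT * FROM orders "
        "WHERE updated_at > :watermark "
        "ORDER BY updated_at"
    )
    with engine.connect() as conn:
        return pd.read_sql(query, conn, params={"watermark": last_watermark})


# The new batch would then be merged into the warehouse; for SCD Type 2
# history, the merge closes the current row (sets an end date) and inserts
# a new row whenever tracked attributes change.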


Deliverables:


Fully functional ETL/ELT pipelines with documentation

Optimized data warehouse schema design

Data quality validation framework

Automated workflow orchestration (Airflow DAGs)

Performance tuning and optimization report

Data dictionary and documentation

Monitoring dashboards for pipeline health

Unit tests and integration tests (a sample test sketch follows this list)

Deployment scripts and infrastructure as code
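
As a small example of the testing deliverable, a pytest-style unit test for a hypothetical cleaning step; clean_orders and its columns are stand-ins for real pipeline code, not actual project functions:

# Minimal sketch of a unit test for a transformation step.
import pandas as pd


def clean_orders(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows missing the business key and duplicate keys."""
    return df.dropna(subset=["order_id"]).drop_duplicates(subset=["order_id"])


def test_clean_orders_removes_duplicates_and_nulls():
    raw = pd.DataFrame(
        {
            "order_id": [1, 1, 2, None],
            "amount": [10.0, 10.0, 25.5, 5.0],
        }
    )
    cleaned = clean_orders(raw)

    assert list(cleaned["order_id"]) == [1, 2]
    assert cleaned["order_id"].is_unique
    assert cleaned["order_id"].notna().all()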


Budget: $45 - $90/hour (Hourly) or $7,000 - $14,000 (Fixed project)

Timeline: 6-10 weeks

Posted by: Thomas Summers (inactive)
Member since: Oct 29, 2025
Total jobs posted: 4