📌 Hiring Update: Hadoop Developer – CedarForge Data Platforms

Title
Hadoop Developer

Quick Summary
CedarForge Data Platforms is hiring a Hadoop Developer to modernize and operate distributed data systems that power analytics and machine learning. The role focuses on building reliable HDFS, Hive, and Spark workloads, hardening ingestion and transformation pipelines, and guiding migrations from on-premises clusters to cloud lakehouse architectures. We welcome strong graduates and early-career engineers who have shipped real projects and care about reliability, cost, and performance.

Project Category or Industry
Enterprise data engineering for analytics and digital products.

Type
Full-time employment.

Experience Level
Entry to mid-level with mentorship and clear growth paths; experienced applicants are also encouraged.

Duration
Permanent role.

Location
Remote-first with optional hybrid collaboration in Raleigh, NC and Frankfurt, DE. Maintain at least four hours of overlap with teams operating between UTC−5 and UTC+2.

Salary
USD 92,000–136,000 base depending on location and experience, plus benefits and an annual performance bonus.

Payment Mode
Monthly payroll where supported; in select countries, a compliant contractor arrangement can be used.

Hiring Company Name
CedarForge Data Platforms

Required Skills or Tools
Solid SQL and Python or Scala; practical experience with Hadoop ecosystem components including HDFS, Hive, Spark, and YARN; familiarity with table formats and metastore management; understanding of data modeling, partitioning, and query optimization; comfort with CI/CD, observability, and cloud data services.

Project Description

CedarForge Data Platforms delivers end-to-end data platforms for enterprise customers. As a Hadoop Developer, you will turn business requirements into resilient data services that scale. You will design schemas and table layouts, build ingestion and transformation jobs, and improve reliability and cost through thoughtful partitioning, compaction, and caching strategies. You will also help plan and execute migrations from legacy Hadoop clusters to cloud-native lakehouse stacks while keeping service-level objectives intact.

Core Responsibilities and Expected Deliverables

Design and operate batch and near-real-time pipelines on Hadoop with clear SLAs for freshness, completeness, and latency.
Implement Hive/Spark transformations, optimize file sizing and partitioning, and tune jobs for predictable performance.
Maintain metastore catalogs, security policies, and data governance; enforce data contracts and schema evolution.
Support cluster operations and capacity planning, including YARN queue configuration, resource tuning, and rolling upgrades.
Contribute to cloud migration plans toward Delta Lake/Iceberg/Hudi with minimal downtime and validated parity.
Ship reproducible code, automated tests, deployment manifests, runbooks, and concise documentation for cross-functional users.

Required Experience and Preferred Qualifications

Proficiency in SQL plus Python or Scala, with software engineering discipline (version control, testing, CI/CD).
Hands-on experience with Hadoop ecosystem components: HDFS, Hive, Spark, and YARN; familiarity with MapReduce is helpful.
Working knowledge of warehouses such as Snowflake, BigQuery, or Redshift and object storage on AWS or GCP for hybrid architectures.
Preferred: experience with Kafka or Kinesis, Oozie or Airflow/Dagster, Delta Lake/Iceberg/Hudi, Ranger/Atlas, Terraform for infrastructure as code, and Kubernetes for auxiliary services.
Evidence of impact via internships, open-source contributions, or shipped pipelines will be valued.

Tools or Platforms to Be Used

Core stack: Hadoop (HDFS, YARN), Hive, Spark SQL/Structured Streaming.
Orchestration and transforms: Airflow or Dagster; dbt where appropriate.
Messaging and streaming: Kafka with Schema Registry.
Storage and warehousing: HDFS; S3 or GCS for cloud; Snowflake/BigQuery/Redshift for analytics.
Governance and observability: Apache Ranger and Atlas, Great Expectations, OpenLineage/Marquez, Prometheus, Grafana.
Infrastructure and CI/CD: Terraform, Docker, GitHub Actions.

Language Requirement

Professional English is required. Additional languages are welcome for cross-regional collaboration.

Communication Style

Written-first culture using design docs and pull requests on GitHub; Slack for daily coordination; Zoom for stand-ups, design reviews, and incident retrospectives. Clear, actionable documentation is expected for all changes.

Time Commitment or Working Window

Standard 40 hours per week with flexible scheduling. Maintain a predictable daily block overlapping at least four hours with the core team between 09:00 and 17:00 in your local time.

Payment Terms

Salary is paid monthly via payroll for employees. For contractors, invoices are processed on net-30 terms upon acceptance of deliverables and timesheets.

Evaluation Criteria

Portfolio and code samples demonstrating Hadoop/Spark proficiency, performance tuning, and operational discipline.
Practical exercise focused on building an incremental Hive/Spark pipeline with validation and lineage.
Technical interview on partitioning, compaction, resource management, and migration strategy.
Final conversation on collaboration, product sense, and communication.
References may be requested.

Other Requirements

New hires sign a confidentiality agreement and follow security and data-handling policies. Light time-tracking may be used for distributed coordination. Occasional on-site support for data center or cloud migration milestones may be required.

About CedarForge Data Platforms

CedarForge Data Platforms is a privately held data engineering company that designs and operates modern analytics platforms for clients in commerce, manufacturing, and financial services. Headquartered in Raleigh with a distributed team across North America and Europe, we pair rigorous engineering with practical operations to deliver reliable, cost-effective data systems. Learn more at https://www.cedarforge.io and reach our hiring team at careers@cedarforge.io.

Post a Job or Project

✔ Requirements Before Posting:

Hadoop Developer – CedarForge Data Platforms

Project Description

Core Responsibilities and Expected Deliverables

Required Experience and Preferred Qualifications

Tools or Platforms to Be Used

Language Requirement

Communication Style

Time Commitment or Working Window

Payment Terms

Evaluation Criteria

Other Requirements

About CedarForge Data Platforms

Instant Apply

Read Before Apply

Similar Opportunities

Report

Post Your Job or Project

Ready to Hire?

Company

Features

Help

Newsletter