Title
Hadoop Developer
Quick Summary
CedarForge Data Platforms is hiring a Hadoop Developer to modernize and operate distributed data systems that power analytics and machine learning. The role focuses on building reliable HDFS, Hive, and Spark workloads, hardening ingestion and transformation pipelines, and guiding migrations from on-premises clusters to cloud lakehouse architectures. We welcome strong graduates and early-career engineers who have shipped real projects and care about reliability, cost, and performance.
Project Category or Industry
Enterprise data engineering for analytics and digital products.
Type
Full-time employment.
Experience Level
Entry to mid-level with mentorship and clear growth paths; experienced applicants are also encouraged.
Duration
Permanent role.
Location
Remote-first with optional hybrid collaboration in Raleigh, NC and Frankfurt, DE. Maintain at least four hours of overlap with teams operating between UTCβ5 and UTC+2.
Salary
USD 92,000β136,000 base depending on location and experience, plus benefits and an annual performance bonus.
Payment Mode
Monthly payroll where supported; in select countries, a compliant contractor arrangement can be used.
Hiring Company Name
CedarForge Data Platforms
Required Skills or Tools
Solid SQL and Python or Scala; practical experience with Hadoop ecosystem components including HDFS, Hive, Spark, and YARN; familiarity with table formats and metastore management; understanding of data modeling, partitioning, and query optimization; comfort with CI/CD, observability, and cloud data services.
Project Description
CedarForge Data Platforms delivers end-to-end data platforms for enterprise customers. As a Hadoop Developer, you will turn business requirements into resilient data services that scale. You will design schemas and table layouts, build ingestion and transformation jobs, and improve reliability and cost through thoughtful partitioning, compaction, and caching strategies. You will also help plan and execute migrations from legacy Hadoop clusters to cloud-native lakehouse stacks while keeping service-level objectives intact.
Core Responsibilities and Expected Deliverables
Design and operate batch and near-real-time pipelines on Hadoop with clear SLAs for freshness, completeness, and latency.
Implement Hive/Spark transformations, optimize file sizing and partitioning, and tune jobs for predictable performance.
Maintain metastore catalogs, security policies, and data governance; enforce data contracts and schema evolution.
Support cluster operations and capacity planning, including YARN queue configuration, resource tuning, and rolling upgrades.
Contribute to cloud migration plans toward Delta Lake/Iceberg/Hudi with minimal downtime and validated parity.
Ship reproducible code, automated tests, deployment manifests, runbooks, and concise documentation for cross-functional users.
Required Experience and Preferred Qualifications
Proficiency in SQL plus Python or Scala, with software engineering discipline (version control, testing, CI/CD).
Hands-on experience with Hadoop ecosystem components: HDFS, Hive, Spark, and YARN; familiarity with MapReduce is helpful.
Working knowledge of warehouses such as Snowflake, BigQuery, or Redshift and object storage on AWS or GCP for hybrid architectures.
Preferred: experience with Kafka or Kinesis, Oozie or Airflow/Dagster, Delta Lake/Iceberg/Hudi, Ranger/Atlas, Terraform for infrastructure as code, and Kubernetes for auxiliary services.
Evidence of impact via internships, open-source contributions, or shipped pipelines will be valued.
Tools or Platforms to Be Used
Core stack: Hadoop (HDFS, YARN), Hive, Spark SQL/Structured Streaming.
Orchestration and transforms: Airflow or Dagster; dbt where appropriate.
Messaging and streaming: Kafka with Schema Registry.
Storage and warehousing: HDFS; S3 or GCS for cloud; Snowflake/BigQuery/Redshift for analytics.
Governance and observability: Apache Ranger and Atlas, Great Expectations, OpenLineage/Marquez, Prometheus, Grafana.
Infrastructure and CI/CD: Terraform, Docker, GitHub Actions.
Language Requirement
Professional English is required. Additional languages are welcome for cross-regional collaboration.
Communication Style
Written-first culture using design docs and pull requests on GitHub; Slack for daily coordination; Zoom for stand-ups, design reviews, and incident retrospectives. Clear, actionable documentation is expected for all changes.
Time Commitment or Working Window
Standard 40 hours per week with flexible scheduling. Maintain a predictable daily block overlapping at least four hours with the core team between 09:00 and 17:00 in your local time.
Payment Terms
Salary is paid monthly via payroll for employees. For contractors, invoices are processed on net-30 terms upon acceptance of deliverables and timesheets.
Evaluation Criteria
Portfolio and code samples demonstrating Hadoop/Spark proficiency, performance tuning, and operational discipline.
Practical exercise focused on building an incremental Hive/Spark pipeline with validation and lineage.
Technical interview on partitioning, compaction, resource management, and migration strategy.
Final conversation on collaboration, product sense, and communication.
References may be requested.
Other Requirements
New hires sign a confidentiality agreement and follow security and data-handling policies. Light time-tracking may be used for distributed coordination. Occasional on-site support for data center or cloud migration milestones may be required.
About CedarForge Data Platforms
CedarForge Data Platforms is a privately held data engineering company that designs and operates modern analytics platforms for clients in commerce, manufacturing, and financial services. Headquartered in Raleigh with a distributed team across North America and Europe, we pair rigorous engineering with practical operations to deliver reliable, cost-effective data systems. Learn more at https://www.cedarforge.io and reach our hiring team at careers@cedarforge.io.
