Title
Data Platform Engineer
Quick Summary
Riverstone Data Works is hiring a Data Platform Engineer to design, build, and operate the data foundations that power analytics, machine learning, and product features. You will own reliable pipelines, a governed lakehouse, and a feature store that supports both batch and streaming use cases. We welcome strong graduates and early-career engineers with solid projects in data engineering and cloud automation.
Project Category or Industry
Data infrastructure for SaaS, analytics, and digital products.
Type
Full-time employment.
Experience Level
Entry to mid-level, with mentorship and clear growth pathways for freshers; experienced applicants are also encouraged to apply.
Duration
Permanent role.
Location
Remote-first with optional hybrid collaboration hubs in Chicago and Lisbon. Maintain at least four hours of overlap with teams operating between UTC-6 and UTC+2.
Salary
USD 96,000–142,000 base depending on location and experience, plus benefits and an annual performance bonus.
Payment Mode
Monthly payroll where supported; compliant contractor arrangements are available in select countries.
Hiring Company Name
Riverstone Data Works
Required Skills or Tools
Proficiency in Python or Scala, strong SQL, and comfort with distributed processing. Familiarity with Spark and Kafka, orchestration with Airflow or Dagster, and cloud platforms is essential. Experience with data modeling, governance, and observability will help you succeed quickly.
Project Description
Riverstone Data Works delivers data platforms for product and analytics teams. As a Data Platform Engineer, you will help define the technical standards for ingestion, transformation, storage, and access. You will design schemas and contracts, implement resilient pipelines, and provide self-service tooling for analysts and engineers. Your work will create trustworthy, timely datasets and features that enable experimentation and decision making.
Core Responsibilities and Expected Deliverables
Build and maintain batch and streaming pipelines with clear SLAs for freshness, quality, and cost.
Design lakehouse tables and data models optimized for reliability, performance, and downstream consumption.
Implement a feature store for machine learning, ensuring online/offline consistency and versioning.
Establish data contracts, validation, lineage, and governance; automate schema evolution and backfills.
Create dashboards and alerts for pipeline health, data quality, and platform capacity; deliver runbooks and on-call readiness.
Provide internal tooling and documentation that enable self-service access and safe experimentation.
Required Experience and Preferred Qualifications
Solid programming in Python or Scala, strong SQL, and software engineering discipline (testing, reviews, CI/CD).
Hands-on experience with Spark or Flink, Kafka or Kinesis, and orchestration such as Airflow or Dagster.
Working knowledge of lakehouse technologies (Delta Lake, Iceberg, or Hudi) and warehouse platforms (Snowflake, BigQuery, or Redshift).
Preferred: dbt for transformations, Feast or similar for feature serving, Terraform for infrastructure as code, and Kubernetes for runtime management.
Exposure to cost governance, access controls, and compliance frameworks is a plus.
Evidence of impact through internships, open-source work, or shipped pipelines will be valued.
Tools or Platforms to Be Used
Data processing: Apache Spark, Delta Lake/Iceberg/Hudi, Flink where streaming is required.
Orchestration and transformations: Airflow or Dagster, dbt.
Messaging and streaming: Kafka or Kinesis.
Storage and warehousing: S3 or GCS; Snowflake, BigQuery, or Redshift.
Observability and quality: Great Expectations, OpenLineage/Marquez, Prometheus, Grafana.
Infrastructure: Terraform, Docker, Kubernetes, GitHub Actions; primary cloud on AWS or GCP.
Language Requirement
Professional English is required. Additional languages are helpful for cross-regional collaboration.
Communication Style
Written-first collaboration using design docs and pull requests on GitHub; Slack for daily coordination; Zoom for stand-ups, reviews, and incident retrospectives. Clear, concise documentation is expected for all changes.
Time Commitment or Working Window
Standard 40 hours per week with flexible scheduling. Maintain a predictable daily block between 09:00 and 17:00 your local time that overlaps with the core team for at least four hours.
Payment Terms
Salary is paid monthly via payroll for employees. For contractors, invoices are processed on net-30 terms upon acceptance of deliverables and timesheets.
Evaluation Criteria
Portfolio and code samples demonstrating pipeline reliability, data modeling clarity, and operational discipline.
Practical exercise building a small lakehouse pipeline with data validation, lineage, and an incremental load strategy.
Technical interview covering partitioning, compaction, streaming semantics, cost governance, and observability.
Final conversation on collaboration, product sense, and communication.
References may be requested.
Other Requirements
New hires sign a confidentiality agreement and comply with security and data-handling policies. Light time-tracking may be used for distributed coordination. Occasional on-call for platform incidents is shared across the team.
About Riverstone Data Works
Riverstone Data Works is a privately held data engineering company that builds platforms for analytics and product teams in commerce, media, and SaaS. Headquartered in Chicago with a distributed workforce across North America and Europe, we pair rigorous engineering with practical operations to deliver reliable, cost-efficient data systems. Learn more at https://www.riverstonedata.io and contact the hiring team at careers@riverstonedata.io.
