Title
Big Data Engineer
Quick Summary
GraniteStream Analytics is hiring a Big Data Engineer to design and operate petabyte-scale data processing across batch and streaming workloads. You will build high-throughput pipelines with Spark, Kafka, and Flink, manage lakehouse storage, and enable reliable analytics and machine learning. The ideal candidate is pragmatic, comfortable with distributed systems, and motivated by measurable improvements to reliability, latency, and cost.
Project Category or Industry
Data engineering and large-scale analytics for SaaS, commerce, and media platforms.
Type
Full-time employment.
Experience Level
Entry to mid-level, with mentorship for strong graduates; experienced applicants are also encouraged.
Duration
Permanent role.
Location
Remote-first with optional hybrid collaboration in Seattle and Warsaw. Maintain at least four hours of overlap with teams operating between UTC-8 and UTC+2.
Salary
USD 100,000–148,000 base depending on location and experience, plus benefits and an annual performance bonus.
Payment Mode
Monthly payroll where supported; compliant contractor arrangements available in select countries.
Hiring Company Name
GraniteStream Analytics
Required Skills or Tools
Proficiency in Python or Scala and strong SQL; hands-on experience with Spark and either Kafka or Flink; familiarity with lakehouse formats such as Delta Lake, Iceberg, or Hudi; understanding of data modeling, partitioning, and performance tuning; comfort with cloud platforms (AWS or GCP), infrastructure as code, and production observability.
Project Description
GraniteStream Analytics builds data platforms that power real-time decision-making and large-scale analytics. As a Big Data Engineer, you will own the end-to-end lifecycle of distributed pipelines: ingestion, transformation, storage, and serving. You will design schemas and table layouts that balance freshness and cost, enforce data contracts and governance, and provide self-service access for downstream users in analytics, experimentation, and machine learning.
Core Responsibilities and Expected Deliverables
Design, implement, and operate batch and streaming pipelines with clear SLAs for freshness, quality, and throughput.
Build scalable retrieval and transformation jobs in Spark, optimize joins and shuffles, and manage checkpointing and backfills.
Configure and maintain Kafka topics, schemas, and retention; develop Flink or Spark Structured Streaming jobs for low-latency processing.
Model lakehouse tables with sensible partitioning, clustering, and compaction; manage table evolution and versioning.
Establish data quality validation, lineage, and governance; surface dashboards and alerts for pipeline health and cost.
Deliver reproducible code, tests, deployment manifests, and concise documentation; participate in on-call rotations with runbooks.
Required Experience and Preferred Qualifications
Solid programming in Python or Scala with engineering discipline (testing, reviews, CI/CD).
Proven experience with Spark (core/SQL/Structured Streaming) and at least one streaming stack (Kafka, Flink, or Kinesis).
Working knowledge of warehouses such as Snowflake, BigQuery, or Redshift and storage on S3 or GCS.
Preferred: dbt for transformations, Delta Lake/Iceberg/Hudi operations, Feast or similar feature store, Terraform, and Kubernetes.
Familiarity with cost governance, access controls, and compliance frameworks is a plus.
Evidence of impact through internships, open-source work, or shipped pipelines will be valued.
Tools or Platforms to Be Used
Processing: Apache Spark, Apache Flink where required.
Streaming and messaging: Kafka with Schema Registry.
Storage and warehousing: S3 or GCS; Snowflake, BigQuery, or Redshift.
Orchestration and transformations: Airflow or Dagster; dbt.
Observability and quality: Prometheus, Grafana, Great Expectations, OpenLineage/Marquez.
Infrastructure: Terraform, Docker, Kubernetes; CI/CD via GitHub Actions.
Language Requirement
Professional English is required. Additional languages are helpful for cross-regional collaboration.
Communication Style
Written-first culture using design docs and pull requests on GitHub; Slack for daily coordination; Zoom for stand-ups, design reviews, and incident retrospectives. Clear documentation is expected for all changes.
Time Commitment or Working Window
Standard 40 hours per week with flexible scheduling. Keep a predictable daily block, between 09:00 and 17:00 in your local time, that overlaps with the core team for at least four hours.
Payment Terms
Salary is paid monthly via payroll. For contractors, invoices are processed on net-30 terms upon acceptance of deliverables and timesheets.
Evaluation Criteria
Portfolio and code samples demonstrating distributed data processing, performance tuning, and operational discipline.
Practical exercise building a scalable pipeline with incremental loads, data validation, and lineage.
Technical interview covering partitioning, compaction, streaming semantics, and cost optimization.
Final conversation on collaboration, product sense, and communication.
References may be requested.
Other Requirements
New hires sign a confidentiality agreement and follow security and data-handling policies. Light time-tracking may be used for distributed coordination. Occasional on-site support for data migrations or capacity planning workshops may be required.
About GraniteStream Analytics
GraniteStream Analytics is a privately held data engineering company that designs and operates large-scale analytics platforms for clients in commerce, media, and SaaS. Headquartered in Seattle with a distributed team across North America and Europe, we pair rigorous engineering with pragmatic operations to deliver trustworthy, cost-effective data systems. Learn more at https://www.granitestream.io and reach our hiring team at careers@granitestream.io.
