Title
Data Quality Engineer
Quick Summary
Northstar Data Integrity is hiring a Data Quality Engineer to ensure that analytics and machine learning teams can rely on clean, timely, and well-governed data. The role focuses on building automated validation, anomaly detection, lineage, and incident response, while partnering with platform and analytics stakeholders to define clear data contracts and service levels. We welcome strong graduates and early-career professionals who enjoy turning ambiguous data issues into measurable, automated safeguards.
Project Category or Industry
Data infrastructure and governance for SaaS, commerce, and financial services.
Type
Full-time employment.
Experience Level
Entry to mid-level. Freshers with meaningful projects or internships are encouraged to apply; experienced candidates are also welcome.
Duration
Permanent role.
Location
Remote-first with optional hybrid hubs in New York and Prague. Maintain at least four hours of collaboration overlap between UTC-5 and UTC+2.
Salary
USD 90,000–130,000 base depending on location and experience, plus benefits and an annual performance bonus.
Payment Mode
Monthly payroll for employees; compliant contractor arrangements are available in select countries.
Hiring Company Name
Northstar Data Integrity
Required Skills or Tools
Strong SQL and Python, familiarity with data modeling and warehouse concepts, and practical experience with validation frameworks, lineage, and observability. Comfort with transformation tooling and orchestration, plus clear written communication to define standards and drive adoption.
Project Description
Northstar Data Integrity builds the trust layer for data-driven organizations. As a Data Quality Engineer, you will design and implement the guardrails that keep pipelines healthy and outputs reliable. You will define data contracts with upstream teams, implement validation at multiple stages, and establish monitoring that detects regressions before they reach dashboards, experiments, or model features.
Core Responsibilities and Expected Deliverables
Define and maintain data contracts that specify schema, distributions, and freshness for critical datasets.
Build automated checks for validity, completeness, uniqueness, referential integrity, and business rules across batch and streaming pipelines.
Implement anomaly detection for volume, schema drift, and metric behavior; tune alerting to reduce noise and speed up response.
Establish lineage and documentation standards; surface data catalog views that help users understand dependencies and change impact.
Partner with platform engineers to integrate quality gates into CI and deployment flows; publish runbooks and incident postmortems.
Report on service levels for data quality and freshness, and drive continuous improvement initiatives.
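As an illustration of the kind of automated checks described above, the following is a minimal sketch in plain Python, not Northstar's actual tooling. The function names, the CheckResult type, and the sample orders and customers data are all hypothetical; in practice such checks would more likely be expressed in a framework such as Great Expectations, Soda, or dbt tests.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str


def check_completeness(rows, column):
    """Pass only if no row has a missing (None) value in `column`."""
    missing = sum(1 for r in rows if r.get(column) is None)
    return CheckResult(f"completeness:{column}", missing == 0, f"{missing} missing")


def check_uniqueness(rows, column):
    """Pass only if every non-null value in `column` appears exactly once."""
    values = [r[column] for r in rows if r.get(column) is not None]
    dupes = len(values) - len(set(values))
    return CheckResult(f"uniqueness:{column}", dupes == 0, f"{dupes} duplicate(s)")


def check_referential_integrity(rows, column, parent_keys):
    """Pass only if every non-null value in `column` exists in `parent_keys`."""
    orphans = sum(
        1 for r in rows
        if r.get(column) is not None and r[column] not in parent_keys
    )
    return CheckResult(f"referential:{column}", orphans == 0, f"{orphans} orphan(s)")


def run_checks(orders, customer_ids):
    """Run every check and return the individual results."""
    return [
        check_completeness(orders, "order_id"),
        check_completeness(orders, "amount"),
        check_uniqueness(orders, "order_id"),
        check_referential_integrity(orders, "customer_id", customer_ids),
    ]


def gate_passes(results):
    """A quality gate passes only when every check passed."""
    return all(r.passed for r in results)


# Hypothetical sample data with one problem per failing check.
orders = [
    {"order_id": 1, "customer_id": 10, "amount": 25.0},
    {"order_id": 2, "customer_id": 11, "amount": None},   # missing amount
    {"order_id": 2, "customer_id": 99, "amount": 12.5},   # duplicate id, unknown customer
]
customers = {10, 11}

results = run_checks(orders, customers)
for r in results:
    print(f"{'PASS' if r.passed else 'FAIL'}  {r.name}  ({r.detail})")
print("gate passes:", gate_passes(results))
```

A CI step could call gate_passes and exit with a non-zero status on failure, so that a pipeline change that breaks one of these rules is blocked before deployment.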
Required Experience and Preferred Qualifications
Proficiency in SQL and Python with sound engineering practices such as testing, code review, and CI.
Hands-on experience with at least one warehouse (Snowflake, BigQuery, or Redshift) and familiarity with lakehouse tables (Delta Lake, Iceberg, or Hudi).
Working knowledge of transformation tools such as dbt and orchestrators such as Airflow or Dagster.
Preferred: Great Expectations or similar validation frameworks, OpenLineage or Marquez for lineage, DataHub or Collibra for cataloging, and basic statistics for anomaly detection and A/B guardrails.
Evidence of impact through internships, open-source contributions, coursework, or shipped validation frameworks will be valued.
Tools or Platforms to Be Used
Validation and testing: Great Expectations or Soda, dbt tests.
Lineage and catalog: OpenLineage or Marquez, DataHub.
Processing and storage: Spark where needed, warehouses on Snowflake or BigQuery, object storage on S3 or GCS.
Orchestration and CI/CD: Airflow or Dagster, GitHub Actions.
Observability: Prometheus, Grafana, and OpenTelemetry-compatible logging.
Language Requirement
Professional English is required. Additional languages are welcome for cross-regional collaboration.
Communication Style
Written-first collaboration using design docs and pull requests on GitHub; Slack for daily coordination; Zoom for stand-ups, reviews, and incident retrospectives. Clear, accessible documentation is expected for all changes.
Time Commitment or Working Window
Standard 40 hours per week with flexible scheduling. Maintain a predictable daily block, between 09:00 and 17:00 in your local time, that overlaps with the core team for at least four hours.
Payment Terms
Salary is paid monthly via payroll. For contractors, invoices are processed on net-30 terms upon acceptance of deliverables and timesheets.
Evaluation Criteria
Portfolio or code samples demonstrating validation, lineage, or incident response automation.
Practical exercise implementing a quality gate with tests, alerts, and documentation for a sample dataset.
Technical interview on schema design, partitioning, anomaly detection, and change management.
Final conversation on collaboration, communication, and stakeholder management.
References may be requested.
Other Requirements
New hires sign a confidentiality agreement and comply with security and data-handling policies. Light time-tracking may be used for distributed coordination. Occasional on-call for data quality incidents is shared across the team.
About Northstar Data Integrity
Northstar Data Integrity is a privately held data reliability firm helping product and analytics teams ship trustworthy insights at scale. Based in New York with a distributed workforce across North America and Europe, we combine rigorous engineering with practical operations to reduce data downtime and improve confidence in decision making. Learn more at https://www.northstardataintegrity.com and reach our hiring team at careers@northstardataintegrity.com.
