Title
LLM Engineer
Quick Summary
Aurora Semantics is hiring an LLM Engineer to design, fine-tune, and productionize large language model capabilities that power search, chat, summarization, and workflow automation across our SaaS platform. You will partner with product, data, and platform teams to turn prototypes into reliable features with measurable impact. We welcome strong graduates and early-career engineers with solid projects, while offering clear growth paths and senior mentorship.
Project Category or Industry
Artificial intelligence for SaaS, information retrieval, and enterprise productivity.
Type
Full-time employment.
Experience Level
Entry to mid-level. Freshers with compelling portfolios and internships are encouraged to apply; experienced candidates are also welcome.
Duration
Permanent role.
Location
Remote-first with optional hubs in Amsterdam and Singapore. Maintain at least four hours of overlap with teams operating between UTC and UTC+8.
Salary
USD 92,000–135,000 base depending on location and experience, plus benefits and an annual performance bonus.
Payment Mode
Monthly payroll where supported; compliant contractor arrangements available in select countries.
Hiring Company Name
Aurora Semantics
Required Skills or Tools
Strong Python, prompt engineering fundamentals, and a working knowledge of modern LLM stacks. Comfort with retrieval-augmented generation, evaluation frameworks, and safe deployment patterns is important. Familiarity with cloud services, vector databases, and API development will help you be productive quickly.
Project Description
Aurora Semantics builds language-centric features that help businesses search, reason over, and act on their unstructured data. As an LLM Engineer, you will help scope opportunities, prototype rapidly, and mature solutions into observable, reliable product capabilities. The work spans data preparation, prompt and model design, evaluation and red-teaming, and close collaboration with MLOps and platform engineering to ship features safely.
Core Responsibilities and Expected Deliverables
Design prompts, adapters, and fine-tuning strategies for ranking, summarization, extraction, and conversational flows.
Build retrieval-augmented pipelines including chunking, embeddings, and query rewriting; define success metrics and guardrails.
Implement offline and online evaluation harnesses with automated regression tests, bias checks, and safety filters.
Package services as well-tested APIs; instrument latency, quality, and safety dashboards with clear SLOs and alerts.
Produce reproducible training code, experiment reports, runbooks, and concise developer documentation.
Required Experience and Preferred Qualifications
Proficiency in Python and software engineering best practices, including testing and code review.
Hands-on experience with at least one of the following: Hugging Face Transformers, OpenAI-compatible APIs, or LLM inference servers such as vLLM.
Working knowledge of embeddings, vector search, and ranking techniques; comfort with SQL for analysis.
Preferred: experience with distributed inference, quantization, or fine-tuning (LoRA/QLoRA), plus prior work on evaluation datasets and prompt libraries.
Coursework, internships, publications, or open-source contributions demonstrating practical impact will be valued.
Tools or Platforms to Be Used
Modeling and experimentation: Python, PyTorch, Hugging Face Transformers, MLflow or Weights & Biases.
Retrieval and storage: FAISS or pgvector, Elasticsearch or OpenSearch, Postgres or BigQuery.
Services and infrastructure: FastAPI, Docker, Kubernetes, GitHub Actions, AWS or GCP, Terraform in collaboration with platform teams.
Observability and safety: Prometheus, Grafana, OpenTelemetry-compatible logging, and policy engines for content moderation.
Language Requirement
Professional English is required. Additional languages are helpful for cross-regional collaboration.
Communication Style
Written-first culture using GitHub issues and pull requests for design and reviews, Slack for daily collaboration, and Zoom for stand-ups, demos, and incident reviews.
Time Commitment or Working Window
Standard 40 hours per week with flexible scheduling. Maintain a predictable daily block, between 09:00 and 17:00 in your local time, that overlaps with the core team for at least four hours.
Payment Terms
Monthly payroll for employees. For contractors, invoices are processed on net-30 terms upon acceptance of deliverables and timesheets.
Evaluation Criteria
Applications are assessed on portfolio quality, clarity of reasoning, and measurable outcomes. The process includes an initial screening, a practical exercise focused on retrieval-augmented generation and evaluation, a systems interview on safe deployment and observability, and a final conversation on collaboration and product sense. References may be requested.
Other Requirements
New hires sign a confidentiality agreement and adhere to security and data-handling policies. Light time-tracking may be used for distributed coordination. Occasional shared on-call for language services may be required.
About Aurora Semantics
Aurora Semantics is a privately held AI product company focused on language understanding and decision support for knowledge-heavy teams in finance, healthcare, and professional services. Headquartered in Amsterdam with a distributed team across EMEA and APAC, we combine rigorous engineering with applied research to build trustworthy, efficient language systems. Learn more at https://www.aurorasemantics.com and contact our hiring team at careers@aurorasemantics.com.
