Title
Speech Recognition Engineer
Quick Summary
Voxsense Technologies is hiring a Speech Recognition Engineer to design, train, and productionize accurate, low-latency automatic speech recognition for customer support, voice search, and accessibility products. You will lead experiments from data pipeline through inference, partner with platform teams to ship streaming services, and measure impact with clear quality and latency metrics. We welcome strong graduates and early-career engineers with compelling audio machine learning projects and a bias for shipping.
Project Category or Industry
Voice AI for customer experience, knowledge retrieval, and accessibility technology.
Type
Full-time employment.
Experience Level
Entry to mid-level with mentorship and structured growth; experienced applicants are also encouraged.
Duration
Permanent role.
Location
Remote-first with optional collaboration hubs in Boston and Bengaluru. Maintain at least four hours of overlap with teams operating between UTC-5 and UTC+5:30.
Salary
USD 90,000–132,000 base depending on location and experience, plus benefits and an annual performance bonus.
Payment Mode
Monthly payroll where supported; compliant contractor arrangements available in select countries.
Hiring Company Name
Voxsense Technologies
Required Skills or Tools
Proficiency in Python and at least one of C++ or CUDA; practical experience with PyTorch and torchaudio; knowledge of acoustic modeling, language modeling, decoding, and evaluation. Familiarity with streaming architectures, on-device optimization, and observability for production systems will help you move quickly.
Project Description
Voxsense Technologies builds voice AI that lets users speak naturally to get things done. As a Speech Recognition Engineer, you will turn product requirements into robust ASR services that achieve high accuracy under real-world noise and accent variation while meeting strict latency and cost targets. The work spans dataset curation, model development, decoding and language model integration, optimization for edge and cloud, and continuous evaluation in production.
Core Responsibilities and Expected Deliverables
Design and implement acoustic and end-to-end models using architectures such as CTC, transducer, and encoder-decoder with attention, along with hybrid systems where appropriate.
Build streaming pipelines with voice activity detection, chunking, and partial hypothesis stabilization to support barge-in and real-time experiences.
Integrate lexicons and language models, apply domain adaptation and contextual biasing, and ship reliable decoders with beam search and rescoring.
Optimize inference through quantization, pruning, ONNX and TensorRT engines, and GPU or Jetson deployments with predictable real-time factors.
Establish offline and online evaluation using WER or CER, latency percentiles, and robustness tests for noise, accents, and far-field conditions.
Deliver production-grade APIs, dashboards, alerts, runbooks, and concise documentation for cross-functional partners.
Required Experience and Preferred Qualifications
Strong programming skills in Python with sound software engineering practices including testing, code review, and CI.
Hands-on experience with PyTorch and torchaudio; familiarity with toolkits such as NeMo, ESPnet, or Kaldi is beneficial.
Understanding of signal processing fundamentals including feature extraction, augmentation, and room acoustics.
Working knowledge of SQL for analysis and data pipeline tooling.
Preferred: experience with diarization, punctuation and capitalization models, multilingual systems, and telephony or WebRTC integrations.
Evidence of impact via internships, open-source contributions, coursework, or shipped voice features will be valued.
Tools or Platforms to Be Used
Modeling and experimentation: PyTorch, torchaudio, Hugging Face, MLflow or Weights & Biases.
Decoding and retrieval: custom decoders with beam search, WFST where needed, vector stores for contextual biasing, Postgres or BigQuery for analytics.
Services and infrastructure: FastAPI, Docker, Kubernetes, GitHub Actions, AWS or GCP, Terraform in partnership with platform teams.
Optimization and edge: ONNX Runtime, TensorRT, CUDA, NVIDIA Jetson.
Observability: Prometheus, Grafana, OpenTelemetry-compatible logging; audio quality dashboards.
Language Requirement
Professional English is required. Additional languages are a plus for training data curation and accent coverage.
Communication Style
Written-first collaboration using GitHub issues and pull requests for design and reviews, Slack for daily coordination, and Zoom for stand-ups, demos, and incident reviews. Clear, actionable documentation is expected.
Time Commitment or Working Window
Standard 40 hours per week with flexible scheduling. Maintain a predictable daily block, within 09:00 to 17:00 your local time, that overlaps with the core team for at least four hours.
Payment Terms
Salary is paid monthly through payroll. For contractors, invoices are processed on net-30 terms upon acceptance of deliverables and timesheets.
Evaluation Criteria
Portfolio and code samples demonstrating ASR modeling, streaming design, and measurable quality or latency wins.
Practical exercise focused on training and evaluating a streaming ASR component with robustness tests.
Technical interview on decoding strategies, domain adaptation, and optimization for edge and cloud.
Final conversation on collaboration, product sense, and communication.
References may be requested.
Other Requirements
New hires sign a confidentiality agreement and follow security and data-handling policies. Light time-tracking may be used for distributed coordination. Occasional on-site visits for microphone array calibration or customer pilots may be required.
About Voxsense Technologies
Voxsense Technologies is a privately held voice AI company focused on speech recognition and understanding for enterprises in retail, financial services, and telecommunications. Headquartered in Boston with a distributed team across North America and Asia, we combine rigorous engineering with applied research to deliver reliable, low-latency voice experiences. Learn more at https://www.voxsense.ai and reach our hiring team at careers@voxsense.ai.
