Deep Learning Scientist, Speech Synthesis

Job Cloud Inc.
yesterday

Role details

Contract type
Temporary contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English, Italian, Arabic, Chinese, Spanish, French, German, Hindi, Japanese, Korean, Russian, Portuguese
Experience level
Senior

Job location

Remote

Tech stack

Artificial Intelligence
Artificial Neural Networks
C++
Computational Linguistics
Nvidia CUDA
Computer Programming
Data Centers
Revision Control Systems
Python
Machine Learning
Signal Processing
Software Engineering
Speech Recognition
Gerrit
PyTorch
Large Language Models
Deep Learning
GitLab
Git
Information Technology
Speech Synthesis
TensorRT
Recurrent Neural Networks

Job description

Client is seeking a Deep Learning Scientist, Speech Synthesis, to support a leading technology client. This role focuses on advancing cutting-edge speech AI solutions, with a strong emphasis on text-to-speech systems and model optimization. The selected candidate will contribute to impactful initiatives that enhance large-scale speech applications used by millions of users.

  • Train speech synthesis mel spectrogram and vocoder models
  • Measure and benchmark model performance across use cases
  • Maintain and enhance text-to-speech evaluation systems
  • Analyze model accuracy and bias and recommend improvements
  • Improve processes related to speech data preparation, augmentation, and filtering
  • Develop and refine training datasets for speech models
  • Characterize performance and quality metrics across different platforms
  • Collaborate with cross-functional teams to deliver new product features
  • Participate in code development, design reviews, and test planning
  • Identify issues, propose solutions, and contribute to continuous innovation

Requirements

  • Master's degree or PhD in Computer Science, Electrical Engineering, Artificial Intelligence, Applied Mathematics, Linguistics, or Computational Linguistics, or equivalent experience
  • Minimum of 5 years of relevant experience
  • Strong programming skills in Python
  • Solid understanding of programming fundamentals and software design
  • Deep knowledge of machine learning and deep learning techniques, including CNNs, RNNs, LSTMs, and Transformers
  • Experience applying deep learning to speech synthesis, large language models, and speech-to-speech translation
  • Hands-on experience with speech technologies such as speech synthesis and voice cloning
  • Experience training speech models
  • Proficiency with the PyTorch deep learning framework
  • Knowledge of speech signal processing techniques, including FFT, MFCC, and mel spectrograms
  • Familiarity with version control tools such as Git, Gerrit, or GitLab
  • Strong collaboration and communication skills in a matrixed environment

  • Fluency in one or more languages such as Spanish, Mandarin, German, Japanese, Russian, French, Arabic, Hindi, Korean, Italian, or Portuguese
  • Experience with multilingual or code-switched text-to-speech systems
  • Experience with voice cloning and cross-lingual voice cloning
  • Knowledge of text normalization and inverse text normalization using neural networks or WFST
  • Experience working with grapheme-to-phoneme systems for multiple languages
  • Interest in linguistics, phonetics, and language technologies
  • Strong C++ programming skills
  • Familiarity with GPU technologies such as CUDA, cuDNN, or TensorRT
  • Experience deploying machine learning models to cloud, data center, or embedded systems
