Deep Learning Scientist, Speech Synthesis

Job Cloud Inc.

yesterday

Role details

Contract type

Temporary contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English, Italian, Arabic, Chinese, Spanish, French, German, Hindi, Japanese, Korean, Russian, Portuguese

Experience level

Senior

Job location

Remote

Tech stack

Artificial Intelligence

Artificial Neural Networks

C++

Computational Linguistics

Nvidia CUDA

Computer Programming

Data Centers

Revision Control Systems

Python

Machine Learning

Signal Processing

Software Engineering

Speech Recognition

Gerrit

PyTorch

Large Language Models

Deep Learning

GitLab

Git

Information Technology

Speech Synthesis

TensorRT

Recurrent Neural Networks

Job description

Client is seeking a Deep Learning Scientist, Speech Synthesis to support a leading technology client. This role focuses on advancing cutting-edge speech AI solutions, with a strong emphasis on text-to-speech systems and model optimization. The selected candidate will contribute to impactful initiatives that enhance large-scale speech applications used by millions of users.

Train speech synthesis mel-spectrogram and vocoder models
Measure and benchmark model performance across use cases
Maintain and enhance text-to-speech evaluation systems
Analyze model accuracy and bias and recommend improvements
Improve processes related to speech data preparation, augmentation, and filtering
Develop and refine training datasets for speech models
Characterize performance and quality metrics across different platforms
Collaborate with cross-functional teams to deliver new product features
Participate in code development, design reviews, and test planning
Identify issues, propose solutions, and contribute to continuous innovation

Requirements

Master's degree or PhD in Computer Science, Electrical Engineering, Artificial Intelligence, Applied Mathematics, Linguistics, or Computational Linguistics, or equivalent experience
Minimum of 5 years of relevant experience
Strong programming skills in Python
Solid understanding of programming fundamentals and software design
Deep knowledge of machine learning and deep learning techniques including CNN, RNN, LSTM, and Transformers
Experience applying deep learning to speech synthesis, large language models, and speech-to-speech translation
Hands-on experience with speech technologies such as speech synthesis and voice cloning
Experience training speech models
Proficiency with the PyTorch deep learning framework
Knowledge of speech signal processing techniques including FFT, MFCC, and mel-spectrograms
Familiarity with version control tools such as Git, Gerrit, or GitLab
Strong collaboration and communication skills in a matrixed environment

Fluency in one or more languages such as Spanish, Mandarin, German, Japanese, Russian, French, Arabic, Hindi, Korean, Italian, or Portuguese
Experience with multilingual or code-switched text-to-speech systems
Experience with voice cloning and cross-lingual voice cloning
Knowledge of text normalization and inverse text normalization using neural networks or WFST
Experience working with grapheme-to-phoneme systems for multiple languages
Interest in linguistics, phonetics, and language technologies
Strong C++ programming skills
Familiarity with GPU technologies such as CUDA, cuDNN, or TensorRT
Experience deploying machine learning models to cloud, data center, or embedded systems
