Data Scientist
SGF GLOBAL
Houston, United States of America
6 days ago
Role details
Contract type: Temporary contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Intermediate
Compensation: $210K
Job location: Houston, United States of America
Tech stack
Artificial Intelligence
Artificial Neural Networks
Big Data
C++
Nvidia CUDA
Python
Machine Learning
NumPy
TensorFlow
SciPy
Sensor Fusion
Signal Processing
Software Engineering
PyTorch
Transfer Learning
Deep Learning
Keras
Pandas
Information Technology
Stable Diffusion
Job description
We are seeking a highly skilled Data Scientist to build, train, and deploy large-scale self-supervised "foundation" models for time-series, sensor, multimodal, and industrial scientific data. This role focuses on developing advanced deep learning architectures that learn rich representations from high-dimensional sequential signals and are later fine-tuned for tasks such as:
- Anomaly/event detection
- Predictive maintenance
- Forecasting
- Classification
- Multi-sensor fusion
- Industrial/scientific modeling
This is a high-impact, research-driven role working with large datasets, complex sensor modalities, and distributed training infrastructure.
1. Foundation Model Development
- Build and train self-supervised and semi-supervised foundation models for time-series and multimodal data
- Fine-tune large models for domain-specific tasks
- Apply contrastive learning, masked modeling, temporal predictive coding, multimodal alignment, etc.
- Develop transfer learning, adapter, and prompt-based strategies for rapid downstream adaptation
2. Data & Signal Processing
- Process, augment, and engineer features for univariate/multivariate time-series datasets
- Analyze IoT sensor streams, industrial vibration/temperature data, audio, imagery, etc.
- Perform sampling, synchronization, denoising, artifact removal, and sensor quality checks
- Integrate time series with images, structured data, audio, and text
3. Advanced Machine Learning & Architectures
- Build models using:
  - RNNs / GRUs / LSTMs
  - TCNs
  - 1D/2D/3D CNNs
  - Transformers (BERT, ViT, TimeSformer)
  - Graph Neural Networks
  - Diffusion / generative architectures
  - Multi-modal encoders and fusion models
- Evaluate model performance using:
  - MSE, RMSE, R²
  - F1, AUC, Precision/Recall
  - DTW, correlation, similarity metrics
  - IoU and event-based segmentation metrics
4. Software Engineering & Infrastructure
- Build production-ready pipelines for ingesting, cleaning, segmenting, and aligning large-scale multi-sensor datasets
- Develop in:
  - Python (NumPy, Pandas, SciPy)
  - PyTorch (Lightning, Distributed)
  - TensorFlow/Keras
  - JAX/Flax
  - C++/CUDA for custom kernels
- Train models on:
  - Multi-GPU and multi-node clusters
  - Mixed-precision systems
  - Distributed optimization (ZeRO, DDP, etc.)
5. Mathematical & Algorithmic Foundations
- Apply a strong background in:
  - Linear algebra, probability, and statistics
  - Signal processing (Fourier, wavelets, Kalman filters, noise modeling)
  - Optimization (stochastic, convex, non-convex)
  - Numerical methods, ODE/PDE modeling, regularization techniques
6. Collaboration & Communication
- Partner with scientists, engineers, domain experts, and product teams
- Present model behavior insights, attention maps, and uncertainty quantification
- Communicate findings clearly to both technical and non-technical audiences
Requirements
- MS or PhD in Computer Science, Data Science, AI, Engineering, or a related field
- 3+ years of experience in Data Science, Machine Learning, or AI
- Strong experience building and training deep learning models
- Experience working with time-series or sensor data
- Proficiency in Python, deep learning frameworks, and ML engineering best practices
- Experience with multimodal learning
- Experience with large-scale distributed training
- Background in industrial, scientific, or sensor-driven AI