Senior Data Scientist

LEELA SERINA INC.
Baltimore, United States of America
yesterday

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior
Compensation
$166K

Job location

Baltimore, United States of America

Tech stack

HTML
Airflow
Amazon Web Services (AWS)
Data analysis
Cloudera Impala
Nvidia CUDA
Databases
Data Systems
IBM DB2
DevOps
Distributed Computing Environment
Markup Languages
Hadoop
Hive
Information Sciences
Python
LaTeX
PostgreSQL
Machine Learning
Mathematica
Microsoft SQL Server
Language Modeling
MySQL
Natural Language Processing
NLTK
NumPy
Oracle Applications
TensorFlow
Azure
SQLite
SQL Databases
Transaction Data
Workflow Management Systems
Data Processing
PyTorch
Large Language Models
Spark
Parallel Computation
Generative AI
GPU Programming
Git
Pandas
Matplotlib
Scikit Learn
Information Technology
Integration Frameworks
Machine Learning Operations
spaCy
Software Version Control

Job description

We are seeking a Senior Data Scientist (NLP) to join our team in Woodlawn, MD, supporting a large federal agency. This role requires deep expertise in Natural Language Processing (NLP) and Generative AI. You will bridge the gap between complex algorithmic research and scalable production systems. Beyond building sophisticated language models, you will act as a technical leader, translating intricate data insights into strategic business decisions and collaborating closely with cross-functional teams.

  • This is a permanent role expected to be onsite 5 days a week.

Primary Responsibilities:

  • Apply expertise in Python, NLP frameworks, SQL, Pandas, NLTK, spaCy, and LLMs.

  • Query and analyze complex transactional data using SQL.

  • Understand real-world challenges and develop automated data solutions.

  • Develop, test, and deploy new NLP techniques for language understanding.

  • Develop and deploy ML and Generative AI approaches (such as Large Language Models) at scale.

  • Determine the nature of analytic problems, evaluate options, and offer recommendations for resolution.

  • Advise on the methods and data needed, or already available, to evaluate the intelligence or data problem at hand.

  • Collaborate with data collectors and analysts to identify and close gaps in complex monitoring problems.

  • Provide accurate, timely, and sophisticated analysis of complex data.

  • Train and optimize NLP/LLM models and create Python-based pipelines.

  • Build cloud-native solutions on AWS.

Requirements:

  • Ability to obtain and maintain an SSA Public Trust clearance is required.

  • Master's degree with 10+ years, Bachelor's degree with 12+ years, or 18+ years of relevant experience.

  • Bachelor's degree in Statistics, Applied Mathematics, Computer Science, or Information Science and industry experience in Python, SQL, NLP (spaCy/NLTK), and LLM engineering.

  • Experience with Generative AI and Large Language Models (LLMs).

  • Experience with ML model deployment and operations practices such as DevOps, MLOps, and LLMOps.

  • Expertise in Natural Language Processing (NLP), Python, NLP frameworks, SQL, Pandas, NLTK, and spaCy.

  • Fluent in Python programming, version control and collaboration with Git, standard Python packages (e.g., Pandas, NumPy, Matplotlib), and ML frameworks.

  • Knowledge of TensorFlow, PyTorch, Pandas, scikit-learn, NLTK, Azure ML (optional), and AWS EC2.

  • Experience with scalable data frameworks (Apache Spark) and workflow orchestration tools (Apache Airflow).

  • Expert knowledge in conducting data analysis and applying advanced statistical concepts and ML methods to build, train, test, and evaluate a variety of supervised and unsupervised analytic models.

  • Proficient in extracting and manipulating data from diverse sources, including SQL databases (DB2, Oracle, SQL Server), Hadoop, and flat files.

  • Experience with database management systems (e.g., PostgreSQL, MySQL, SQLite).

  • Experience with NLP and Generative AI libraries (e.g., spaCy, LangChain), text annotation, and semantic frameworks.

  • Excellent problem-solving skills, the ability to collaborate with cross-functional teams, and proven written and verbal communication skills with a range of audiences, including executive leadership.

  • Excellent analytical skills to identify potential risks and propose effective solutions.

  • Ability to clean and process large amounts of real-world data.

Desired Qualifications:

  • Prior experience delivering IT projects within federal or state government sectors is highly preferred.

  • Experience with or a willingness to learn distributed processing via the Hadoop ecosystem (Spark, Impala, Hive).

  • Experience with parallel processing, such as GPU programming with CUDA.

  • Experience with Natural Language Processing for anomaly detection.

  • Experience using markup languages such as LaTeX and HTML.

  • Experience with Mathematica.
