Data Scientist
Job description
-
Apply hands-on experience in Python, NLP frameworks, SQL, Pandas, NLTK, and spaCy to solve real-world data challenges
-
Analyze trends and transactional data using strong SQL skills
-
Develop, test, and deploy new techniques for natural language understanding
-
Build scalable ML and Generative AI solutions, including Large Language Models (LLMs)
-
Train and optimize NLP/LLM models and build Python-based data pipelines
-
Build cloud-native solutions on AWS
-
Determine the nature of analytic problems, evaluate options, and recommend resolutions
-
Advise on methods and data needed to evaluate complex data problems
-
Collaborate with data collectors and analysts to close gaps on complex monitoring problems
-
Deliver accurate, timely, and sophisticated data analysis
Requirements
We are seeking a Senior Data Scientist with deep, hands-on expertise in Natural Language Processing (NLP) and Generative AI/LLMs to support a federal data science initiative. The ideal candidate is a true self-starter who can operate independently, translate complex analytic problems into automated data solutions, and communicate findings clearly to both technical teams and executive leadership.
-
Bachelor's degree in Statistics, Applied Mathematics, Computer Science, or Information Science, with industry experience in Python, NLP frameworks, SQL, Pandas, NLTK, spaCy, data science, and AI/ML/LLM engineering
-
10+ years overall IT industry experience
-
Education/experience combinations accepted: Master's + 10 years; Bachelor's + 12 years; or 18 years in lieu of a degree
Required Skills
-
Solid experience with NLP, Python, NLP frameworks, SQL, Pandas, NLTK, and spaCy
-
Experience with Generative AI and LLMs
-
Demonstrated self-starter, able to operate independently
-
Fluency in Python, version control/Git, standard Python packages (Pandas, NumPy, Matplotlib), and ML frameworks
-
Knowledge of TensorFlow, PyTorch, Pandas, scikit-learn, NLTK, AWS EC2 (Azure ML a plus)
-
Experience with scalable data engineering frameworks (e.g., Apache Spark) and orchestration frameworks (e.g., Airflow), and/or semantic search
-
Expert-level data analysis and advanced statistical/ML methods to build, train, test, and evaluate supervised and unsupervised models
-
Experience with ML model deployment and operations (DevOps, MLOps, LLMOps)
-
Experience with NLP/Generative AI libraries (e.g., spaCy, LangChain), text annotation tools, and semantic frameworks
-
Ability to clean and process large volumes of real-world data
-
Experience retrieving/manipulating data from varied sources (DB2, Oracle, SQL Server, Hadoop, flat files)
-
Experience with database management systems (PostgreSQL, MySQL, SQLite, SQL, etc.)
-
Excellent analytical and problem-solving skills; ability to identify risks and propose solutions
-
Excellent written and verbal communication skills across audiences, including executive leadership
Desired Skills
-
Prior experience on federal or state government IT projects
-
Industry experience strongly preferred
-
Experience with, or willingness to learn, the Hadoop ecosystem (Spark, Impala, Hive)
-
Experience in an analytical research environment
-
Experience in parallel/GPU processing (CUDA)
-
Experience with Mathematica
-
Experience with markup languages (LaTeX, HTML)
-
Experience with NLP for anomaly detection