Senior Software Engineer, Data
Job description
- Improve the coverage and quality of the Semantic Scholar corpus across academic papers, patents, and new domain-specific datasets
- Build and maintain scalable data pipelines for corpus integration, citation resolution, and metadata enrichment
- Develop and deploy ML models for entity disambiguation, author linking, and topic classification
- Design and extend APIs that expose structured scholarly data to academic researchers and AI agent workflows
- Contribute to dashboards and tools for evaluating data quality and model precision
- Collaborate across engineering and research teams to ensure maintainability, test coverage, and robust deployment
Note: This job description in no way states or implies that these are the only duties to be performed by the team member(s) in this position. Team members will be required to follow any other job-related instructions and to perform any other job-related duties requested by any person authorized to give instructions or assignments. All duties and responsibilities are essential functions and requirements and are subject to possible modification to reasonably accommodate individuals with disabilities. To perform this job successfully, the team member(s) will possess the skills, aptitudes, and abilities to perform each duty proficiently. Some requirements may exclude individuals who pose a direct threat or significant risk to the health or safety of themselves or others. The requirements listed in this document are the minimum levels of knowledge, skills, or abilities. This document does not create an employment contract, implied or otherwise, other than an at
Requirements
- Bachelor's degree and 8+ years of technical experience; relevant experience may substitute for education.
- Strong Python engineering skills, especially for building and maintaining data pipelines
- Experience with SQL and schema design in production settings (PostgreSQL preferred)
- Familiarity with ML workflows (training classifiers, tuning models, deploying for inference), particularly for large-scale or ambiguous structured datasets
- Comfortable working with structured data formats (XML/JSON/Parquet) and writing ETL code
- Experience with workflow orchestration tools (Airflow or similar) and cloud infrastructure (AWS, S3, Docker)
- Strong communication skills and a strong sense of ownership for results
Preferred:
- Experience with author disambiguation, entity resolution, or record linkage problems
- Experience applying vector-based similarity or topic modeling techniques to real-world corpora at scale
- Exposure to citation networks or scholarly data systems (e.g., arXiv, OpenAlex, USPTO)
- Familiarity with building APIs or data services consumed by automated or agent-based workflows
Physical Demands and Work Environment:
The physical demands described here are representative of those that must be met by a team member to successfully perform the essential functions of this position. Reasonable accommodations may be made to enable individuals with disabilities to perform the functions.
- Must be able to remain in a stationary position for long periods of time.
- The ability to communicate information and ideas so others will understand. Must be able to exchange accurate information in these situations.
- The ability to observe details at close range.
- Can work under deadlines.
Benefits & conditions
The Allen Institute for Artificial Intelligence
Flexible benefit account, paid holidays, sick time, 401(k)
United States, Washington, Seattle, 2157 North Northlake Way
May 02, 2026
Persons in these roles are expected to work from our offices in Seattle. On-site requirements vary based on position and team. If you have questions about on-site work arrangements for this role, please ask your recruiter.
Our base salary range is $126,000 - $189,000, and in addition we have generous bonus plans to provide a competitive compensation package.
- Team members and their families are covered by medical, dental, vision, and an employee assistance program.
- Team members are able to enroll in our health savings account plan, our healthcare reimbursement arrangement plan, and our health care and dependent care flexible spending account plans.
- Team members are able to enroll in our company's 401k plan.
- Team members will receive $125 per month to assist with commuting or internet expenses and will also receive $200 per month for fitness and wellbeing expenses.
- Team members will also receive up to ten sick days per year, up to seven personal days per year, up to 20 vacation days per year, and twelve paid holidays throughout the calendar year.
- Team members will be able to receive annual bonuses and can participate in the long-term incentive plan.