Application Developer
Job description
NExScI at Caltech/IPAC has an opening for an AI Application Developer to lead the development, production deployment, and scaling of TAPchat - an AI-powered conversational interface that helps astronomers query and explore data from NASA's astronomical archives. Beyond TAPchat, this role will identify and develop new applications for AI and large language models across IPAC's data archives and scientific workflows. Come be a part of the team that is helping astronomers and data scientists all over the world access and explore astronomy data!
TAPchat uses large language models to translate natural-language questions into database queries, data analysis, and visualizations. The application is a working prototype built with Python/FastAPI and JavaScript that supports multiple LLM providers (Anthropic Claude, OpenAI, Google Gemini, and local models via Ollama), authenticates users via OAuth2, and runs in Docker with PostgreSQL. You will take ownership of this codebase, harden it for production use, and scale it to serve a growing community of researchers.
This role will work within a vibrant team of scientists and developers at the NASA Exoplanet Science Institute (NExScI). As a part of IPAC, NExScI (nexsci.caltech.edu) provides archive services, community support, science operations, and analysis tools related to the discovery and characterization of planets beyond our solar system (exoplanets) using data from observatories in space and on the ground. IPAC also hosts several other NASA and Caltech data archives, and this position will have the opportunity to bring AI-driven tools and approaches to those projects as well.
Essential Job Duties
As an Application Developer, your primary focus will be leading TAPchat to production, while also exploring AI/LLM applications across IPAC. Key responsibilities:
- Take ownership of an existing Python/FastAPI + JavaScript codebase and bring it from prototype to production-quality service.
- Design and improve the AI agent loop - prompt engineering, tool design, retrieval-augmented generation (RAG) over archive documentation, multi-model support, and evaluation of LLM responses for scientific accuracy.
- Design and implement secure code execution sandboxing to safely run user-initiated data analysis (e.g., subprocess isolation, containers, or task queues).
- Improve test coverage, CI/CD pipelines, and deployment automation.
- Optimize database performance (indexing, connection pooling, query optimization) and implement monitoring, logging, and alerting.
- Scale the application to support a growing user base, including evaluating cloud deployment options and horizontal scaling strategies.
- Implement rate limiting, security hardening, and operational tooling for a public-facing service.
- Collaborate with scientists across IPAC to add new archive integrations, improve the AI agent's tool suite, and enhance data visualization capabilities (e.g., migrating from static plots to interactive visualizations with Bokeh or Plotly).
- Write and maintain deployment documentation, runbooks, and architecture decision records.
- Identify opportunities to apply AI and LLM technologies to other IPAC data archives and scientific workflows - prototyping new tools, evaluating feasibility, and championing adoption.
- Stay current with the rapidly evolving AI/LLM landscape and advise the team on which new capabilities (e.g., multimodal models, retrieval-augmented generation, fine-tuning) are worth adopting.
Requirements
- Bachelor's or equivalent degree in Computer Science, Data Science, Astronomy/Astrophysics, or related field.
- A minimum of 3 years of relevant professional experience.
- Proficiency in Python, including modern web frameworks (FastAPI, Flask, or similar).
- Experience integrating with LLM APIs (Anthropic Claude, OpenAI, Google Gemini, or similar) and an understanding of prompt engineering, retrieval-augmented generation (RAG), and agentic AI patterns.
- Experience developing and deploying web applications, with willingness to work across the stack (backend, database, frontend, infrastructure).
- Strong communication and interpersonal skills.
Preferred Qualifications
These additional qualifications may give you a strong start, though we still encourage you to apply even if you don't have all of them:
- Experience with relational databases (PostgreSQL or similar), including schema design and SQL.
- Experience with Docker, containerized deployments, and CI/CD pipelines.
- Familiarity with REST API design and real-time data streaming (SSE or WebSockets).
- Comfort working with JavaScript for frontend development.
- Master's or PhD in Computer Science, Data Science, Astronomy/Astrophysics, or related field.
- Experience building agentic AI applications with tool use, structured outputs, and evaluation frameworks.
- Familiarity with the Python scientific computing ecosystem (pandas, NumPy, SciPy, Astropy, Matplotlib).
- Experience with process sandboxing, security isolation, or task queue systems (Celery, ARQ, Dramatiq).
- Experience with cloud platforms (AWS, GCP) and infrastructure-as-code tools.
- Knowledge of OAuth2/OIDC authentication flows and web application security.
- Experience with Kubernetes or similar container orchestration at scale.
- Background in astronomy, astrophysics, or scientific data systems.
- Experience with interactive data visualization libraries (Bokeh, Plotly, D3.js).