Senior Software Engineer Onsite (San Francisco, CA)
Job description
One of our customers was processing 50,000 faxes a month. Each one required a person to spend five minutes reading, sorting, and inputting the relevant information: armies of 10 to 20 people, eight hours a day, five days a week, just to keep up. The day after they brought this problem to us, we had them forwarding faxes into our system. Within two weeks we had an MVP running on actual production faxes: a tool that reads incoming referrals and prior authorizations and calls the patient to schedule them. What used to take 5 to 7 days now takes one minute. That product is live today, and we didn't cut corners on compliance or quality to get there.
Building on and expanding that platform is the work. Specific problems you'd wrestle with:
Voice and conversation design at scale. Our agents speak to tens of thousands of patients daily. The gap between "technically correct" and "actually human" is enormous, and it's not a problem you can prompt your way out of. One of our engineers spent three or four hours generating iterations of filler phrases ("mmm, let me look that up for you") because cadence and intonation matter to a patient calling about their health. An LLM can generate the options. Someone with ears has to choose.
Knowing when the agent should stop. Healthcare conversations move through predictable stages (intake, verification, scheduling, follow-up), and what's appropriate to say at each stage isn't always obvious from the outside. A patient's diagnosis might be relevant at one point in the conversation and completely out of bounds thirty seconds earlier. A model optimizing for correctness will flatten all of that. Building systems that know not just what to say but when, and that route edge cases to a human without grinding everything to a halt, is some of the most consequential work we're doing. Get it wrong and it's not just a bad experience. Depending on what gets said to whom, it can be a compliance violation.
Building the infrastructure that makes all of this possible. Getting an agent to fix a bug autonomously sounds simple. Getting it to do that reliably, on a production codebase that handles real patient data, is a different problem. It requires evaluation systems that can tell the difference between a good fix and a confident wrong answer, context that's rich enough for the agent to understand what it's touching, and guardrails that fail safely when it doesn't. We've built a lot of this. There's more to build. And the honest truth is that the agents are only as trustworthy as the humans who designed the systems around them.
A Day in Your Life
No two days look the same, but here's a realistic sketch:
Morning: You're building out a new capability on the agent platform: a way to handle prescription refill requests that a practice flagged as fully manual and error-prone. You're using the internal agent harness to scaffold the initial implementation, then reviewing what it produced and making the judgment calls it couldn't. If you need to test something under real-world conditions (noise reduction, how the agent handles a difficult accent), you might head to the cafe downstairs. (We get out of the lab.)
Late morning: An alert surfaces from the Datadog agent: something's off in how insurance information is being collected in a specific scenario. The symptom points to one place; the root cause turns out to be somewhere completely different. The agent had context across the entire codebase, meeting transcripts, and Notion pages and traced it faster than any engineer who knew the stack. You verify the fix and ship it.
After lunch: One of the QA folks who audits patient calls posts in Slack that she needs a "Mark as Reviewed" button. It takes a few seconds per call, but she reviews thousands of them. She tags the Slack bot hooked to our codebase. Five minutes later the button exists. You give it a once-over and it's live.
End of day: You sync with CSMs and Eric on what they heard from customers. One conversation surfaces a new product opportunity. You sketch what a prototype might look like. (You'll be building it tomorrow.)
- You're drawn to heavy ML research. That's not what this is, and it would probably frustrate you. We're building products that leverage models, not the models themselves, and a generalist engineer with strong instincts and genuine curiosity will outperform an ML specialist here almost every time.
- You want someone to define the problem in detail before you start. We're moving too fast for that, and the most interesting problems here are the ones nobody has fully defined yet.
- You've never taken real ownership of a project. If you're used to working with an EM or PM who assigns you detailed Jira tickets on exactly how to build it, this will be disorienting.
- You need a detailed roadmap three years out. We shoot for a north star but we're constantly adapting. Agility is our advantage.
In-person, Downtown SF by Montgomery BART. After years of remote work, we wanted to be in the same room again to brainstorm, whiteboard, and actually enjoy each other's company. There's high ownership, no BS meetings, and a ton of gourmet snacks. Once a week we also get out of the office for a team lunch… because some of the best conversations happen away from the whiteboards.
Requirements
You question things. Not performatively, but you genuinely don't accept the status quo when you can see a better way. You're the counterbalance to people who defer too quickly, and you make the team better for it.
You care about shipping things that reach real people, not abstract users. That context should inform how you think about every decision you make here. If it doesn't, this probably won't be the right fit.
You're honest about what you don't know. "You might have a point; I may be wrong" should be something you've said recently and meant. We debate, we disagree, we challenge each other, but we do it with respect and with the assumption of good intent. Someone who can't say they're wrong won't last long here, and frankly won't enjoy it much either.
- 3-7 years of relevant engineering experience.
- Expertise in Python, Django, and React.
- Strong backend fundamentals: you've designed and implemented scalable, robust systems.
- Previous experience at an early-stage startup is a plus.
- Genuine belief that agentic tools are the future of how software gets built.
Benefits & conditions
- Excellent health, dental, and vision insurance
- Free dinner
- Free illy espresso, coffee, and beer
- Fitness stipend
- Commuter benefits
- Unlimited PTO
- 401(k)
- Relocation stipend