Senior Software Engineer - AI Interaction Evaluator (Codex / Claude Code, up to $200/hr)
G2i Inc.
Delray Beach, United States of America
yesterday
Role details
Contract type: Contract
Employment type: Part-time (≤ 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Senior
Compensation: $104K
Job location: Remote (Delray Beach, United States of America)
Tech stack
JavaScript
Artificial Intelligence
Cursor
Python
TypeScript
Prompt Engineering
Job description
We're looking for a highly experienced software engineer (SR+) to help evaluate the quality of interactions with modern coding agents such as OpenAI Codex and Claude Code. You will assess how AI coding agents behave in real-world scenarios, focusing on:
- Whether the response makes sense
- Whether the preamble and reasoning are useful
- Whether the output reflects strong engineering judgment
- Whether the interaction feels right to an experienced developer
This role is about engineering taste, not syntax correctness.
What You'll Be Doing
- Evaluate AI-generated coding interactions end-to-end
- Judge whether outputs are:
  - Useful
  - Correct (at a high level)
  - Aligned with how a strong engineer would think
- Assess the quality of explanations and reasoning, not just code
- Distinguish between different levels of response quality (e.g., what makes a response a 2 versus a 4)
- Provide clear, opinionated feedback on:
  - What worked
  - What didn't
  - What felt "off" or misleading
- Help define what great looks like when interacting with tools like Cursor
What We Mean by "Taste"
We're specifically looking for engineers who can answer questions like:
- Does this feel like something a strong engineer would actually say?
- Is this explanation helpful, or just technically correct?
- Is the model guiding the user well, or just dumping output?
- Would this interaction build or erode trust?
Requirements
- Staff / Principal-level engineer (or equivalent experience)
- Strong background in at least one of the following:
  - TypeScript / JavaScript
  - Python
- Hands-on experience using:
  - OpenAI Codex
  - Claude Code
  - Cursor
- Deep familiarity with modern AI-assisted dev workflows
- Able to evaluate code without needing to fully execute or deeply review every line
- Comfortable giving direct, opinionated feedback
- High bar for what "good engineering" looks like
Nice to Have
- Experience with tools like Cursor or similar AI-first IDEs
- Prior exposure to prompt design or evaluation workflows
- Experience mentoring senior engineers or defining engineering standards