Senior Software Engineer - AI Interaction Evaluator (Codex / Claude Code, up to $200/hr)

G2i Inc.
Delray Beach, United States of America

Role details

Contract type
Contract
Employment type
Part-time (≤ 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior
Compensation
$104K

Job location

Remote
Delray Beach, United States of America

Tech stack

JavaScript
Artificial Intelligence
Cursor
Python
TypeScript
Prompt Engineering

Job description

We're looking for a highly experienced software engineer (SR+) to help evaluate the quality of interactions with modern coding agents such as OpenAI Codex and Claude Code. You will assess how AI coding agents behave in real-world scenarios, focusing on:

  • Whether the response makes sense
  • Whether the preamble and reasoning are useful
  • Whether the output reflects strong engineering judgment
  • Whether the interaction feels right to an experienced developer

This role is about engineering taste - not syntax correctness.

What You'll Be Doing

  • Evaluate AI-generated coding interactions end-to-end
  • Judge whether outputs are:
      • Useful
      • Correct (at a high level)
      • Aligned with how a strong engineer would think
  • Assess the quality of explanations and reasoning, not just code
  • Distinguish between different levels of response quality (e.g. what makes something a 2 vs. a 4)
  • Provide clear, opinionated feedback on:
      • What worked
      • What didn't
      • What felt "off" or misleading
  • Help define what great looks like when interacting with tools like Cursor

What We Mean by "Taste"

We're specifically looking for engineers who can answer questions like:

  • Does this feel like something a strong engineer would actually say?
  • Is this explanation helpful, or just technically correct?
  • Is the model guiding the user well, or just dumping output?
  • Would this interaction build or erode trust?

Requirements

  • Staff / Principal-level engineer (or equivalent experience)
  • Strong background in one of the following:
      • TypeScript / JavaScript
      • Python
  • Hands-on experience using:
      • OpenAI Codex
      • Claude Code
      • Cursor
  • Deep familiarity with modern AI-assisted dev workflows
  • Able to evaluate code without needing to fully execute or deeply review every line
  • Comfortable giving direct, opinionated feedback
  • High bar for what "good engineering" looks like

Nice to Have

  • Experience with tools like Cursor or similar AI-first IDEs
  • Prior exposure to prompt design or evaluation workflows
  • Experience mentoring senior engineers or defining engineering standards
