AI Support Engineer - Application

Accrete, Inc.

Indianapolis, United States of America

4 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Shift work

Languages

English

Job location

Indianapolis, United States of America

Tech stack

API

Artificial Intelligence

Airflow

Amazon Web Services (AWS)

Batch Processing

Databases

Data Validation

Database Queries

Software Debugging

Monitoring of Systems

Issue Tracking Systems

Runbook

SQL Databases

Datadog

Grafana

Integration Frameworks

Database Monitoring

Cloudwatch

Kibana

Data Pipelines

Job description

We are looking for an Application Support Engineer (L1/L2) to ensure the stability, reliability, and smooth functioning of our production systems.

This role acts as the first line of defense for system monitoring and incident response, ensuring that issues are identified early, resolved quickly, and escalated appropriately.

The ideal candidate should be comfortable working in a high-availability, fast-paced environment, handling alerts, monitoring data pipelines, and ensuring seamless platform operations.

Monitoring & System Health

Monitor production systems using tools such as Datadog, CloudWatch, and internal dashboards
Track system health across APIs, data pipelines, databases, and third-party integrations
Identify anomalies and validate alerts to reduce false positives

Incident Management & Response

Respond to system alerts in real time (failures, latency spikes, downtime)
Perform initial incident triage and identify impacted components
Execute predefined runbooks and recovery actions (job restarts, retries, etc.)
Escalate issues to engineering teams when required

Data Pipeline Monitoring

Monitor scheduled jobs and workflows (e.g., Dagster, SageMaker, batch pipelines)
Identify missing, delayed, or failed data processes
Trigger re-runs or escalate issues to relevant teams

Third-Party & Vendor Monitoring

Monitor failures in external APIs, proxies, and vendor systems
Coordinate with internal teams for resolution
Track and highlight recurring vendor-related issues

Database Monitoring

Perform basic database health checks including:

Connection issues
Slow queries
Replication lag
Storage utilization

Raise alerts for any anomalies

Runbook Execution & Documentation

Follow standard operating procedures and runbooks for known issues
Maintain clear logs of actions taken during incidents
Ensure proper closure and documentation of incidents

Reporting & Shift Handover

Maintain incident logs and reports
Provide structured shift handovers to ensure continuity
Highlight recurring issues and patterns for further analysis

What You Will NOT Be Responsible For

(To set the right expectations clearly)

No deep debugging or code-level fixes
No infrastructure changes
No ownership of alert configurations (handled by SRE/Engineering teams)

Requirements

Do you have experience in SQL?

Must Have

Strong understanding of APIs and HTTP status codes
Experience with monitoring tools/logs (Datadog, CloudWatch, Grafana, Kibana, etc.)
Basic knowledge of SQL (queries, data validation checks)
Ability to work with dashboards, alerts, and incident tracking systems
Experience in incident management / production support environments

Good to Have

Exposure to AWS services (CloudWatch, Lambda basics, etc.)
Understanding of data pipelines and batch processing systems
Familiarity with observability tools and logging systems

Behavioral Competencies

Ability to stay calm under pressure during incidents
Strong communication and coordination skills
High level of ownership and follow-through
Ability to work in a 24x7 support environment with rotational shifts

Benefits & conditions

Strong learning and growth opportunities within platform and SRE functions
Competitive compensation and benefits

About the company

Accrete AI is a dynamic and innovative company focused on transforming the future of artificial intelligence. We specialize in creating advanced AI solutions that turn complex data into actionable insights, driving real-world impact for businesses and government organizations. Our team thrives on creativity and collaboration, working together to push the boundaries of AI technology. At the core of our offerings are AI agents: autonomous systems that analyze multimodal data, generate insights, and make intelligent recommendations. These agents help businesses streamline operations, improve decision-making, and empower government entities to enhance security, intelligence, and operational efficiency.
