Data Analyst
Here Technologies
Alpharetta, United States of America
2 days ago
Role details
Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Senior
Job location: Remote; Alpharetta, United States of America
Tech stack
HTML
Data analysis
Azure
Google BigQuery
Cloud Database
Code Generation
Computer Programming
Databases
Data as a Service
Data Dictionary
Information Engineering
Data Mining
Data Structures
Data Visualization
Data Warehousing
Database Queries
Document-Oriented Databases
XBRL
Python
PostgreSQL
Microsoft SQL Server
SQL Azure
Neo4j
Object-Oriented Software Development
Oracle Applications
PowerDesigner
Reference Data
Regular Expressions
Power BI
Shell Script
SQL Databases
Tableau
Encapsulation (Networking)
Parquet
Data Processing
Scripting (Bash/Python/Go/Ruby)
Data Storage Technologies
Data Classification
Snowflake
Pandas
Data Lake
Information Technology
Avro
Data Management
API Design
Data Pipelines
Redshift
Job description
- PDF & Document Data Extraction: Lead the extraction and post-processing of structured financial data from PDF, HTML, and scanned documents, applying OCR tooling and validation logic to ensure accuracy and consistency at scale.
- Data Extraction and Standardization: Lead initiatives to extract and standardise financial data from various formats, including XBRL and iXBRL, ensuring data accuracy and consistency.
- Mentorship and Development: Provide guidance and support to junior analysts, fostering their growth in data analysis, programming, and statistical methodologies.
- Exploratory Data Analysis: Conduct exploratory data analysis to identify trends, raise important questions, and derive actionable insights.
- Data Model Development: Design, implement and optimise conceptual, logical and physical data models for enterprise-scale data products. Develop and maintain data models using ERD diagrams and manage the data dictionaries for transactional, star and flat schemas.
- Data Model Democratization: Partner with data engineering teams to democratise the data model for designing efficient data pipelines.
- Data Modelling Standards: Define and enforce data modelling standards and best practices. Conduct data analysis to validate compliance with modelling standards, verify model accuracy, identify anomalies, and ensure data quality.
- Data Classification and Taxonomies: Design custom taxonomies and reference data classification methods/structures.
- Programming and Automation: Utilise Python (primary), SQL, Regex, and Shell Scripts for data manipulation, analysis, and automation of processes, including meta-programming and dynamic code generation.
- Database Management: Manage and optimise databases (SQL Server, Neo4j, Snowflake, Postgres), with a working understanding of join types, aggregate functions, and data storage formats (Parquet, Avro, Delta).
- Collaboration: Collaborate with product managers, data engineers, and analysts to translate business requirements into robust data structures. Work closely with cross-functional teams to address data quality issues and implement effective solutions.
Requirements
- XBRL / iXBRL: Familiarity with XBRL and iXBRL financial reporting standards and tooling for parsing and validating tagged financial data is preferred.
- Advanced Data Modelling Techniques: Experience with advanced modelling approaches.
- Business Analysis Expertise: Ability to bridge the gap between technical and business requirements.
- Project Management: Skills in managing projects, timelines, and deliverables.
- Data Visualization Tools: Proficiency with tools such as Tableau, Power BI, or similar.
- OCR / document extraction tooling: Experience with PDF text extraction libraries (pdfplumber, PyMuPDF, Tesseract OCR) is a strong advantage.
- Bachelor's degree in Computer Science, Information Technology, or equivalent.
- 5+ years of experience in data management.
- Highly skilled in data modelling: Experience developing data models from scratch for greenfield projects in multiple domains. Deep understanding of data warehousing concepts, dimensional modelling, and normalisation/denormalisation techniques. Expertise in tools such as Erwin Data Modeler, PowerDesigner, or similar.
- Python programming (required): Proficient in Python for data manipulation, automation, and scripting. Comfortable with core OOP concepts (classes, inheritance, encapsulation) at a working level. Experience with pandas, regex, and API-based data extraction.
- Knowledge of data products: Strong understanding of data product design principles and lifecycle.
- Strong SQL skills and experience with relational (e.g., Oracle, SQL Server, PostgreSQL) and cloud databases (e.g., Snowflake, BigQuery, Redshift).
- Good understanding of Azure cloud data services (Data Lake, Azure SQL).
- Problem-Solving: Adept at tackling complex issues and finding effective solutions.
- Curiosity and Self-Starter: Always eager to learn and take initiative without needing constant guidance.
- Comfortable with Ambiguity: Capable of working efficiently even when the answers are not immediately clear.
- Effective Communication: Excellent at conveying complex ideas and collaborating with stakeholders.
- Experience with Databases and Data Formats: Familiar with various databases, operating systems, file types, and data formats.