COSMOS - Data Engineer IV
Job description
The Data Engineer IV position is a full-time provisional position with the COSMOS Research Center (cosmos.ualr.edu) at the University of Arkansas and is funded through a grant or contract. Renewal of the position is contingent upon continued grant funding and satisfactory job performance. The Data Engineer IV will collect, manage, and convert raw data into usable information for analytics and decision-making. This role requires comprehensive data analysis skills and involves developing and maintaining datasets, improving data quality and efficiency by leveraging data systems and pipelines, interpreting trends and patterns in data, producing complex data reports, and building algorithms and prototypes. Other responsibilities include leading the development of social media data collection pipelines, inference methods and infrastructure, classification methods and infrastructure, and visualization dashboards. The Data Engineer IV will collaborate with researchers, participate in research projects, and interact with other developers at the COSMOS Research Center and partner organizations to achieve the best possible performance metrics across various projects. Excellent communication and problem-solving skills are essential for this on-campus position, as the data engineer's role is to find opportunities to contribute to cutting-edge research in the exciting field of social computing. This position is governed by state and federal laws and by agency/institution policy.
The position reports to: Dr. Nitin Agarwal (nxagarwal@ualr.edu), Maulden-Entergy Endowed Chair and Distinguished Professor and Director, COSMOS Research Center, UA-Little Rock.
- Lead a team of data engineers;
- Collect and analyze raw data from various sources, including social media platforms;
- Organize and maintain datasets;
- Improve data quality and process efficiency;
- Design and manage ETL pipelines that move data from source to destination systems, processing 10 million+ data points daily and using Kafka for real-time data streaming and MongoDB for NoSQL storage across Kubernetes clusters;
- Design and deploy scalable microservices in Python and Golang, leveraging FlaskAPI, GraphQL, and Docker, ensuring sub-second response times and efficient concurrency with goroutines;
- Migrate large amounts of data from legacy databases to MongoDB to achieve sub-second access latencies and optimize storage for unstructured data through Elasticsearch integration;
- Set up and manage the infrastructure required for ingestion, processing, and storage of data;
- Evaluate model needs and objectives, and interpret trends and patterns in data;
- Conduct complex data analysis and report on results;
- Prepare data for analysis and reporting by transforming and cleansing it;
- Combine raw information from different sources;
- Explore ways to enhance data quality and reliability;
- Identify opportunities for data acquisition;
- Develop analytical tools and programs;
- Collaborate with teams at COSMOS on several projects;
- Manage services and operational infrastructure for system reliability and resiliency;
- Create continuous integration/continuous deployment (CI/CD) pipelines with Jenkins and GitLab CI to automate service and system deployment;
- Integrate Prometheus for monitoring, Grafana for real-time dashboards and visualization, and Kibana for analysis of logs stored in Elasticsearch;
- Perform front-end development (HTML/CSS, JavaScript, Node.js, etc.);
- Train machine learning (ML) models on datasets;
- Deploy machine learning (ML) models;
- Enhance the system's fault tolerance by incorporating alerting mechanisms;
- Develop with frameworks such as Spring Boot and React;
- Work on other tasks as asked.
The University of Arkansas is an equal opportunity institution. The University does not discriminate in its education programs or activities (including in admission and employment) on the basis of any category or status protected by law, including age, race, color, national origin, disability, religion, protected veteran status, military service, genetic information, sex, sexual preference, or pregnancy. Federal law prohibits the University from discriminating on these bases. Questions or concerns about the application of Title IX, which prohibits discrimination on the basis of sex, may be sent to the University's Title IX Coordinator and to the U.S. Department of Education Office for Civil Rights.
Persons must have proof of legal authority to work in the United States on the first day of employment.
All application information is subject to public disclosure under the Arkansas Freedom of Information Act.
Constant Physical Activity:
Hearing, Manipulate items with fingers, including keyboarding, Repetitive Motion, Sitting, Standing, Talking, Walking
Frequent Physical Activity:
Hearing, Manipulate items with fingers, including keyboarding, Repetitive Motion, Sitting, Standing, Talking, Walking
Occasional Physical Activity:
Hearing, Manipulate items with fingers, including keyboarding, Repetitive Motion, Sitting, Standing, Talking, Walking
Requirements
- The candidate must have a master's degree in Computer Science, Information Science, or a related discipline;
- The candidate must have 4+ years of experience as a data engineer, software developer, software engineer, or database administrator, or in a similar role; or a PhD degree in Computer Science, Information Science, or a related discipline;
- The candidate must have 2+ years of experience leading a team of data engineers.
- Expert proficiency level in working with data models, data pipelines, ETL processes, data stores, data mining, and segmentation techniques;
- Expert proficiency level in working with programming/scripting languages (e.g., Java and Python);
- Expert proficiency level in working with data integration platforms and SQL database design;
- Expert-level numerical, analytical, and data security skills;
- Expert proficiency level in collecting raw data from various social media platforms;
- Expert proficiency level in creating CI/CD pipelines;
- Expert proficiency level with front-end development (HTML/CSS, JavaScript, Node.js, etc.);
- Expert proficiency level with training and deploying machine learning (ML) models on datasets;
- Ability to lead a large team of data engineers (5+ members);
- Expert proficiency level with Kafka and MongoDB for NoSQL storage across Kubernetes clusters;
- Expert proficiency level with microservices, Python, Golang, FlaskAPI, GraphQL, and Docker;
- Expert proficiency level with Elasticsearch, Grafana, Prometheus, and Kibana;
- Expert proficiency level in data modeling concepts (ERD, Dimensional Modeling, Data Vault) and data APIs (RESTful API);
- Expert proficiency level in data processing software (e.g., Hadoop, Spark, TensorFlow, Pig, Hive, Flume) and algorithms (e.g., MapReduce);
- Expert proficiency level in cloud platforms (AWS, Azure, GCP) and data warehousing solutions (Snowflake, Amazon Redshift, Google BigQuery, Azure Synapse);
- Expert proficiency level in technical communications.