Computing Engineer (Monitoring) (IT-DA-ASM-2025-256-LD)
Job description
You will join: The Analytics, Streaming and Monitoring section, which provides messaging, data streaming and monitoring platforms and services, as well as frontend analytics services, for the different CERN communities. These services facilitate data collection, transport, manipulation, processing, interactive data analysis and central reporting.
Moreover, the section works with the Experiments, Accelerator sector, IT and other CERN communities interested in implementing data analysis solutions with these technologies.
Functions:
As a Computing Engineer in the Database and Analytics Group, you will:
- Contribute to the development and evolution of the monitoring service.
- Manage the monitoring services and applications, including metrics, logging, alerting and visualisation tools.
- Oversee the monitoring infrastructure, the deployment of software packages, and the operation of the service, ensuring performance and security.
- Provide user support on monitoring, including for WLCG and LHC experiments.
- Offer consultancy, assistance and advice to end users developing data pipelines and monitoring use-cases.
- Contribute to the operations of other services in the section, such as the interactive analytics infrastructure (SWAN, Jupyter Notebooks), and the Messaging and Data Streaming Services.
Requirements
- Master's degree or equivalent relevant experience in the field of Computer Science or a related field.
- Proven experience in software development using Java or Python.
- Experience with open-source technologies like Kafka and Spark.
- Knowledge of established open-source technologies such as OpenTelemetry, Kafka, Spark, OpenSearch and Grafana.
- Analysis of performance to scale the monitoring infrastructure to an ever-growing workload.
- DevOps operations (Linux systems) and configuration management experience (e.g. Ansible, Puppet) in Agile development environments and/or cloud-native deployments, including tool development, packaging and deployment in Python, Java and Go ecosystems.
- Operation of large-scale production Kubernetes environments (e.g. Helm, ArgoCD).
- Technical and troubleshooting skills with Kubernetes components, web interfaces and data analysis engines and packages (e.g. Apache Spark).
Technical competencies:
- Knowledge of system configuration tools: familiarity with automation tools for infrastructure delivery and management.
- Knowledge of programming techniques and languages: Java knowledge is required. UNIX shell scripting, Python and Go programming skills would be an advantage.
- Knowledge of operating systems: Linux.
- Capturing and analysis of requirements for ICT systems: ability to collect the needs of users and to manage the different phases of a project.
- Architecture and design of ICT systems: distributed applications and services.
Behavioural competencies:
- Solving Problems: identifying, defining and assessing problems, taking action to address them.
- Achieving Results: delivering high quality work on time and fulfilling expectations.
- Working in Teams: building and maintaining constructive and effective work relationships.
- Communicating Effectively: delivering presentations in a structured and clear way; adjusting style and content to the audience; responding calmly and confidently to questions.
- Learning and Sharing Knowledge: keeping up-to-date with developments in own field of expertise and readily absorbing new information.
Language skills:
Spoken and written English, with a commitment to learn French.
Benefits & conditions
Contract type: Limited duration contract (5 years). Subject to certain conditions, holders of limited-duration contracts may apply for an indefinite position.
Working Hours: 40 hours per week
Job Flexibility: Hybrid
This position involves:
- Work during nights, Sundays and official holidays, when required by the needs of the Organization.
- Stand-by duty, when required by the needs of the Organization.