Technical and Operational

Welcome! If you are involved in the technical or operational aspects of establishing a Federated EGA node, you are in the right place. The information here covers topics including, but not limited to, technical infrastructure, testing, software and hardware, SOPs, Helpdesk, and team capacity building.

You might find this page useful if you are:

  • a technical team leader
  • responsible for procuring resources
  • a software developer or engineer
  • a bioinformatician
  • a support officer

By exploring these materials, you will be able to:

  • Understand guidelines and standards for establishing and operating a node
  • Set up your node using the local EGA software stack
  • Interact with Central EGA using RabbitMQ
  • Evaluate your node implementation using the FEGA Maturity Model
  • Plan your node end-to-end demonstrator

1. Identify node requirements

Standards

  • The FEGA Node Operations Guidelines document gives an overview of the operational areas which require resources in order to establish and operate a Federated EGA node. The document is based on more than 10 years of experience in establishing and operating the EMBL-EBI and CRG Central EGA nodes. It provides a breakdown of the operational areas of responsibility into Helpdesk Services, Technical Operations, Software Development, and IT Infrastructure.
  • Federated EGA was established on the principle of implementing global, community standards, including those developed as part of GA4GH and ELIXIR.
  • Overview of local EGA services and architecture (19 June 2020)

Software

A minimal Federated EGA node can be set up on your local infrastructure using the localEGA software (GitHub repository) and the associated Read the Docs webpages. You are not required to use the local EGA software suite, but it is a good alternative to developing your own Federated EGA node software from scratch.

More information about the local EGA software and its implementation can be found in this report on implementation and documentation to create an operational EGA node (2 June 2021).

Standard Operating Procedures (SOPs)

It is useful to establish SOPs for common node operational tasks to enable consistent service delivery and streamline internal processes. Use this template to develop SOPs for your own node.

Standard interactions between Central EGA and Federated EGA node Helpdesk staff have been developed into a set of SOPs. Follow these SOPs below.

Shared FEGA ↔ CEGA SOPs

The following SOPs must be followed as part of current FEGA ↔ CEGA node interactions during the submission process:

Central EGA Helpdesk have developed a set of SOPs to harmonise both user-facing processes and internal processes. Explore some examples of these SOPs below, or browse them in this shared EGA Helpdesk SOP Google Drive. Please note these are example SOPs and will need to be adapted to how your node operates!

Example User-facing Process SOPs

Example Internal Node Process SOPs

2. Learn from current node implementations

Check out current Federated EGA node implementations from some of the first established nodes:

Technical requirements

| # | Component | Description | Required | Link to |
|---|---|---|---|---|
| 1 | URL | Official website for documentation | - | |
| 2 | Credentials | Two types of credentials are required for connecting to Central EGA: (1) User API credentials, for identifying and validating cEGA accounts in the Submitter Inbox; (2) cEGA MQ credentials, for connecting the FEGA Node MQ | Y | |
| 3 | FEGA Node Encryption key pair | A Crypt4GH key pair generated by the FEGA Node. The public key is shared with the Submitter so that files are sent encrypted for the FEGA Node | Y | |
| 4 | FEGA Node MQ | RabbitMQ instance connected to the cEGA MQ through the shovel and federation mechanisms | Y | 2 |
| 5 | Submitter Inbox | An Inbox solution for the Researcher to submit files to a specific Node (e.g. SFTP, REST API). The Inbox needs to be accessible by a Researcher via a URL | Y | 2, 3 |
| 6 | Ingest Pipeline | Means of interfacing with cEGA so that the required messages are sent at the relevant steps of the submission process. See https://localega.readthedocs.io/en/latest/amqp.html | Y | 4, 7, 8, 9 |
| 7 | Archive storage | Storage solution for storing archived files, e.g. S3, POSIX | Y | 6 |
| 8 | Archive File Database | Means of storing information about the archived files and their AccessionIDs, and the mapping of dataset IDs to AccessionIDs (file-to-dataset mapping). Note: other details can be stored in the database, e.g. checksums, timestamps, Crypt4GH file headers | Y | 6 |
| 9 | Backup Storage | Storage solution for storing a backup of the archived files, e.g. S3, POSIX. Note: a file needs to be backed up in a different location from the Archive | Y | 6 |
| 10 | Main Source Code Repository | Where the source code can be found | N | |
| 11 | Main Programming Language | Main programming language for the technical stack pipeline | - | |
| 12 | Deployment Method | Technology used to deploy the different components to production | - | |
| 13 | Helpdesk Management System | Main tool for user/helpdesk communication | N | |
| 14 | Helpdesk Portal Technology | The Federated EGA node helpdesk will provide a means of interacting with submitters, registering DPAs, establishing DACs and monitoring submissions | N | 15 |
| 15 | Monitoring Tool | Means of providing an overview of the status of submissions. Also useful for auditing purposes | N | 14 |
| 16 | Metadata Submission Method | How researchers will send metadata to Central EGA: through the cEGA submitter portal, a node portal, or another solution | Y | |
| 17 | Download Solution | Means for a Requester to access the data once approval has been granted | Y | 7, 8, 18 |
| 18 | Data Access Tool | Means of enabling Requesters to download the archived data; a tool for facilitating DACs | N | |
| 19 | General Contact | Email or website to contact the team deploying the solution | Y | |
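
As a rough illustration of rows 7 to 9 in the requirements table, the sketch below uses Python's built-in sqlite3 module to record archived files, their checksums, and a file-to-dataset mapping. It also refuses to register a backup copy that sits in the same directory as the archive copy. Everything here is an assumption made for illustration: the table and column names, the FILE-0001 and DATASET-0001 identifiers, and the use of "different directory" as a stand-in for "different location" are all hypothetical, and a production node would more likely use a database server and object or tape storage.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def create_schema(conn):
    """Create two hypothetical tables: archived files and file-to-dataset mapping."""
    conn.executescript(
        """
        CREATE TABLE files (
            accession_id TEXT PRIMARY KEY,
            archive_path TEXT NOT NULL,
            backup_path  TEXT NOT NULL,
            sha256       TEXT NOT NULL
        );
        CREATE TABLE dataset_files (
            dataset_id   TEXT NOT NULL,
            accession_id TEXT NOT NULL REFERENCES files(accession_id),
            PRIMARY KEY (dataset_id, accession_id)
        );
        """
    )


def register_file(conn, accession_id, archive_path, backup_path):
    """Record an archived file after checking its backup copy."""
    archive_path, backup_path = Path(archive_path), Path(backup_path)
    # Crude stand-in for "a different location than the Archive".
    if archive_path.resolve().parent == backup_path.resolve().parent:
        raise ValueError("backup must live in a different location than the archive")
    checksum = sha256_of(archive_path)
    if sha256_of(backup_path) != checksum:
        raise ValueError("backup copy does not match the archived file")
    conn.execute(
        "INSERT INTO files VALUES (?, ?, ?, ?)",
        (accession_id, str(archive_path), str(backup_path), checksum),
    )


def map_to_dataset(conn, dataset_id, accession_ids):
    """Attach already registered files to a dataset."""
    conn.executemany(
        "INSERT INTO dataset_files VALUES (?, ?)",
        [(dataset_id, accession_id) for accession_id in accession_ids],
    )


def files_in_dataset(conn, dataset_id):
    """Return the accession IDs mapped to a dataset, sorted."""
    rows = conn.execute(
        "SELECT accession_id FROM dataset_files "
        "WHERE dataset_id = ? ORDER BY accession_id",
        (dataset_id,),
    )
    return [row[0] for row in rows]


def demo():
    """Register one placeholder file and map it to one placeholder dataset."""
    with tempfile.TemporaryDirectory() as root:
        archive_dir, backup_dir = Path(root, "archive"), Path(root, "backup")
        archive_dir.mkdir()
        backup_dir.mkdir()
        archived = archive_dir / "sample.c4gh"
        backup = backup_dir / "sample.c4gh"
        archived.write_bytes(b"placeholder for encrypted content")
        backup.write_bytes(archived.read_bytes())

        conn = sqlite3.connect(":memory:")
        create_schema(conn)
        register_file(conn, "FILE-0001", archived, backup)
        map_to_dataset(conn, "DATASET-0001", ["FILE-0001"])
        result = files_in_dataset(conn, "DATASET-0001")
        conn.close()
        return result
```

Calling `demo()` returns the list of file identifiers mapped to the placeholder dataset. Keeping the checksum next to the accession ID is one way to let the same record serve both the ingest pipeline and later audits.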
| # | Component | Finland |
|---|---|---|
| 1 | URL | https://research.csc.fi/-/fega |
| 2 | Credentials | |
| 3 | FEGA Node Encryption key pair | |
| 4 | FEGA Node MQ | |
| 5 | Submitter Inbox | SFTP |
| 6 | Ingest Pipeline | |
| 7 | Archive storage | POSIX / S3 (dev) |
| 8 | Archive File Database | |
| 9 | Backup Storage | Tape |
| 10 | Main Source Code Repository | https://github.com/neicnordic/sda-pipeline |
| 11 | Main Programming Language | Golang |
| 12 | Deployment Method | Kubernetes |
| 13 | Helpdesk Management System | RT - request tracker |
| 14 | Helpdesk Portal Technology | |
| 15 | Monitoring Tool | |
| 16 | Metadata Submission Method | Central EGA submitter Portal / planning development of own Portal |
| 17 | Download Solution | https://docs.csc.fi/data/sensitive-data/fega_application/ |
| 18 | Data Access Tool | https://github.com/CSCfi/rems |
| 19 | General Contact | |
| # | Component | Norway |
|---|---|---|
| 1 | URL | https://ega.elixir.no/ |
| 2 | Credentials | User: LS Login and CEGA username and password. Services: credentials to connect to CEGA NSS and RMQ endpoints |
| 3 | FEGA Node Encryption key pair | Generated with Crypt4GH tool |
| 4 | FEGA Node MQ | Y |
| 5 | Submitter Inbox | REST API in front of POSIX storage. LS Login authentication required. CLI tool provided. |
| 6 | Ingest Pipeline | neicnordic/sda-pipeline |
| 7 | Archive storage | POSIX |
| 8 | Archive File Database | PostgreSQL |
| 9 | Backup Storage | Snapshots and tape |
| 10 | Main Source Code Repository | https://github.com/neicnordic/sda-pipeline |
| 11 | Main Programming Language | Golang and Java |
| 12 | Deployment Method | Docker Swarm and Podman |
| 13 | Helpdesk Management System | RT - request tracker |
| 14 | Helpdesk Portal Technology | |
| 15 | Monitoring Tool | Zabbix |
| 16 | Metadata Submission Method | Central EGA submitter Portal |
| 17 | Download Solution | REST API in front of POSIX outbox storage. Files are re-encrypted and staged from archive to outbox using the neicnordic/sda-doa service. Download requires LS Login authentication. |
| 18 | Data Access Tool | https://ega.elixir.no/retrieval.html |
| 19 | General Contact | fega-norway-support@ega.elixir.no |
| # | Component | Sweden |
|---|---|---|
| 1 | URL | https://fega.nbis.se/ |
| 2 | Credentials | User: LS Login and CEGA username and password. Services: credentials to connect to CEGA NSS and RMQ endpoints |
| 3 | FEGA Node Encryption key pair | Generated with Crypt4GH tool |
| 4 | FEGA Node MQ | RabbitMQ |
| 5 | Submitter Inbox | S3Inbox |
| 6 | Ingest Pipeline | NeIC Sensitive Data Archive |
| 7 | Archive storage | S3 |
| 8 | Archive File Database | PostgreSQL |
| 9 | Backup Storage | S3 |
| 10 | Main Source Code Repository | https://github.com/neicnordic/sensitive-data-archive/ |
| 11 | Main Programming Language | Golang |
| 12 | Deployment Method | Kubernetes |
| 13 | Helpdesk Management System | Redmine |
| 14 | Helpdesk Portal Technology | |
| 15 | Monitoring Tool | |
| 16 | Metadata Submission Method | Central EGA submitter Portal |
| 17 | Download Solution | |
| 18 | Data Access Tool | REMS |
| 19 | General Contact | ega-se@nbis.se |
| # | Component | Germany |
|---|---|---|
| 1 | URL | https://www.ghga.de/ |
| 2 | Credentials | |
| 3 | FEGA Node Encryption key pair | |
| 4 | FEGA Node MQ | Kafka |
| 5 | Submitter Inbox | Upload Controller Service (UCS), which manages uploads to the S3 inbox bucket |
| 6 | Ingest Pipeline | |
| 7 | Archive storage | S3 |
| 8 | Archive File Database | MongoDB |
| 9 | Backup Storage | |
| 10 | Main Source Code Repository | GHGA - The German Human Genome-Phenome Archive GitHub Repository |
| 11 | Main Programming Language | Python |
| 12 | Deployment Method | Kubernetes |
| 13 | Helpdesk Management System | Zammad |
| 14 | Helpdesk Portal Technology | |
| 15 | Monitoring Tool | |
| 16 | Metadata Submission Method | GHGA Metadata Spreadsheet sent via e-mail to the GHGA Data Steward |
| 17 | Download Solution | GHGA Connector and Download Controller Service |
| 18 | Data Access Tool | GHGA Data Portal and Access Request Service |
| 19 | General Contact | GHGA Operations Team <contact@ghga.de> |
| # | Component | Spain |
|---|---|---|
| 1 | URL | https://fega-test.bsc.es/docs/ |
| 2 | Credentials | |
| 3 | FEGA Node Encryption key pair | |
| 4 | FEGA Node MQ | |
| 5 | Submitter Inbox | SFTP |
| 6 | Ingest Pipeline | |
| 7 | Archive storage | GPFS |
| 8 | Archive File Database | |
| 9 | Backup Storage | |
| 10 | Main Source Code Repository | https://github.com/EGA-archive/LocalEGA |
| 11 | Main Programming Language | Python |
| 12 | Deployment Method | Docker-Compose |
| 13 | Helpdesk Management System | RT - request tracker |
| 14 | Helpdesk Portal Technology | |
| 15 | Monitoring Tool | |
| 16 | Metadata Submission Method | Central EGA submitter Portal |
| 17 | Download Solution | Nextcloud |
| 18 | Data Access Tool | |
| 19 | General Contact | https://fega-test.bsc.es/docs/contact.html |
| # | Component | Portugal |
|---|---|---|
| 1 | URL | |
| 2 | Credentials | |
| 3 | FEGA Node Encryption key pair | |
| 4 | FEGA Node MQ | |
| 5 | Submitter Inbox | SFTP |
| 6 | Ingest Pipeline | |
| 7 | Archive storage | |
| 8 | Archive File Database | |
| 9 | Backup Storage | S3 |
| 10 | Main Source Code Repository | |
| 11 | Main Programming Language | Python |
| 12 | Deployment Method | Docker-Compose |
| 13 | Helpdesk Management System | |
| 14 | Helpdesk Portal Technology | |
| 15 | Monitoring Tool | |
| 16 | Metadata Submission Method | Central EGA submitter Portal |
| 17 | Download Solution | SFTP |
| 18 | Data Access Tool | |
| 19 | General Contact | Jorge Oliveira <cto@biodata.pt> |
| # | Component | Poland |
|---|---|---|
| 1 | URL | |
| 2 | Credentials | |
| 3 | FEGA Node Encryption key pair | |
| 4 | FEGA Node MQ | |
| 5 | Submitter Inbox | |
| 6 | Ingest Pipeline | |
| 7 | Archive storage | |
| 8 | Archive File Database | |
| 9 | Backup Storage | |
| 10 | Main Source Code Repository | https://github.com/neicnordic/sda-pipeline |
| 11 | Main Programming Language | Golang |
| 12 | Deployment Method | Kubernetes |
| 13 | Helpdesk Management System | |
| 14 | Helpdesk Portal Technology | |
| 15 | Monitoring Tool | |
| 16 | Metadata Submission Method | Central EGA submitter Portal |
| 17 | Download Solution | |
| 18 | Data Access Tool | |
| 19 | General Contact | biobank@uni.lodz.pl |
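
The node profiles above can be compared with the requirements table programmatically. The sketch below is a minimal example, not an official FEGA tool: it lists the components marked "Y" in the requirements table and reports which of them have no entry in a node's profile. The function and variable names are hypothetical, and the example profile is transcribed from the Sweden table. An empty cell only means the table does not document that component, not that the node lacks it.

```python
# Components marked "Y" (required) in the requirements table, keyed by row number.
REQUIRED_COMPONENTS = {
    2: "Credentials",
    3: "FEGA Node Encryption key pair",
    4: "FEGA Node MQ",
    5: "Submitter Inbox",
    6: "Ingest Pipeline",
    7: "Archive storage",
    8: "Archive File Database",
    9: "Backup Storage",
    16: "Metadata Submission Method",
    17: "Download Solution",
    19: "General Contact",
}


def undocumented_required(profile):
    """Return names of required components with an empty or missing profile entry."""
    return [
        name
        for number, name in sorted(REQUIRED_COMPONENTS.items())
        if not profile.get(number, "").strip()
    ]


# Profile transcribed from the Sweden table; empty strings are empty cells.
SWEDEN = {
    1: "https://fega.nbis.se/",
    2: "User: LS Login and CEGA username and password, "
       "Services: credentials to connect to CEGA NSS and RMQ endpoints",
    3: "Generated with Crypt4GH tool",
    4: "RabbitMQ",
    5: "S3Inbox",
    6: "NeIC Sensitive Data Archive",
    7: "S3",
    8: "PostgreSQL",
    9: "S3",
    10: "https://github.com/neicnordic/sensitive-data-archive/",
    11: "Golang",
    12: "Kubernetes",
    13: "Redmine",
    14: "",
    15: "",
    16: "Central EGA submitter Portal",
    17: "",
    18: "REMS",
    19: "ega-se@nbis.se",
}

print(undocumented_required(SWEDEN))  # ['Download Solution']
```

The same function could be run on a draft profile for a new node to see which required rows still need an answer before sharing it with Central EGA.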

Hear more details about node implementations:

Node experiences

Read more below about each node's experiences in the FEGA onboarding journey!


Setting up and performing the end-to-end demonstrator

FEGA Finland Author(s): Francesca Morello, Laura Kalliokoski and Heikki Lehväslaiho

What did we do?
We planned the submission in advance and created a scripted framework to guide us throughout the process.

What went well?
The demonstration itself proceeded smoothly, providing us with valuable insights and knowledge. During the submission, we made the decision to submit BAM files, even though we were not initially familiar with the process, and this decision allowed us to learn and understand the submission process better. Despite some challenges encountered while using the user interface to add the data, the overall experience was instructive and productive. Furthermore, our collaboration with other nodes who shared their scripts and resources significantly contributed to improving our demonstration and workflow. This exchange of knowledge and resources was instrumental in enhancing the overall success of the demonstration. Additionally, having the demonstrator take place simultaneously with other helpdesks was a positive aspect, providing a unique opportunity for practical collaboration and tasks that we don't typically engage in with other helpdesks. This shared experience fostered valuable interactions and enriched our collective understanding.

What could have gone better?
Our decision to submit BAM files, while valuable for learning, significantly complicated the demonstration and extended its duration over the entire day. Consequently, we ended up testing the technical workflow, helpdesk support, and collaboration with Central EGA simultaneously, which could have been divided into distinct stages for a more manageable process. In hindsight, it might have been beneficial for all nodes to use the same script, facilitating smoother coordination and consistency across the demonstration.

What did we learn?
Our experience underscored the significance of expectation management, as the submitters participating in the demonstrator had to wait over a year before the service was in production. This delay has emphasized the importance of setting realistic expectations and timelines for project milestones and ensuring effective project management for a more efficient progression towards production.

FEGA Poland Author(s): Krzysztof Kochel

What did we do?
After we had established a test environment and validated success stories, our Helpdesk team started to walk through the user journeys described in the "Federated EGA node end-to-end demonstrator". Each ambiguity or unclear point was written down and sent to the appropriate person in the Central EGA Helpdesk for clarification. Once we had confirmed that the Helpdesk members understood each step of each journey, we performed several trials with fresh accounts. This approach allowed us to detect issues that a new user might encounter.

What went well?
Repeating all the journeys several times with fresh accounts allowed us to detect many issues that would otherwise have led to a failure of the final demo.

What could have gone better?
In some cases we misunderstood explanations. However, those cases were detected and corrected by meeting participants from the Central EGA side.

What did we learn?
We should point out all the edge cases and confirm with Central EGA.

FEGA Sweden Author(s): Mattias Strömberg and Markus Englund

What did we do?
We started to plan for our end-to-end demonstrator a couple of months before the actual event. During that time, the whole team worked hard to understand the steps and to identify technical and organizational issues that remained to be resolved. The planning resulted in the following:
  • a play script for the node's actual demonstrator
  • identified roles and named individuals to play the different roles
  • a test dataset (no personal data) with made-up metadata
  • text templates to use in the communication during the demonstrator
  • instructions for the local helpdesk team
One general rehearsal was then performed a few days before the actual event. The actual demonstrator event took place on a single day in January 2022.

What went well?
The demonstrator event went smoothly without any major issues. It would probably not have been as successful without the meticulous planning and the strong engagement from the people at the node and at Central EGA. We also had much help from other nodes, such as the Norwegian node, which shared a draft of their play script.

What could have gone better?
The systems we used in the demonstrator for communication and handling of the submission and data access request (e.g. email, Slack and Redmine) involved many manual steps. There might have been better systems, or we could have configured the systems differently. It was not ideal to have the general rehearsal only a few days before the event. We were lucky though that no major issues turned up. We recommend aiming for a rehearsal at least one week before the demonstrator. In the play script, we should have described the journeys from an end user's perspective already from the start.

What did we learn?
It is important to start the planning for the demonstrator well in time before the event. Take inspiration from other nodes to see how they are thinking and which solutions they have. Establish a dialogue early with Central EGA - their feedback is critical for success.


Shaping up the FEGA node to prepare for production

FEGA Norway Author(s): Kjell Petersen

What did we do?
For many years we worked on technical solutions at the software level, and a feeling was growing in our ELIXIR node that we were not progressing satisfactorily towards a production state. Lately, much more guidance has become available on the path towards such a goal, in particular on how to join the FEGA network and how your node should communicate and interact with the FEGA entities/committees to join. There is not that much guidance, however, on how to organize your node internally to best meet the FEGA requirements, and hence this experience sharing is a key factor for us in this process.

What went well?
After having most of the technical solutions in place and tested, we answered the question "which organization shall be the Service Provider", and hence a key legal entity in the operations. Fixing this single decision made it possible to know exactly which organisation's internal procedures and internal functions we had to relate to when adhering to GDPR and when developing many of the centrally required assets to progress towards a production-level node, including:
  • ROS
  • DPIA
  • DPA
  • SOPs (referred to in the 3 above)

What could have gone better?
Having a better overview from the start would have helped us plan our time better and improved communication with the right people.

What did we learn?
We should have started earlier with a clear decision on which organisation would be the Service Provider, rather than delaying this crucial point until late in, or after, the technical development.


Selecting submission pilots

FEGA Finland Author(s): Francesca Morello, Laura Kalliokoski and Heikki Lehväslaiho

What did we do?
The Finnish node's experience differed from the Swedish node's approach. In our case, we did not proactively choose a pilot project; instead, researchers and users approached us seeking specific services critical to publishing. These services were instrumental in enabling them to publish their research papers and secure funding for their ongoing studies. These researchers faced a unique challenge: while their dataset had been consented for research use and reuse, strict restrictions prevented its transfer outside of Finland. The Finnish Federated EGA service and its integration with CSC's services (SD Apply and SD Desktop) played a pivotal role in overcoming this obstacle. By ensuring that no additional copies of the data were created and making it accessible only via a secure virtual desktop environment, Federated EGA became the natural, and often the only possible, solution for the researchers. While we received inquiries from numerous research projects, we faced the challenge of managing expectations due to the absence of a clear timeline for service availability. Consequently, we had to make difficult decisions and, unfortunately, had to decline some requests to ensure the effective allocation of our resources.

What went well?
During our collaboration with one of the research groups in the pilot phase, we established effective communication channels, including face-to-face meetings, which allowed us to understand their specific needs better. These meetings proved invaluable as they facilitated comprehensive testing of data uploads, a process that sometimes required additional support next to the technical documentation. We also worked closely with the researchers to establish legal agreements with the data controller, in this case, the university hospital. This partnership was crucial in navigating the legal complexities surrounding data usage. This specific study, involving single-cell RNA sequencing of human cells, posed minimal data size constraints, with data totalling approximately 20 GB. This relatively small dataset size, coupled with the absence of intricate phenotype data, simplified the data submission process and contributed to the overall success of our collaboration.

What could have gone better?
One of the primary challenges we encountered was the lack of a clearly defined timeline or the frequent postponement of timelines throughout the pilot. This uncertainty created significant frustration for both the researchers and our team. Researchers, whose careers and funding were contingent on the pilot's progress, found it especially challenging to understand why the process was often delayed. The absence of a predictable timeline made it difficult for us to manage expectations effectively and communicate transparently about project milestones and progress. In hindsight, having a more structured and consistent timeline could have mitigated these issues and improved the overall experience for all parties involved.

What did we learn?
The overall experience was positive, and it served as a valuable learning opportunity for our team. However, reflecting on our experience, we recognize that there were areas where we could have improved. Better planning and coordination across the nodes, as well as with the Central EGA, would have greatly benefited the pilot. Aligning timelines and needs between all parties involved, including researchers, the FEGA nodes, and the Central EGA, could have led to a more streamlined and efficient process. In hindsight, leveraging the FEGA Operations Committee could have played a pivotal role in addressing challenges collectively and finding solutions collaboratively, aligning our efforts more effectively and ensuring smoother and more productive collaborations.

FEGA Sweden Author(s): Markus Englund

What did we do?
The Swedish node selected SweGen as its first submission pilot project a few years before the federation was officially established. This project was chosen because staff at the Swedish ELIXIR node had been engaged in it and because the data was considered a good genomic reference for the Swedish population.

To avoid relying on a single pilot dataset, the node eventually decided to engage with two additional projects. At that point, the node had gained a better understanding of what a good pilot project could look like. A few candidates were selected among projects that had expressed interest in depositing data at the node. Semi-structured interviews were then held with two candidate projects before they were officially selected. The local helpdesk team (at the time consisting of only two persons) was responsible for the selection process, but the final decision was made at the FEGA node's management level.

What went well?
For the semi-structured interview, the local helpdesk team created a questionnaire. This allowed the node to collect necessary information before pilots were selected and made it easier to perform the evaluation. Asking the questions was in itself a good way to inform the candidates about the node's expectations. The questionnaire included questions related to, for example, data availability, dataset details (e.g. submission type, file types and file sizes), legal matters (e.g. ethical permit and data processing agreement) and information about the people who needed to be involved (e.g. their roles and their availability).

What could have gone better?
Having a strategy in place when selecting the first pilot would probably have made the node's work more efficient. It would also have made it easier for the node to communicate its expectations to the people who represented the candidate projects. If we were selecting pilots now, we would probably select three pilot projects from the start.

What did we learn?
Good communication of expectations is key to success. It is also crucial that the people you engage with have the motivation, patience and enough time to dedicate to the work.


Establishing data processing agreements with data controllers

FEGA Finland Author(s): Francesca Morello, Laura Kalliokoski and Heikki Lehväslaiho

What did we do?
In Finland, the Federated EGA is hosted by CSC - IT Center for Science. The landscape here is marked by a diversity of data controllers, predominantly from academic organizations, university hospitals, and biobanks. To streamline and facilitate the data submission process for researchers, we have initiated discussions with all data controllers involved to have standardized DPAs. The goal is to establish comprehensive DPAs that encompass the necessary legal requirements while simplifying the process for those wishing to deposit data with the Federated EGA. This collaborative approach aims to provide researchers with a smoother path for sharing their data within the boundaries of legal compliance.

What went well?
In addition to supporting and receiving support from researchers throughout this process, we have also successfully raised awareness within their organizations. This proactive approach has allowed us to familiarize these entities with the legal requirements and processes of FEGA, including their involvement in establishing Data Access Committees (DACs) where necessary. By promoting a broader understanding of the service's operational framework and legal compliance, we've not only facilitated smoother interactions with researchers but also enhanced overall transparency and cooperation among the organizations involved.

What could have gone better?
The discussions with some experts and organizations proved challenging, primarily due to the diverse expertise required to address various aspects of the agreements. The need to navigate these complexities in a more streamlined manner became evident, and we recognize the importance of finding more efficient and structured ways to engage and collaborate with these experts in the future. Moreover, the uncertainty with the timelines also had an impact on the process. This uncertainty occasionally led to delays and hindered the pace of progress. Sharing best practices and experiences with our counterparts in other nodes might have provided valuable insights and strategies to address these challenges more effectively.

What did we learn?
Our key takeaways encompass the critical importance of organizational support, the necessity of hiring or involving domain experts, and the strategic allocation of dedicated time and resources for efficient management of this process.

FEGA Sweden Author(s): Markus Englund

What did we do?
In order to process personal data in Sweden you must comply with the EU General Data Protection Regulation (GDPR). Since the Swedish FEGA node is formally hosted by Uppsala University, this means that a data processing agreement is needed whenever data comes from a different legal entity. To make it easier for Swedish researchers to submit data, the node decided to set up general data processing agreements with the country's major universities.

What went well?
When researchers want to submit data to the Swedish node, they generally don't need to sign a separate data processing agreement with Uppsala University, which would have been needed if there were no general data processing agreements. Because the node has used the same template for all its general data processing agreements, all the agreements now basically look the same.

What could have gone better?
It took the node several years to sign agreements with all the major Swedish universities. The reason for this was mainly the discussions that needed to happen with the legal experts at the different universities. It is hard to say what the node could have done differently.

What did we learn?
General data processing agreements may require the node to develop additional operating procedures. If the data providing organization must sign a data processing agreement whenever a new dataset is deposited at the node, this will make it obvious who the data controller is for that dataset. If a general processing agreement is used instead, the node's local helpdesk will have to verify the data controller by other means.


Setting up the node’s website

FEGA Sweden Author(s): Markus Englund

What did we do?
We created a separate website for the Swedish node and its services. We decided to switch from Jekyll to Quarto, which at the time was a fairly new tool for e.g. generating static websites.

What went well?
Using Quarto made it very easy for the helpdesk team to structure the website and fill it with content. All the pages were written in markdown, which also made the website easy to maintain. The website was hosted on GitHub pages with a GitHub action to render the website whenever new changes were incorporated into the main branch.

What could have gone better?
The helpdesk team spent a lot of time on developing the content of the website. A cookie-cutter template could have made it easier for us. We were also missing graphical guidelines for how to make the website align with other nodes' websites but still make it visually distinguishable from https://ega-archive.org.

What did we learn?
Learning a tool like Quarto felt like a good investment if you want to quickly create a nice-looking and highly configurable website. We also learnt that the software RStudio Desktop works well for maintaining the markdown files.


3. Evaluate your implementation

  • Understand the domains in which a node matures using the Federated EGA Maturity Model
  • Assess the technical and operational maturation of your node by doing a self-assessment against the Federated EGA Maturity Model
  • Demonstrate the full set of node services for users by planning your Federated EGA end-to-end demonstrator
  • Determine compliance of services with FEGA specifications by performing compliance tests (Coming soon!)
  • Evaluate ability to scale services by performing stress tests (Coming soon!)

4. What’s next?

To hear about the latest FEGA progress and join discussions with existing and interested FEGA nodes in the ELIXIR Federated Human Data (FHD) Community, join the ELIXIR FHD Mailing List (select “Human Data”) and attend the ELIXIR FHD Community Calls.