Data ingestion and harmonisation
DataTools4Heart will develop a common data extraction tool to improve metadata and data interoperability while addressing data heterogeneity across European regions and cardiology units. The tool will be developed and validated as a modular, flexible Data Ingestion Suite deployed at 7 European sites. Interoperability of the Data Ingestion Suite will be guaranteed with at least 4 standards-based data models (HL7 V2, HL7 CDA, OMOP CDM, and i2b2) and tested in 3 different use cases for AI modelling.
Natural Language Processing
DataTools4Heart will introduce a multilingual Natural Language Processing (NLP) suite to standardise the structuring of cardiology reports across European regions, including cardiology-specific entity recognition and machine translation. The suite will include 7 language models adapted to the cardiology domain in English, Spanish, Italian, Romanian, Czech, Swedish, and Dutch, using EHR data from clinical site partners. The project will also release multilingual clinical corpora (CardioSynth and Paraclite) in 7 languages, more than 50% of them low-resource, containing more than 500,000 words of clinical text.
Federated machine learning and data synthesis
To develop innovative methods for synthesising data, DataTools4Heart will build a privacy-preserving cardiology data toolbox that improves data reusability while adhering to ethical and legal standards. Differentially private generation of synthetic data will make it possible to work with data that are representative of a target population, scalable, and shareable for research purposes. As a legacy, DataTools4Heart will leave an open-source, privacy-conscious synthetic dataset, CardioSynth.