Interoperability in Healthcare: Collaboration via OMOP-CDM for Standardization

04/01/2024

João Fonseca is a skilled Data Engineer and Scientist. Holding an MSc in Bioengineering from FEUP, specializing in Biomedical Engineering, his proficiency spans the health data spectrum. He adeptly navigates the entire health data pipeline, demonstrating expertise in data collection, harmonization, engineering, and comprehensive scientific analysis. João's passion lies in leveraging his multidisciplinary background to unlock insights and innovations within the intricate realm of biomedical data.

João Fonseca, Data Engineer and Scientist, marks the first interview of the series.

What is your experience in working with standardized healthcare data models like OMOP-CDM?

My exposure to standardized health data began during my MSc thesis, which tested an AI mortality prediction model against the current standard. I uncovered a major pitfall in healthcare analytics: models are often compared without being evaluated on identical datasets. To mitigate this, I used the commonly shared MIMIC-III biosignal data. However, data scarcity and other researchers' alterations to the data remained challenges and hindered standardized analytics.

That is where OMOP-CDM comes in.

Since June 2022, I've closely engaged with OMOP-CDM and discovered its vast community involvement. Developers, clinicians, tool creators, companies, and data custodians actively contribute, fostering widespread adoption and refinement. This community-driven growth facilitates standardized analysis, larger datasets, and multi-center studies, addressing the issues from my thesis and enhancing both research quality and my expertise in healthcare data utilization.

What initially drew your interest towards working as a Data Engineer in healthcare, and why do you find this area of data engineering compelling?

Healthcare has lagged in embracing recent technological advancements, as is evident in how consultations are scheduled or how feedback on treatments is accessed. Booking an appointment has often meant enduring long queues for manual data entry.

These problems permeate patient experiences and broader healthcare trends. Implementing comprehensive data systems could revolutionize healthcare operations and analysis.

This drove my pursuit into healthcare data—a domain ripe for transformation. Enhancing both organizational and analytical data systems holds the promise of reshaping our understanding and approach to healthcare delivery and research.

Data Transformation and Standardization

How does Promptly convert diverse healthcare data formats into the standardized OMOP-CDM for analysis? What challenges does Promptly face during the data transformation process, and how are these challenges addressed?

The main purposes of using OMOP-CDM at Promptly are reusability and scalability.

The process involves profiling source datasets to understand their structure and then implementing an extract, load, and transform (ELT) process to align them with the OMOP-CDM.

Notably, this transformation involves standardizing the vocabularies used in the data, translating free-form concepts such as "Cardiac disease" into standardized coding systems such as SNOMED or ICD-10. OMOP-CDM relies on the Athena vocabularies as the source for these codes, and the community continuously updates them so that they remain the primary source of truth for medical concepts.

Harmonization is a major challenge for Promptly. It is addressed with a tool such as Usagi, which automatically suggests standard concept codes that clinical specialists then refine. Efforts are underway to develop a more efficient AI-based tool that leverages the large number of medical concepts in the source database and is retrained over time to improve vocabulary mappings and reduce manual labor.
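To make the "suggest, then review" workflow concrete, here is a minimal sketch of a suggestion step. It is a toy stand-in, not Usagi's matching algorithm and not Promptly's tool: it ranks candidates by plain string similarity from the Python standard library. The candidate list, the concept IDs (placeholders, not real Athena identifiers), the function names, and the 0.6 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical candidates; a real list would come from the Athena vocabularies.
# The IDs below are placeholders, not real OMOP concept identifiers.
CANDIDATES = {
    "cardiac disease": 900001,
    "renal disease": 900002,
    "asthma": 900003,
}


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def suggest(source_term, candidates=CANDIDATES, top_n=3, min_score=0.6):
    """Return up to top_n (name, concept_id, score) tuples, best match first.

    Candidates scoring below min_score are dropped, so an empty list means
    the term needs a fully manual mapping.
    """
    scored = [
        (name, concept_id, similarity(source_term, name))
        for name, concept_id in candidates.items()
    ]
    scored = [item for item in scored if item[2] >= min_score]
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored[:top_n]
```

A clinical specialist would still accept or reject each suggestion; the ranking only narrows down what they have to look at.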

This unified source of truth enables reusability and scalability by facilitating mapping reuse across clients and enhancing the tool's efficiency over time.
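As an illustration of mapping reuse, the sketch below keeps specialist-reviewed mappings in a shared lookup and applies it to a new client's terms. Everything named here is hypothetical (the `Mapping` class, the placeholder concept IDs, the normalization rule); the one convention taken from OMOP-CDM itself is that an unmapped source value receives concept ID 0.

```python
from dataclasses import dataclass

UNMAPPED_CONCEPT_ID = 0  # OMOP convention for "no matching concept"


@dataclass(frozen=True)
class Mapping:
    source_term: str
    concept_id: int  # placeholder IDs below, not real Athena identifiers
    vocabulary: str


def normalize(term: str) -> str:
    """Lower-case and collapse whitespace so trivial variants share a key."""
    return " ".join(term.lower().split())


def build_lookup(mappings):
    """Index reviewed mappings by their normalized source term."""
    return {normalize(m.source_term): m for m in mappings}


def map_terms(terms, lookup):
    """Return (term -> concept_id, list of terms still needing review)."""
    mapped, unmapped = {}, []
    for term in terms:
        hit = lookup.get(normalize(term))
        if hit is None:
            mapped[term] = UNMAPPED_CONCEPT_ID
            unmapped.append(term)
        else:
            mapped[term] = hit.concept_id
    return mapped, unmapped


# Mappings reviewed once can be reused for every later client.
REVIEWED = [
    Mapping("Cardiac disease", 900001, "SNOMED"),
    Mapping("Type 2 diabetes", 900002, "SNOMED"),
]
LOOKUP = build_lookup(REVIEWED)
```

Only the terms returned in the second list would go back to the suggestion and review step, which is where the savings across clients would come from.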

Data Quality Assurance

What measures does Promptly employ to ensure the quality and integrity of data when adopting the OMOP-CDM?

Promptly employs various measures to safeguard the quality and integrity of data when embracing the OMOP-CDM framework. Initially, adherence to the fundamental rules of OMOP-CDM concerning structure and vocabularies is a primary focus to ensure data quality.

Subsequently, ensuring the efficacy of Extract, Load, Transform (ELT) processes involves integrating data quality tests at every stage, commencing from the source data through to the analytical dashboards provided to clients.

These tests are implemented on a granular level, starting with column-specific evaluations, progressing to table-level assessments, and culminating in high-level unit tests. These comprehensive tests enable comparisons between the source data and the data presented in the analytical dashboards, ensuring accuracy and reliability.
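The three levels of testing described above can be sketched as small checks over rows represented as dictionaries. This is only an illustration under assumed names: the helper functions and the `patient_ref` source column are hypothetical, while `person_id`, `gender_concept_id`, and `year_of_birth` are columns of the OMOP-CDM PERSON table.

```python
def null_violations(rows, column):
    """Column-level check: indexes of rows where the column is missing or None."""
    return [i for i, row in enumerate(rows) if row.get(column) is None]


def duplicate_keys(rows, key):
    """Table-level check: key values that appear more than once."""
    seen, duplicates = set(), set()
    for row in rows:
        value = row[key]
        if value in seen:
            duplicates.add(value)
        seen.add(value)
    return sorted(duplicates)


def distinct_counts_match(source_rows, source_key, target_rows, target_key):
    """End-to-end check: same number of distinct patients in source and target."""
    source_n = len({row[source_key] for row in source_rows})
    target_n = len({row[target_key] for row in target_rows})
    return source_n == target_n
```

In a real pipeline, checks like these would typically run inside the transformation framework at each stage rather than as standalone functions.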

Analytical Capabilities

How does the OMOP-CDM standardized model enhance the efficiency or accuracy of analytical processes?

Using a standardized model not only enables standardized analytics but also makes the results of different analyses comparable, significantly enhancing the value of research and real-world evidence (RWE).

Moreover, adopting such a model makes multi-center studies possible. When multiple data partners harmonize their data to OMOP-CDM, aggregated analyses become feasible. The broader pool of data covers a larger segment of the population, so conclusions drawn from these studies tend to align much more closely with reality.

Interoperability and Collaboration

How does Promptly facilitate interoperability and collaboration among different stakeholders or organizations using the OMOP-CDM?

One of the main advantages of harmonization towards a Common Data Model is, in fact, collaboration. We can think of it as stakeholders and organizations speaking the same language rather than each speaking its own. That also facilitates the use of the same tools and analyses, and interoperability in general.

Promptly plays a crucial role in this ecosystem by establishing a real-world evidence network underpinned by the OMOP-CDM framework. Promptly provides standardized analyses tailored to the needs of each organization. Acting as an intermediary, it connects with each entity and conducts thorough analyses without directly accessing the data or taking it off the organization's premises. This is achieved through a federated model, which protects data security and integrity while supporting collaboration.
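The idea behind a federated analysis can be shown with a very small sketch: each organization computes an aggregate locally, and only that aggregate is combined centrally. This is not Promptly's federation implementation; the `SiteSummary` class, the function names, and the choice of mean age as the statistic are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteSummary:
    n_patients: int
    sum_age: float


def local_summary(ages):
    """Runs inside one organization; only this aggregate leaves the site."""
    return SiteSummary(n_patients=len(ages), sum_age=float(sum(ages)))


def pooled_mean_age(summaries):
    """Runs centrally on the aggregates; never sees patient-level rows."""
    total_n = sum(s.n_patients for s in summaries)
    if total_n == 0:
        raise ValueError("no patients across sites")
    return sum(s.sum_age for s in summaries) / total_n
```

Because every site stores its data in the same OMOP-CDM structure, the same local function can run unchanged at each organization.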

Adaptability and Scalability

How adaptable is Promptly's approach when incorporating new data sources or updates to the OMOP-CDM standards?

OMOP-CDM offers notable benefits in terms of reusability and scalability. By streamlining vocabulary mapping, employing a standardized ELT (extract, load, and transform) pipeline, and using standardized analytics, Promptly's approach adapts readily to different clients.

For each new client, the remaining task is to analyze the client-specific data model and set up the "load" step of the pipeline.

Privacy and Security Measures

What strategies does Promptly employ to ensure patient privacy and data security while working with the OMOP-CDM?

The OMOP-CDM serves as a pseudonymized data model: it does not retain identifiable patient or clinician information, which constitutes its primary privacy measure.

In the harmonization phase, the pipeline additionally applies a hashing algorithm to the patients' and clinicians' IDs from the original database. This step is meant to prevent patient data in the OMOP-CDM from being traced back to the original database, which might contain personal information for administrative needs.
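The interview does not say which hashing algorithm is used, so the following is only a sketch of one common option: a keyed hash (HMAC-SHA256) from the Python standard library. The function name and the way the secret key is handled are assumptions, not a description of Promptly's pipeline.

```python
import hashlib
import hmac


def pseudonymize_id(source_id: str, secret_key: bytes) -> str:
    """Return a stable pseudonym for a source ID as a 64-character hex string.

    The same ID and key always give the same pseudonym, so records for one
    patient still link together after harmonization. Without the key, the
    pseudonym cannot be recomputed from a guessed source ID.
    """
    return hmac.new(secret_key, source_id.encode("utf-8"), hashlib.sha256).hexdigest()
```

A keyed hash is chosen in this sketch because sequential or short IDs can be guessed and hashed by anyone if no secret is involved. Turning the hex string into the integer identifier that OMOP-CDM tables expect is left out here.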
