Are We There Yet? Preparing Health Data for the Age of AI - An Interview with Paulo Ferreira

06/05/2025

Paulo Ferreira, Medical Data Lead at Promptly Health.

Holds a Medical Degree from the Faculty of Medicine of the University of Porto (FMUP) and an MBA in Management of Healthcare Organisations and Services from Universidade Fernando Pessoa.

Currently serves as Medical Data Lead at Promptly Health and as a Resident Physician at ULS São João. Former Assistant in Health Data and Digital Health courses at FMUP, with a strong focus on clinical data quality, digital health innovation, and privacy-preserving data science in healthcare.

🔹Defining "AI-Ready" Health Data

What does it really mean for health data to be "AI-ready"?

AI-ready health data is data that is structured, standardized, complete, and contextually rich. It means the data can be used to train models without significant cleaning, reformatting, or interpretation.

This whole process is indeed one of the bottlenecks. We are swimming in data but limited in how we can use it; it just sits there, like a gold mine waiting to be mined.

Can you give an example of data that looks ready but isn’t usable for AI purposes?

Yes, one common scenario is data exported from an EHR that’s complete but entirely unstandardized. You’ll have dozens of free-text entries or locally coded fields that mean different things in different hospitals. On paper, the data is "there" — but without harmonization, it’s not analytically useful.

What are the technical and clinical criteria we look for when assessing data readiness?

We look at data quality indicators like completeness, consistency, granularity, and timeliness; no rocket science here. Clinically, we ensure that variables align with real-world care pathways and that metadata, such as timestamps and coding systems, is well documented. If a dataset lacks clinical depth or is heavily fragmented, its value for research is limited, whether that research uses traditional statistics or AI methods. We get suspicious every time we find a 157-year-old pregnant male.
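
As an illustration of the completeness and plausibility checks described in this answer, here is a minimal Python sketch. The record layout (`patient_id`, `sex`, `birth_date`, `pregnant`), the 120-year age ceiling, and the function names are assumptions made for the example, not Promptly's actual rules.

```python
from datetime import date

# Hypothetical record layout and thresholds, chosen only for illustration.
REQUIRED_FIELDS = ("patient_id", "sex", "birth_date", "pregnant")
MAX_PLAUSIBLE_AGE = 120  # assumed upper bound, in years


def age_in_years(birth_date, today):
    """Whole years elapsed between birth_date and today."""
    had_birthday = (today.month, today.day) >= (birth_date.month, birth_date.day)
    return today.year - birth_date.year - (0 if had_birthday else 1)


def check_record(record, today):
    """Return a list of human-readable data quality issues for one record."""
    issues = []
    # Completeness: every required field must be present and non-empty.
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            issues.append(f"missing {field}")
    # Plausibility: age must fall within a believable range.
    birth_date = record.get("birth_date")
    if birth_date:
        age = age_in_years(birth_date, today)
        if age < 0 or age > MAX_PLAUSIBLE_AGE:
            issues.append(f"implausible age: {age}")
    # Consistency: fields must not contradict each other.
    if record.get("sex") == "male" and record.get("pregnant") is True:
        issues.append("pregnancy recorded for male patient")
    return issues


suspicious = {
    "patient_id": "p-001",
    "sex": "male",
    "birth_date": date(1868, 1, 1),
    "pregnant": True,
}
print(check_record(suspicious, today=date(2025, 5, 6)))
```

Returning a list of issues rather than a single pass/fail flag lets a data steward see every problem with a record at once.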

🔹Current Barriers and Bottlenecks

What are the biggest technical and organizational challenges to making clinical data usable for AI?

Who knows where that lab result is? How do you query that database? What coding system was used? Did it change over time? Many small problems add up to great difficulties, so I would say that the biggest challenge is fragmentation across systems, formats, and governance models. Technically, we deal with incompatible databases and a lack of standard vocabularies. Organizationally, there is often limited data ownership and few resources dedicated to quality improvement.

How does working across multiple hospitals or health systems complicate data preparation?

Each institution has its own legacy systems and data culture. Variables may have different meanings, units, or coding conventions. It’s not just an ETL (Extract, Transform, Load) problem — it’s about aligning interpretations and clinical context across multiple actors. This is where our role at Promptly becomes essential.
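
To make the point about differing units concrete, the following Python sketch converts lab values from two hypothetical hospitals into one target unit. The conversion table, analyte names, and `harmonize` function are assumptions for illustration. The glucose factor follows from its molar mass of about 180.16 g/mol, so 1 mmol/L corresponds to 18.016 mg/dL.

```python
# Hypothetical target units and conversion factors, for illustration only.
TARGET_UNITS = {"glucose": "mg/dL", "hemoglobin": "g/dL"}
TO_TARGET = {
    ("glucose", "mg/dL"): 1.0,
    ("glucose", "mmol/L"): 18.016,  # 1 mmol/L of glucose = 18.016 mg/dL
    ("hemoglobin", "g/dL"): 1.0,
    ("hemoglobin", "g/L"): 0.1,     # 10 g/L = 1 g/dL
}


def harmonize(analyte, value, unit):
    """Convert a lab value to the target unit, or raise if the pair is unknown."""
    try:
        factor = TO_TARGET[(analyte, unit)]
    except KeyError:
        raise ValueError(f"no conversion defined for {analyte} in {unit}") from None
    return round(value * factor, 2), TARGET_UNITS[analyte]


# Two hypothetical hospitals reporting glucose in different units.
print(harmonize("glucose", 99.0, "mg/dL"))
print(harmonize("glucose", 5.5, "mmol/L"))
```

Raising an error on an unknown analyte and unit pair, instead of passing the value through, reflects the interview's point: an unrecognized convention needs a clinical decision, not a silent default.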

🔹Harmonization, Interoperability & Standards

How important is data harmonization when it comes to training reliable AI models?

It’s foundational. Without harmonization, AI models are prone to bias and overfitting. They might work in one hospital but fail completely in another. Harmonized datasets improve generalizability and allow AI systems to scale across different healthcare settings.

How does Promptly help clients overcome interoperability challenges?

At Promptly we believe that no data should be left behind: every piece of information should feed our pipelines. We implement data pipelines that map source data into international standards like OMOP and HL7 FHIR. Beyond the technical layer, we also engage clinical stakeholders to ensure semantic interoperability. We’ve developed products that allow us to extract, validate, and map concepts from our data partners' structured and unstructured data. In this way, in addition to collecting data directly reported by patients, we are able to rapidly scale the amount of data available in our “Final DataSet”.
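
As a rough idea of what mapping a source record into a standard can look like, this Python sketch turns one hypothetical lab row into a dictionary shaped like an HL7 FHIR Observation resource. The source row layout, the `LOCAL_TO_LOINC` table, and the function name are assumptions for the example; this is not Promptly's pipeline, and the output is not validated against the FHIR specification.

```python
import json

# Hypothetical local-to-LOINC map; a real pipeline would rely on a curated,
# clinically validated vocabulary rather than a hand-written dictionary.
LOCAL_TO_LOINC = {
    "HGB": ("718-7", "Hemoglobin [Mass/volume] in Blood"),
}


def to_fhir_observation(row):
    """Build a minimal FHIR-style Observation dict from one source lab row."""
    loinc_code, display = LOCAL_TO_LOINC[row["local_code"]]
    return {
        "resourceType": "Observation",
        "status": "final",
        "code": {
            "coding": [
                {"system": "http://loinc.org", "code": loinc_code, "display": display}
            ]
        },
        "subject": {"reference": f"Patient/{row['patient_id']}"},
        "effectiveDateTime": row["measured_at"],
        "valueQuantity": {
            "value": row["value"],
            "unit": row["unit"],
            "system": "http://unitsofmeasure.org",  # UCUM units
            "code": row["unit"],
        },
    }


# Hypothetical source row as it might arrive from a hospital export.
row = {
    "patient_id": "p-001",
    "local_code": "HGB",
    "value": 13.2,
    "unit": "g/dL",
    "measured_at": "2025-05-06T09:30:00Z",
}
print(json.dumps(to_fhir_observation(row), indent=2))
```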

🔹Privacy, Security & Ethical AI

What are the main privacy risks when working with AI and health data?

Relying on proprietary AI models or sharing information through conventional methods (spreadsheets on some Google Drive) calls into question basic data security principles. We need federated access to this data.

How does our Secure Data Environment and Federated Learning help mitigate those risks?

Our Secure Data Environments ensure that data never leaves the organization. We bring computation to the data, not the other way around. With federated learning, we can train models across multiple datasets without ever centralizing the data — preserving privacy while enabling innovation.
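
The idea of training without centralizing data can be sketched with federated averaging on a toy model. In this Python example, each hypothetical hospital fits `y ≈ weight * x` on its own synthetic data and shares only the resulting weight and its sample count. The data, function names, and hyperparameters are assumptions for illustration. Production systems add safeguards that are omitted here, such as secure aggregation.

```python
def local_update(weight, data, learning_rate=0.01, epochs=20):
    """Run a few gradient descent steps for y ≈ weight * x on one site's private data."""
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to the weight.
        gradient = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= learning_rate * gradient
    return weight


def federated_average(global_weight, sites, rounds=30):
    """Federated averaging: only weights and sample counts leave each site."""
    for _ in range(rounds):
        # Each site trains locally, starting from the current global weight.
        updates = [(local_update(global_weight, data), len(data)) for data in sites]
        total = sum(n for _, n in updates)
        # The coordinator combines the updates, weighted by local sample count.
        global_weight = sum(w * n for w, n in updates) / total
    return global_weight


# Synthetic data for three hypothetical hospitals, all following y = 2x.
sites = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0), (4.0, 8.0), (5.0, 10.0)],
    [(0.5, 1.0)],
]
print(round(federated_average(0.0, sites), 3))
```

The coordinator never sees an individual `(x, y)` pair; on this noiseless toy data the shared weight converges toward 2.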

🔹Real-World Applications & Learnings

Can you share a success story where well-prepared data enabled meaningful AI insights?

Yes. We have projects in which AI tools helped us enrich datasets that were subsequently used in exploratory models for predicting health outcomes, in particular mortality from cardiovascular causes.

What are the first steps healthcare organizations can take to make their data AI-ready?

Start with governance and quality. Establish data stewards, define clinical pathways of interest, opt for EHRs with structured data formats and semantic coherence across clinical areas, and invest in standardization. Even small steps, like mapping local codes to SNOMED, can make a huge difference in the long term.
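
A first step of this kind can be as small as a lookup table plus a report of what is still unmapped. In the Python sketch below, the local codes (`HTA`, `DM`, `EAM`, `AVC`) are hypothetical. The SNOMED CT concept IDs are included for illustration and should be verified against a current SNOMED CT release before any real use.

```python
from collections import Counter

# Hypothetical local diagnosis codes mapped to SNOMED CT concept IDs.
# IDs are illustrative; verify them against a current SNOMED CT release.
LOCAL_TO_SNOMED = {
    "HTA": "38341003",  # Hypertensive disorder
    "DM": "73211009",   # Diabetes mellitus
    "EAM": "22298006",  # Myocardial infarction
}


def map_codes(local_codes):
    """Return mapped SNOMED IDs plus a count of local codes that lack a mapping."""
    mapped, unmapped = [], Counter()
    for code in local_codes:
        key = code.strip().upper()  # tolerate stray whitespace and casing
        if key in LOCAL_TO_SNOMED:
            mapped.append(LOCAL_TO_SNOMED[key])
        else:
            unmapped[key] += 1
    return mapped, unmapped


mapped, unmapped = map_codes(["HTA", "dm ", "AVC", "AVC"])
coverage = len(mapped) / (len(mapped) + sum(unmapped.values()))
print(mapped, dict(unmapped), f"coverage={coverage:.0%}")
```

Counting unmapped codes by frequency gives data stewards a prioritized worklist: the most common unmapped local code is the next one worth mapping.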

How can data teams and clinical teams work together to improve outcomes?

It starts with co-design. Clinicians understand the context, while data teams bring structure and analytics. When they collaborate from day one, we get better questions, better models, and more actionable insights.

What excites you most about the future of AI in healthcare — and where does Promptly fit into that future?

What excites me is the shift from retrospective analysis to real-time, proactive care — using AI to intervene before adverse events happen. At Promptly, we’re building the trusted data infrastructure to support that shift. We’re not just preparing data — we’re enabling the future of medicine.


“If we get the data infrastructure right, AI won’t replace researchers; it’ll empower them.”


💡Useful links:

https://promptlyhealth.com/en/solutions/data-harmonization

https://promptlyhealth.com/en/evidence-science

https://promptlyhealth.com/en/contact-us