Understanding the Data
Hagimo's teams have been working with healthcare data companies for over 20 years. Our
experience and relationships can be an invaluable resource for your organization when it comes
to identifying the types of data you need to enable your business processes and locating the
vendors who have it. The healthcare data we work with is generally divided into three types:
- Health Histories. Data directly associated with the diagnosis and procedure history of an individual or group of individuals: claims data, EMR / EHR data, pharmacy data, lab data, imaging data, and other clinical data. This is the richest, most tightly regulated, and most valuable class of healthcare data, and it is the focus of the de-identified claims data section below.
- Industry & Public Data. Government, registry, and institutional resources — CMS, CDC, NIH, WHO, HHS, provider directories, and registries — that bring scale and context to analytics built on patient data.
- Support & Reference Data. The coding systems and controlled vocabularies (ICD, CPT, HCPCS, NDC, SNOMED, LOINC, and more) required to normalize, link, and interpret everything else.
De-Identified Claims Data
De-identified medical and pharmacy claims data is the backbone of real-world evidence, health economics, market access, and litigation analytics. Hagimo sources data from effectively every major licensor in this market — and we know how each one prices, packages, restricts, and approves access to it. Becoming an approved data buyer is genuinely difficult: every source requires a negotiated data license agreement, a defensible use-case attestation, and ongoing compliance monitoring. Without the use-case expertise to get through that door, simply trying to license this data directly is, in practice, close to futile. That is where we come in.
Two structural categories that shape every dataset
Originated at the clearinghouse
Open claims are captured as the provider submits the claim, before the payer's final adjudication. They offer very large patient counts (300M+ lives) and near-real-time availability, but coverage is limited to the providers and pharmacies feeding a given clearinghouse, and payment is not confirmed.
Sourced from the payer
Closed claims reflect full adjudication and payment across a member's enrolled period. They are complete and payment-confirmed, but limited to enrolled lives, and they typically lag by three to six months. They are the right fit wherever confirmed payer payment matters.
Blending the full patient journey
The most complete views blend open and closed claims through privacy-preserving tokenization, with Datavant serving as the dominant linkage layer across the industry. Standardizing on a single tokenization layer lowers integration friction across sources.
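To make the idea of tokenized linkage concrete, here is a minimal sketch in Python. It is not Datavant's method or any vendor's actual algorithm; it assumes a simple keyed hash (HMAC-SHA256) over normalized identifiers, and `SECRET_KEY`, `make_token`, `blend`, and the record fields are all hypothetical names chosen for illustration.

```python
import hashlib
import hmac

# Hypothetical shared secret; real tokenization vendors manage keys differently.
SECRET_KEY = b"example-site-key"


def make_token(first: str, last: str, dob: str, sex: str) -> str:
    """Derive a stable pseudonymous token from normalized identifiers."""
    normalized = "|".join(part.strip().upper() for part in (first, last, dob, sex))
    return hmac.new(SECRET_KEY, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def blend(open_claims: list[dict], closed_claims: list[dict]) -> dict[str, list[dict]]:
    """Group claims from both sources by patient token, ordered by service date."""
    journeys: dict[str, list[dict]] = {}
    for source, claims in (("open", open_claims), ("closed", closed_claims)):
        for claim in claims:
            # Tag each claim with its origin so the blended view stays auditable.
            journeys.setdefault(claim["token"], []).append({**claim, "source": source})
    for events in journeys.values():
        events.sort(key=lambda c: c["service_date"])  # ISO dates sort lexically
    return journeys
```

Because both sources derive the token from the same normalized inputs with the same key, the same patient lands under one token without either side exchanging names or birth dates. That shared derivation is the reason a single linkage layer reduces integration work.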
Safe Harbor or Expert Determination
De-identification is performed under HIPAA Safe Harbor or Expert Determination, with patient identifiers tokenized for longitudinal linkage. The method chosen affects the linkage and re-identification-risk controls you inherit — a key diligence item.
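As a rough illustration of what Safe Harbor generalization involves, the sketch below handles three of the identifier categories the rule covers: dates are reduced to the year, ages of 90 and over are grouped into one category, and ZIP codes are cut to three digits (or set to 000 for sparsely populated prefixes). The function name, the record shape, and the `LOW_POPULATION_ZIP3` values are assumptions for illustration only. It covers a small part of the rule and is not a substitute for a compliance review.

```python
from datetime import date

# Illustrative placeholder set of 3-digit ZIP prefixes covering 20,000 or fewer
# people; a real implementation would derive this from current Census data.
LOW_POPULATION_ZIP3 = {"036", "059", "102"}


def generalize_record(birth_date: date, service_date: date, zip_code: str) -> dict:
    """Apply three Safe Harbor style generalizations to one record."""
    # Age in whole years on the service date.
    had_birthday = (service_date.month, service_date.day) >= (birth_date.month, birth_date.day)
    age = service_date.year - birth_date.year - (0 if had_birthday else 1)
    zip3 = zip_code[:3]
    return {
        "service_year": service_date.year,             # keep the year only
        "age": "90+" if age >= 90 else age,            # aggregate ages over 89
        "zip3": "000" if zip3 in LOW_POPULATION_ZIP3 else zip3,
    }
```

Expert Determination, by contrast, relies on a qualified expert's statistical assessment rather than a fixed list of transformations, which is why the two methods can leave you with different levels of detail in the delivered data.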
Major De-Identified Claims Data Licensors
Hagimo sources de-identified claims data from each of the vendors below on behalf of our clients — the larger players that license this data to commercial buyers under a data license agreement with use-case attestation. We know how each one packages, restricts, and approves access, and we secure it for you.
| Vendor | Data Type / Coverage |
|---|---|
| IQVIA | Closed and open claims, Rx (LRx ~4B scripts/yr, ~92% coverage), EMR, and remittance / 835 data. Broadest patient base, including Medicare. Regulatory-grade; the E360 platform spans 1B+ records. |
| Komodo Health | All-payer (open + closed) claims; 330M+ de-identified patient journeys. Strong for rare disease, RWE, and HEOR. MapLab / MapAI platform. Holds a CMS Innovator's License. |
| Merative (MarketScan) | Closed commercial, Medicare Supplemental, and Medicaid claims since the early 1990s; 135M+ unique individuals and 200M+ lives across years. Tokenized via Datavant. |
| HealthVerity | Marketplace aggregator. Among the most extensive open claims from the largest US clearinghouses, plus closed claims from 150+ payers. Lab (Labcorp / Quest), chargemaster, and SDOH. Inovalon preferred partner. |
| Inovalon | Largest closed-claims source in the US; all-payer dataset (Medicare Advantage, 100% Medicare FFS, commercial); 454M+ unique lives and 97B+ medical events. Also operates an EDI clearinghouse. |
| Datavant | The linkage / tokenization layer plus the Switchboard marketplace; 500+ real-world data partners and 60M+ records moving across the network. The de facto connective tissue for blended assets. |
| Clarivate (DRG) | Open and closed claims, blended and patient-mastered across sources into analytic-ready repositories. Strong QA and normalization. |
| FAIR Health | Independent nonprofit; the FH NPIC private-claims database drawn from payors nationwide, plus Medicare. Licenses de-identified aggregated datasets for commercial, policy, and academic research — among the more accessible licensors. |
| Veradigm | NLP-enriched claims plus EMR cohorts; license by therapeutic-area cohort, custom cohort definition, or full network EHR dataset. Refreshed nightly / weekly / monthly. Often cited as regulatory-grade. |
| Symphony Health (ICON) | Integrated Dataverse: medical and Rx claims plus prescriber-level data. A common IQVIA alternative. |
| Truveta | Health-system-sourced EHR plus linked claims, mortality, and SDOH; emerging for regulatory use, with site-level feasibility strengths. Consortium model. |
Clearinghouse-Native Open Claims Sources
These entities originate open claims as a byproduct of transaction processing. In practice their data most often reaches buyers through the aggregators above, but several license or contribute directly.
| Clearinghouse / Source | Notes |
|---|---|
| Optum / Change Healthcare | One of the largest open-claims contributors in the US (Change now under Optum / UnitedHealth Group). Optum also licenses its own de-identified clinical and claims assets directly. |
| Waystar | Major clearinghouse contributing open-claims volume to the market; data typically surfaced through aggregators. |
| Inovalon | Operates an EDI clearinghouse and is also the largest direct closed-claims licensor (see above). It sits in both categories. |
| Availity | Large provider-owned clearinghouse with a significant open-claims footprint. |
| Top US clearinghouses | Aggregated marketplace open-claims coverage represents three of the top four US clearinghouses, totaling 300M+ patients — the practical route to aggregated clearinghouse data. |
One relationship, the whole market. Vendors in this space do not publish rate cards — pricing, coverage, refresh cadence, and permitted use are all negotiated per engagement and driven by therapeutic scope, history depth, record volume, and whether tokenized linkage is included. We know these vendors, their priorities, and what they will and won't approve, so you don't have to learn it deal by deal.
How Hagimo turns this landscape into your advantage
Broker the deal
We leverage relationships built since 2006 to get your data ask in front of the right people at the right vendors — often where others simply can't.
Build the use case
The specificity and accuracy of your use-case attestation decide whether you are approved. We craft the winning case and go acquire the data on your behalf.
Curate the data
We match sources, manage collision rates and column densities, blend open and closed claims through tokenized linkage, and normalize it into analytic-ready form.
Run the analytics
From mapping to your existing models to delivering finished business intelligence, we can take the data all the way to insight.
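Two of the curation measures named above, column densities and collision rates, can be sketched in a few lines of Python. The definitions here are assumptions: column density is taken to be the share of rows with a populated value, and collision rate is read as the share of one source's patient tokens that also appear in another source. The function names and the `token` field are hypothetical.

```python
def column_density(rows: list[dict], column: str) -> float:
    """Fraction of rows where `column` is present and non-empty."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(column) not in (None, ""))
    return filled / len(rows)


def token_overlap(source_a: list[dict], source_b: list[dict]) -> float:
    """Share of distinct patient tokens in source_a that also appear in source_b."""
    tokens_a = {row["token"] for row in source_a}
    tokens_b = {row["token"] for row in source_b}
    if not tokens_a:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a)
```

A low density on a field your model depends on, or a low overlap between two sources you intend to blend, is worth knowing before a license is signed rather than after.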
Industry & Public Data
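In addition to health histories, many other data resources are directly applicable to building
analytics and business intelligence across medical data sets. These include government, registry,
and institutional resources such as CMS, CDC, NIH, WHO, and HHS, along with provider directories
and registries.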
Support & Reference Data
The U.S. (and global) health system is a complex set of processes and systems, with many different entities curating different portions of the whole. Turning these resources into actionable business intelligence requires external reference data that gives the analytics their context. This includes the coding systems and controlled vocabularies, such as ICD, CPT, HCPCS, NDC, SNOMED, and LOINC, used to normalize, link, and interpret everything else.
Hagimo has been working exclusively with healthcare data for over 20 years. It's a complex
landscape, but the power and insights to be gained from it are immense — and the hardest
part is rarely the analytics. It's getting approved, getting the data, and getting it into a
form you can use. We broker the deal, build the use case, curate the data, and run the analytics.
Let us put this wealth of information to work for your company.
Contact Hagimo for a consultation on the data sets that fit your model: inquiries@hagimo.com | (844) 247-6973.
