Civic Sample - Clinical Trials Demographics Dashboard

Total Studies

Report Race

Report Ethnicity

Report Both

Reporting Trends Over Time

Total Participants with Reported Race Data

Shows the total number of participants with explicitly reported race data per year (excludes "Unknown" and studies without race data).

Full Distribution with Data Quality

Proportion of total enrollment by category, distinguishing between explicitly unknown data and implicit missing (not reported) data.

Race Distribution (NIH/OMB Categories)

Race Over Time

Proportion among studies that reported race (excludes missing and unknown).

Total Participants with Reported Ethnicity Data

Shows the total number of participants with explicitly reported ethnicity data per year (excludes "Unknown" and studies without ethnicity data).

Full Distribution with Data Quality

Proportion of total enrollment by category, distinguishing between explicitly unknown data and implicit missing (not reported) data.

Ethnicity Distribution

Note: The "Unknown or Not Reported" category is large because many studies do not collect ethnicity data or participants decline to report. This reflects limitations in data collection practices across clinical trials.

Ethnicity Over Time

Proportion among studies that reported ethnicity (excludes missing and unknown).

Disclaimer: This is currently an unstructured approach to wrangling this data, presented for demonstration purposes. We are refining this approach and expect to have a preprint detailing our methodology available by June 2026.

Total Participants with Reported Sex Data

Shows the total number of participants with explicitly reported sex data per year (excludes "Unknown" and studies without sex data).

Full Distribution with Data Quality

Proportion of total enrollment by category, distinguishing between explicitly unknown data and implicit missing (not reported) data.

Sex Distribution

Sex Ratio Over Time

Proportion among studies that reported sex (excludes missing and unknown).

Total Participants with Reported Gender Data

Shows the total number of participants with explicitly reported gender identity data per year (excludes "Unknown" and studies without gender data).

Full Distribution with Data Quality

Proportion of total enrollment by category, distinguishing between explicitly unknown data and implicit missing (not reported) data.

Gender Distribution

Note: Gender identity is rarely reported separately from biological sex in clinical trials.

Gender Over Time

Proportion of reported gender identities across years (excludes missing and unknown).

Study Details

Click on any NCT ID to view the full trial on ClinicalTrials.gov. Click column headers to sort. Click checkmarks (✓) to view detailed demographic breakdowns.

Study Start Date	Study End Date	NCT ID	Details	Title	Time to Report	Race	Ethnicity	Sex	Results Posted	Last Update	Study Status	Participants	Publications	Lead Sponsor	Phase	Type	Design	Primary Endpoint

Map Layer:

Total Trials

Single-site

Multi-site

Location Not Reported

Trials by US State

Click a state to see city-level breakdown. Darker colors indicate higher values.

Regional Distribution

Trials by US Census Region

Site Distribution

Breakdown of trials by number of sites.

Geographic Reporting Over Time

Percentage of trials reporting location data by year.

FDA-Authorized AI/ML-Enabled Medical Devices

Data source: FDA Artificial Intelligence-Enabled Medical Devices

Last Updated: March 5, 2026

This list identifies AI/ML-enabled medical devices authorized for marketing in the United States. Demographic data for these devices is not reported by the FDA; the dashboard displays 100% "Not Reported" for all demographic categories.

Devices by Medical Panel

Authorizations Over Time

Date	Submission #	Device	Company	Panel	Product Code	Application

(Beta) AI Demographic Extraction

Uses Claude to extract demographic data from FDA 510(k)/De Novo/PMA summary PDFs. This pipeline identifies how often race, sex, and age are reported in AI/ML device clinical validations.

Token Usage & Scaling Projection

Pilot Run — avg input tokens/doc

Pilot Run — avg output tokens/doc

Pilot Size — documents processed

Remaining Docs — to be processed

Projected Scaling Cost — Sonnet 4.6 ($3/1M in, $15/1M out)

Total Token Estimate — input + output
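The projection in these cards is simple arithmetic on the pilot averages. A minimal sketch, assuming the per-million-token rates shown in the card label; the function name and the token and document counts in the example are hypothetical, not pilot results:

```python
def project_scaling_cost(remaining_docs, avg_input_tokens, avg_output_tokens,
                         price_in_per_million=3.0, price_out_per_million=15.0):
    """Estimate tokens and USD cost for processing the remaining documents.

    Default prices mirror the rates quoted in the card label
    ($3 per 1M input tokens, $15 per 1M output tokens).
    """
    input_tokens = remaining_docs * avg_input_tokens
    output_tokens = remaining_docs * avg_output_tokens
    cost = (input_tokens / 1_000_000) * price_in_per_million \
         + (output_tokens / 1_000_000) * price_out_per_million
    return {
        "total_tokens": input_tokens + output_tokens,
        "cost_usd": cost,
    }


# Hypothetical inputs: 1,000 documents left, 10,000 input and 500 output tokens each.
estimate = project_scaling_cost(1000, 10_000, 500)
print(estimate)
```

Because output tokens cost five times as much as input tokens at these rates, the average output length per document matters more to the total than its raw size suggests.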

Demographic Reporting Frequency

Extracted FDA Demographics (Pilot Sample)

Fields marked Not Reported highlight the gap in FDA demographic disclosure.

Device Name	Panel	Submission #	Total Participants	Male	Female	White	Black	Asian	Other	Age Range	Source

(Beta) Paper Data Extraction

Uses Claude to extract socioeconomic status (SES) indicators and detailed race breakdowns from open-access clinical trial manuscripts retrieved via Unpaywall.

Token Usage & Scaling Projection

Pilot Run — avg input tokens/doc

Pilot Run — avg output tokens/doc

Pilot Size — documents processed

Remaining Docs — to be processed

Projected Scaling Cost — Sonnet 4.6 ($3/1M in, $15/1M out)

Total Token Estimate — input + output

Extracted Literature Data (Pilot Sample)

SES indicators (income, education, insurance) are critical for understanding health equity gaps in clinical trial populations.

Study Details	Income	Education	Insurance	SES Notes	Race Breakdown	Status	Source PDF

Frequently Asked Questions

What led you to do this?

Many people claim that clinical trials are not diverse, and there are many ongoing initiatives to increase trial diversity. However, few publicly available tools exist to assess the progress of those initiatives or to give a holistic view of trial diversity. We thought this would be a great start.

Where do you get this data from?

We used the ClinicalTrials.gov API, which is a great resource and should be more widely used. Programs like the Aggregate Analysis of ClinicalTrials.gov (AACT) Database from the Clinical Trials Transformation Initiative are similarly valuable. The demographic variables we display here are more difficult to parse than some of the more standardized variables (e.g., trial phase, total participants). We designed this dashboard based on our experience parsing some of these sociodemographic characteristics, namely race and ethnicity (pre-print here), and on ongoing projects examining sex, gender, and geography.

Is this work currently funded?

No, but we are open to conversations about supporting this work. Shoot us an email at [email protected].

How are AI studies identified?

A study is flagged as AI-related if any of its text fields contain one or more keywords associated with artificial intelligence or machine learning. The fields searched are: brief title, primary endpoint, conditions, primary condition category, and secondary condition category.

The keyword list includes:

artificial intelligence, machine learning, deep learning
neural network, large language model, LLM
natural language processing, computer vision
reinforcement learning, generative AI, chatbot
predictive algorithm, clinical decision support algorithm
algorithm-based, algorithm-driven, AI-based, AI-driven, AI-powered, ML-based, ML-driven

Matching is case-insensitive and uses whole-word boundary matching to avoid false positives (e.g., "LLM" is matched only as a standalone word, not as a run of letters inside a longer word). This approach is intentionally broad to capture the evolving vocabulary around AI in clinical research, while still being precise enough to avoid spurious matches.
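A minimal sketch of this kind of matching, using Python's standard re module. The keyword list is the one above; the field names (brief_title, primary_endpoint, and so on) and the function name are assumptions made for illustration and may not match the actual pipeline:

```python
import re

AI_KEYWORDS = [
    "artificial intelligence", "machine learning", "deep learning",
    "neural network", "large language model", "LLM",
    "natural language processing", "computer vision",
    "reinforcement learning", "generative AI", "chatbot",
    "predictive algorithm", "clinical decision support algorithm",
    "algorithm-based", "algorithm-driven", "AI-based", "AI-driven",
    "AI-powered", "ML-based", "ML-driven",
]

# One alternation wrapped in \b...\b so a keyword only matches as a whole term.
AI_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in AI_KEYWORDS) + r")\b",
    re.IGNORECASE,
)

# Hypothetical field names for the five searched text fields.
SEARCHED_FIELDS = (
    "brief_title", "primary_endpoint", "conditions",
    "primary_condition_category", "secondary_condition_category",
)


def _as_text(value):
    """Normalize a field value (None, string, or list of strings) to one string."""
    if value is None:
        return ""
    if isinstance(value, (list, tuple)):
        return " ; ".join(str(v) for v in value)
    return str(value)


def is_ai_study(study):
    """Return True if any searched field contains at least one keyword."""
    return any(AI_PATTERN.search(_as_text(study.get(f))) for f in SEARCHED_FIELDS)


print(is_ai_study({"brief_title": "A deep learning model for retinopathy screening"}))
print(is_ai_study({"brief_title": "Willmott index validation"}))
```

In the second (made-up) title, the letters "llm" sit inside "Willmott", so the word-boundary anchors prevent a match on the "LLM" keyword.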

How is Funding Source derived?

Funding source is categorized based on sponsor information:

Industry: Lead Sponsor is Industry
NIH: Lead Sponsor is NIH, OR (Lead Sponsor is Other/Network AND any Collaborator is NIH)
Other U.S. Federal: Lead Sponsor is Federal, OR (Lead Sponsor is Other/Network AND any Collaborator is Federal)
Other: All other cases
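The rules above can be sketched as a small function. This is an illustration under stated assumptions: the class codes ("INDUSTRY", "NIH", "FED", "OTHER", "NETWORK") follow the sponsor class values used in ClinicalTrials.gov records, the function name is hypothetical, and checking NIH before Federal when a trial has both kinds of collaborator is assumed from the order of the list:

```python
from typing import Iterable


def funding_source(lead_class: str, collaborator_classes: Iterable[str]) -> str:
    """Categorize funding from the lead sponsor class and collaborator classes."""
    lead = lead_class.strip().upper()
    collabs = {c.strip().upper() for c in collaborator_classes}

    if lead == "INDUSTRY":
        return "Industry"

    # Collaborators only matter when the lead sponsor is Other or Network.
    lead_is_other_or_network = lead in {"OTHER", "NETWORK"}

    if lead == "NIH" or (lead_is_other_or_network and "NIH" in collabs):
        return "NIH"
    if lead == "FED" or (lead_is_other_or_network and "FED" in collabs):
        return "Other U.S. Federal"
    return "Other"


print(funding_source("OTHER", ["NIH", "INDUSTRY"]))
print(funding_source("INDUSTRY", ["NIH"]))
```

Note that an industry-led trial stays in the Industry category even when NIH is a collaborator, because the collaborator rule applies only to Other/Network lead sponsors.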

How are conditions categorized?

Conditions are categorized using a standardized medical hierarchy. We group specific conditions (e.g., "Congenital Heart Disease") into broader Primary Categories (e.g., "Cardiovascular"), with more granular Secondary Categories underneath. This reduces redundancy from synonyms (e.g., "congenital heart defect" and "congenital heart disease" map to the same secondary category) and allows for both broad and granular filtering.

The classification uses a two-step process:

Exact/Substring Match: Each condition is checked against a curated list of keywords and synonyms, matched longest-first so specific terms (e.g., "heart failure") take priority over general ones (e.g., "heart").
Fuzzy Match: If no exact match is found, lightweight fuzzy string matching (via rapidfuzz) catches typos and minor variations (e.g., "Type II Diabetes" vs "Type 2 Diabetes").
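The two steps above can be sketched as follows. To keep the example dependency-free, it swaps in difflib from the Python standard library for the fuzzy step rather than rapidfuzz, so scores and thresholds will differ from the real pipeline. The keyword map is a tiny hypothetical excerpt, and the 0.85 cutoff is an assumed value:

```python
import difflib

# Hypothetical excerpt of a keyword -> (primary, secondary) mapping.
KEYWORD_MAP = {
    "congenital heart disease": ("Cardiovascular", "Congenital Heart Disease"),
    "congenital heart defect": ("Cardiovascular", "Congenital Heart Disease"),
    "heart failure": ("Cardiovascular", "Heart Failure"),
    "heart": ("Cardiovascular", None),
    "type 2 diabetes": ("Endocrine and Metabolic", "Type 2 Diabetes"),
}

# Longest keywords first, so "heart failure" wins over "heart".
KEYWORDS_LONGEST_FIRST = sorted(KEYWORD_MAP, key=len, reverse=True)


def classify_condition(condition, cutoff=0.85):
    """Return (primary, secondary) for a free-text condition string."""
    text = condition.strip().lower()

    # Step 1: exact/substring match, most specific keyword first.
    for keyword in KEYWORDS_LONGEST_FIRST:
        if keyword in text:
            return KEYWORD_MAP[keyword]

    # Step 2: fuzzy match against the keyword list (difflib stand-in).
    close = difflib.get_close_matches(text, KEYWORDS_LONGEST_FIRST, n=1, cutoff=cutoff)
    if close:
        return KEYWORD_MAP[close[0]]

    return ("Other", None)


print(classify_condition("Chronic Heart Failure"))
print(classify_condition("Type II Diabetes"))
print(classify_condition("Ingrown toenail"))
```

Sorting by keyword length is what makes the substring step prefer specific terms; without it, any condition containing "heart" could be assigned to the general bucket before "heart failure" is ever checked.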

Primary Category	Example Secondary Categories
Cardiovascular	Heart Failure, Coronary Artery Disease, Arrhythmia, Hypertension, Congenital Heart Disease, Valvular Heart Disease, Cardiomyopathy, Peripheral Vascular Disease
Oncology	Breast Cancer, Lung Cancer, Colorectal Cancer, Prostate Cancer, Hematologic Malignancy, Brain and CNS Tumors, Skin Cancer, Sarcoma
Neurology	Alzheimer's Disease and Dementia, Parkinson's Disease, Epilepsy and Seizure Disorders, Multiple Sclerosis, Stroke and Cerebrovascular, Headache and Migraine
Respiratory	COPD, Asthma, Pulmonary Fibrosis, Pneumonia, Pulmonary Hypertension, Sleep Apnea
Mental Health	Depression, Anxiety Disorders, Bipolar Disorder, Schizophrenia and Psychotic Disorders, PTSD and Trauma, ADHD, Autism Spectrum, Eating Disorders
Endocrine and Metabolic	Type 1 Diabetes, Type 2 Diabetes, Obesity, Thyroid Disorders, Lipid Disorders
Infectious Disease	HIV/AIDS, Hepatitis, COVID-19, Tuberculosis, Influenza, Bacterial Infections, Parasitic Diseases
Autoimmune and Inflammatory	Rheumatoid Arthritis, Systemic Lupus Erythematosus, Inflammatory Bowel Disease, Psoriasis and Psoriatic Arthritis, Vasculitis
Gastrointestinal	GERD and Esophageal, Liver Disease, Irritable Bowel Syndrome, Pancreatic Disorders
Kidney and Urological	Chronic Kidney Disease, End-Stage Renal Disease, Glomerular Diseases, Kidney Transplant, Urological Disorders
Musculoskeletal	Osteoarthritis, Osteoporosis, Back and Spine, Fibromyalgia, Gout, Fractures and Trauma
Dermatology	Eczema and Dermatitis, Psoriasis, Acne and Rosacea, Wound and Ulcer, Hair and Nail Disorders
Substance Use Disorders	Alcohol Use Disorder, Opioid Use Disorder, Tobacco and Nicotine
Hematology	Anemia, Coagulation Disorders, Thrombosis
Ophthalmology	Macular Degeneration, Glaucoma, Diabetic Eye Disease
Reproductive and Sexual Health	Infertility, Pregnancy Complications, Menopause and Hormonal
Transplant and Immunology	Solid Organ Transplant, Bone Marrow Transplant, Allergy
Rare Diseases	Cystic Fibrosis, Amyloidosis, Lysosomal Storage Disorders
Pain	Chronic Pain, Acute Pain, Cancer Pain
Other	Any condition not matching the above categories

Note: A study may have conditions spanning multiple categories. The dashboard filters show studies that match ANY of the keywords for the selected primary and/or secondary category. You can filter by primary category alone for broad analysis, or drill down to a specific secondary category for more targeted results.

How is the "Unknown/Not Reported" category calculated?

In the "Distribution Including Unknowns" charts, the Unknown category is calculated as:

Unknown = Total Enrollment - Sum(All Known Categories)

This ensures the chart always sums to exactly 100% of total enrollment, providing a complete picture of data completeness.
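As a worked example of the formula, here is a minimal sketch that derives the Unknown count and the resulting proportions. The function name and the enrollment numbers are hypothetical, and raising an error when the known categories exceed total enrollment is an assumption about how inconsistent records might be handled:

```python
def distribution_with_unknown(total_enrollment, known_counts):
    """Return each category's share of total enrollment, including Unknown.

    Unknown = Total Enrollment - Sum(All Known Categories)
    """
    known_sum = sum(known_counts.values())
    if known_sum > total_enrollment:
        # Assumed handling: flag records where categories exceed enrollment.
        raise ValueError("known categories exceed total enrollment")
    if total_enrollment == 0:
        return {}

    counts = dict(known_counts)
    counts["Unknown"] = total_enrollment - known_sum
    return {category: n / total_enrollment for category, n in counts.items()}


# Hypothetical trial: 200 enrolled, 160 with a reported race category.
shares = distribution_with_unknown(200, {"White": 100, "Black": 40, "Asian": 20})
print(shares)
```

Because Unknown is defined as the remainder, the shares add up to 100% of total enrollment by construction.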

What about searching for specific trials and summarizing the information in other ways?

This project currently focuses on the demographics of clinical trial participants. Other tools do a great job of searching unstructured data from ClinicalTrials.gov, including a connector for Claude Code built by deepsense.ai. More information on that connector is here.

How do you count trial sponsors?

We take a broad approach to capturing trial involvement. In our "Sponsor" filter, we count an organization if it is listed as either the Lead Sponsor or a Collaborator in the trial record. This allows us to capture the full ecosystem of organizations supporting a trial, rather than just the primary administrative entity.
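A minimal sketch of that filter logic. The record layout (lead_sponsor and collaborators keys), the function names, and the case-insensitive comparison are assumptions for illustration; the trial IDs and organization names are placeholders:

```python
def involved_organizations(trial):
    """Collect the lead sponsor and all collaborators as normalized names."""
    orgs = set()
    lead = trial.get("lead_sponsor")
    if lead:
        orgs.add(lead.strip().casefold())
    for name in trial.get("collaborators") or []:
        orgs.add(name.strip().casefold())
    return orgs


def filter_by_sponsor(trials, organization):
    """Keep trials where the organization is Lead Sponsor OR a Collaborator."""
    target = organization.strip().casefold()
    return [t for t in trials if target in involved_organizations(t)]


# Placeholder records, not real trials.
trials = [
    {"id": "trial-a", "lead_sponsor": "Example University", "collaborators": ["Example Pharma"]},
    {"id": "trial-b", "lead_sponsor": "Example Pharma", "collaborators": []},
    {"id": "trial-c", "lead_sponsor": "Another Institute", "collaborators": None},
]
print([t["id"] for t in filter_by_sponsor(trials, "Example Pharma")])
```

Under this approach a single trial can appear under several organizations, so per-sponsor counts will not sum to the total number of trials.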

Civic Sample is a tool created to investigate the demographic characteristics of participants in clinical trials. It is centered on the ideal that diversity and representation in trials are important for preventing sample bias and improving study generalizability. This mission starts with understanding who is involved in studies, so we can move toward strategies for improving representation.

Who built this

Maryam Aziz

Maryam Aziz is a Ph.D. candidate in Population Health Sciences at Duke University School of Medicine and holds an M.S. in Computer Science from Columbia University. Her research focuses on human-centered development and evaluation of AI in healthcare, particularly for women's health, with an emphasis on equity, transparency, and clinical impact.

Michael D. Green, Ph.D.

Michael is a Postdoctoral Researcher in the Department of Health, Behavior, and Society at the Johns Hopkins School of Public Health. He earned his Ph.D. in Population Health Sciences at the Duke University School of Medicine and a BA in Anthropology with honors from Dartmouth College. Michael's research focuses on unequal treatment in healthcare, specifically discrimination faced in healthcare settings.

Both hope to advance this work in two ways: first, by establishing a clear platform for accountability and transparency around the state of diversity in clinical trials; and second, by assisting trial sponsors, investigators, and companies with approaches to diversifying their trial populations so that trials are representative.
