Total Studies
Report Race
Report Ethnicity
Report Both
Reporting Trends Over Time
Total Participants with Reported Race Data
Shows the total number of participants with explicitly reported race data per year (excludes "Unknown" and studies without race data).
Full Distribution with Data Quality
Proportion of total enrollment by category, distinguishing between explicitly unknown data and implicit missing (not reported) data.
Race Distribution (NIH/OMB Categories)
Race Over Time
Proportion among studies that reported race (excludes missing and unknown).
Total Participants with Reported Ethnicity Data
Shows the total number of participants with explicitly reported ethnicity data per year (excludes "Unknown" and studies without ethnicity data).
Full Distribution with Data Quality
Proportion of total enrollment by category, distinguishing between explicitly unknown data and implicit missing (not reported) data.
Ethnicity Distribution
Note: The "Unknown or Not Reported" category is large because many studies do not collect ethnicity data or participants decline to report. This reflects limitations in data collection practices across clinical trials.
Ethnicity Over Time
Proportion among studies that reported ethnicity (excludes missing and unknown).
Disclaimer: This is currently an unstructured approach to wrangling this data, presented for demonstration purposes. We are refining this approach and expect to have a preprint detailing our methodology available by June 2026.
Total Participants with Reported Sex Data
Shows the total number of participants with explicitly reported sex data per year (excludes "Unknown" and studies without sex data).
Full Distribution with Data Quality
Proportion of total enrollment by category, distinguishing between explicitly unknown data and implicit missing (not reported) data.
Sex Distribution
Sex Ratio Over Time
Proportion among studies that reported sex (excludes missing and unknown).
Disclaimer: This is currently an unstructured approach to wrangling this data, presented for demonstration purposes. We are refining this approach and expect to have a preprint detailing our methodology available by June 2026.
Total Participants with Reported Gender Data
Shows the total number of participants with explicitly reported gender identity data per year (excludes "Unknown" and studies without gender data).
Full Distribution with Data Quality
Proportion of total enrollment by category, distinguishing between explicitly unknown data and implicit missing (not reported) data.
Gender Distribution
Note: Gender identity is rarely reported separately from biological sex in clinical trials.
Gender Over Time
Proportion of reported gender identities across years (excludes missing and unknown).
Study Details
Click on any NCT ID to view the full trial on ClinicalTrials.gov. Click column headers to sort. Click checkmarks (✓) to view detailed demographic breakdowns.
| Study Start Date | Study End Date | NCT ID | Details | Title | Time to Report | Race | Ethnicity | Sex | Results Posted | Last Update | Study Status | Participants | Publications | Lead Sponsor | Phase | Type | Design | Primary Endpoint |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Trials by US State
Click a state to see city-level breakdown. Darker colors indicate higher values.
Regional Distribution
Trials by US Census Region
Site Distribution
Breakdown of trials by number of sites.
Geographic Reporting Over Time
Percentage of trials reporting location data by year.
FDA-Authorized AI/ML-Enabled Medical Devices
Data source: FDA Artificial Intelligence-Enabled Medical Devices
Last Updated: March 5, 2026
This list identifies AI/ML-enabled medical devices authorized for marketing in the United States. Demographic data for these devices is not reported by the FDA; the dashboard displays 100% "Not Reported" for all demographic categories.
Devices by Medical Panel
Authorizations Over Time
| Date | Submission # | Device | Company | Panel | Product Code | Application |
|---|---|---|---|---|---|---|
(Beta) AI Demographic Extraction
Uses Claude to extract demographic data from FDA 510(k)/De Novo/PMA summary PDFs. This pipeline identifies how often race, sex, and age are reported in AI/ML device clinical validations.
Demographic Reporting Frequency
Extracted FDA Demographics (Pilot Sample)
Fields marked Not Reported highlight the gap in FDA demographic disclosure.
| Device Name | Panel | Submission # | Total Participants | Male | Female | White | Black | Asian | Other | Age Range | Source |
|---|---|---|---|---|---|---|---|---|---|---|---|
(Beta) Paper Data Extraction
Uses Claude to extract socioeconomic status (SES) indicators and detailed race breakdowns from open-access clinical trial manuscripts via Unpaywall.
Extracted Literature Data (Pilot Sample)
SES indicators (income, education, insurance) are critical for understanding health equity gaps in clinical trial populations.
| Study Details | Income | Education | Insurance | SES Notes | Race Breakdown | Status | Source PDF |
|---|---|---|---|---|---|---|---|
Frequently Asked Questions
What led you to do this?
Many people claim that trials are not diverse, and there are many ongoing initiatives to increase diversity in clinical trials. However, there are few publicly available tools to assess the progress of those initiatives or to get a holistic view of trial diversity. We thought this would be a great start.
Where do you get this data from?
We used the ClinicalTrials.gov API. It is a great resource and should be more widely used. Related programs include the Aggregate Analysis of ClinicalTrials.gov (AACT) Database from the Clinical Trials Transformation Initiative. The demographic variables we display here are more difficult to parse than some of the more standardized variables (e.g., trial phase, total participants). We designed this dashboard based on our experience parsing some of these sociodemographic characteristics, namely race and ethnicity (pre-print here), and on ongoing projects examining sex, gender, and geography.
Is this work currently funded?
No, but we are open to conversations about supporting this work. Shoot us an email at [email protected]
How are AI studies identified?
A study is flagged as AI-related if any of its text fields contain one or more keywords associated with artificial intelligence or machine learning. The fields searched are: brief title, primary endpoint, conditions, primary condition category, and secondary condition category.
The keyword list includes:
- artificial intelligence, machine learning, deep learning
- neural network, large language model, LLM
- natural language processing, computer vision
- reinforcement learning, generative AI, chatbot
- predictive algorithm, clinical decision support algorithm
- algorithm-based, algorithm-driven, AI-based, AI-driven, AI-powered, ML-based, ML-driven
Matching is case-insensitive and uses whole-word boundary matching to avoid false positives (e.g., "LLM" is matched as a standalone term, but not when the same letters appear inside a longer word). The keyword list is intentionally broad to capture the evolving vocabulary around AI in clinical research, while the whole-word requirement keeps it precise enough to avoid spurious matches.
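As a rough illustration of the matching rule, here is a minimal Python sketch. The keyword list is copied from this FAQ; the function name `is_ai_study` and the idea of passing the searched fields as a list of strings are assumptions for the example, not the dashboard's actual code.

```python
import re

# Keyword list copied from the FAQ above.
AI_KEYWORDS = [
    "artificial intelligence", "machine learning", "deep learning",
    "neural network", "large language model", "LLM",
    "natural language processing", "computer vision",
    "reinforcement learning", "generative AI", "chatbot",
    "predictive algorithm", "clinical decision support algorithm",
    "algorithm-based", "algorithm-driven", "AI-based", "AI-driven",
    "AI-powered", "ML-based", "ML-driven",
]

# One case-insensitive alternation wrapped in \b word boundaries.
# Longer keywords are listed first so they are tried before shorter ones.
_ALTERNATION = "|".join(
    re.escape(k) for k in sorted(AI_KEYWORDS, key=len, reverse=True)
)
_PATTERN = re.compile(r"\b(?:" + _ALTERNATION + r")\b", re.IGNORECASE)


def is_ai_study(fields):
    """Return True if any non-empty text field contains a keyword as a whole word.

    `fields` is a hypothetical list holding the searched text fields
    (brief title, primary endpoint, conditions, and so on).
    """
    return any(_PATTERN.search(text) is not None for text in fields if text)
```

Under strict whole-word matching, a plural such as "chatbots" would not match "chatbot" unless the plural is added to the list; whether the dashboard handles plurals is not stated here.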
How is Funding Source derived?
Funding source is categorized based on sponsor information:
- Industry: Lead Sponsor is Industry
- NIH: Lead Sponsor is NIH, OR (Lead Sponsor is Other/Network AND any Collaborator is NIH)
- Other U.S. Federal: Lead Sponsor is Federal, OR (Lead Sponsor is Other/Network AND any Collaborator is Federal)
- Other: All other cases
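The rules above can be sketched as a small Python function. This is only an illustration: the class labels ("Industry", "NIH", "Federal", "Other", "Network") follow the wording of this FAQ rather than the raw API values, and the function name and the top-to-bottom rule order (NIH checked before Other U.S. Federal) are assumptions.

```python
def funding_source(lead_class, collaborator_classes=()):
    """Map sponsor classes to a funding-source category.

    `lead_class` and `collaborator_classes` use the labels from the FAQ;
    the real data may encode them differently (assumption).
    """
    collabs = set(collaborator_classes)
    lead_is_other_or_network = lead_class in {"Other", "Network"}

    if lead_class == "Industry":
        return "Industry"
    if lead_class == "NIH" or (lead_is_other_or_network and "NIH" in collabs):
        return "NIH"
    if lead_class == "Federal" or (
        lead_is_other_or_network and "Federal" in collabs
    ):
        return "Other U.S. Federal"
    return "Other"
```

For example, a trial led by a university (class "Other") with an NIH collaborator would be categorized as NIH, while an industry-led trial with an NIH collaborator stays Industry.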
How are conditions categorized?
Conditions are categorized using a standardized medical hierarchy. We group specific conditions (e.g., "Congenital Heart Disease") into broader Primary Categories (e.g., "Cardiovascular"), with more granular Secondary Categories underneath. This reduces redundancy from synonyms (e.g., "congenital heart defect" and "congenital heart disease" map to the same secondary category) and allows for both broad and granular filtering.
The classification uses a two-step process:
- Exact/Substring Match: Each condition is checked against a curated list of keywords and synonyms, matched longest-first so specific terms (e.g., "heart failure") take priority over general ones (e.g., "heart").
- Fuzzy Match: If no exact match is found, lightweight fuzzy string matching (via rapidfuzz) catches typos and minor variations (e.g., "Type II Diabetes" vs "Type 2 Diabetes").
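The two steps can be sketched roughly as follows. To keep the example dependency-free, it swaps in the standard library's `difflib` for rapidfuzz, so the similarity scores will differ from the real pipeline. The keyword map, the `cutoff` value, and the function name are all hypothetical.

```python
import difflib

# Hypothetical, tiny keyword map: keyword -> (primary, secondary).
# The real curated list is much larger.
KEYWORDS = {
    "heart failure": ("Cardiovascular", "Heart Failure"),
    "congenital heart disease": ("Cardiovascular", "Congenital Heart Disease"),
    "congenital heart defect": ("Cardiovascular", "Congenital Heart Disease"),
    "heart": ("Cardiovascular", None),  # general term, no secondary category
    "type 2 diabetes": ("Endocrine and Metabolic", "Type 2 Diabetes"),
    "asthma": ("Respiratory", "Asthma"),
}

# Longest keywords first so specific terms win over general ones.
_LONGEST_FIRST = sorted(KEYWORDS, key=len, reverse=True)


def categorize(condition, cutoff=0.85):
    """Return (primary, secondary) for a free-text condition name."""
    text = condition.lower().strip()

    # Step 1: exact/substring match, longest keyword first.
    for keyword in _LONGEST_FIRST:
        if keyword in text:
            return KEYWORDS[keyword]

    # Step 2: fuzzy match against the keyword list (difflib stand-in).
    close = difflib.get_close_matches(text, _LONGEST_FIRST, n=1, cutoff=cutoff)
    if close:
        return KEYWORDS[close[0]]

    return ("Other", None)
```

With this toy map, "Acute Heart Failure" resolves to Heart Failure rather than the general "heart" entry, and "Type II Diabetes" falls through to the fuzzy step and lands on Type 2 Diabetes.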
| Primary Category | Example Secondary Categories |
|---|---|
| Cardiovascular | Heart Failure, Coronary Artery Disease, Arrhythmia, Hypertension, Congenital Heart Disease, Valvular Heart Disease, Cardiomyopathy, Peripheral Vascular Disease |
| Oncology | Breast Cancer, Lung Cancer, Colorectal Cancer, Prostate Cancer, Hematologic Malignancy, Brain and CNS Tumors, Skin Cancer, Sarcoma |
| Neurology | Alzheimer's Disease and Dementia, Parkinson's Disease, Epilepsy and Seizure Disorders, Multiple Sclerosis, Stroke and Cerebrovascular, Headache and Migraine |
| Respiratory | COPD, Asthma, Pulmonary Fibrosis, Pneumonia, Pulmonary Hypertension, Sleep Apnea |
| Mental Health | Depression, Anxiety Disorders, Bipolar Disorder, Schizophrenia and Psychotic Disorders, PTSD and Trauma, ADHD, Autism Spectrum, Eating Disorders |
| Endocrine and Metabolic | Type 1 Diabetes, Type 2 Diabetes, Obesity, Thyroid Disorders, Lipid Disorders |
| Infectious Disease | HIV/AIDS, Hepatitis, COVID-19, Tuberculosis, Influenza, Bacterial Infections, Parasitic Diseases |
| Autoimmune and Inflammatory | Rheumatoid Arthritis, Systemic Lupus Erythematosus, Inflammatory Bowel Disease, Psoriasis and Psoriatic Arthritis, Vasculitis |
| Gastrointestinal | GERD and Esophageal, Liver Disease, Irritable Bowel Syndrome, Pancreatic Disorders |
| Kidney and Urological | Chronic Kidney Disease, End-Stage Renal Disease, Glomerular Diseases, Kidney Transplant, Urological Disorders |
| Musculoskeletal | Osteoarthritis, Osteoporosis, Back and Spine, Fibromyalgia, Gout, Fractures and Trauma |
| Dermatology | Eczema and Dermatitis, Psoriasis, Acne and Rosacea, Wound and Ulcer, Hair and Nail Disorders |
| Substance Use Disorders | Alcohol Use Disorder, Opioid Use Disorder, Tobacco and Nicotine |
| Hematology | Anemia, Coagulation Disorders, Thrombosis |
| Ophthalmology | Macular Degeneration, Glaucoma, Diabetic Eye Disease |
| Reproductive and Sexual Health | Infertility, Pregnancy Complications, Menopause and Hormonal |
| Transplant and Immunology | Solid Organ Transplant, Bone Marrow Transplant, Allergy |
| Rare Diseases | Cystic Fibrosis, Amyloidosis, Lysosomal Storage Disorders |
| Pain | Chronic Pain, Acute Pain, Cancer Pain |
| Other | Any condition not matching the above categories |
Note: A study may have conditions spanning multiple categories. The dashboard filters show studies that match ANY of the keywords for the selected primary and/or secondary category. You can filter by primary category alone for broad analysis, or drill down to a specific secondary category for more targeted results.
How is the "Unknown/Not Reported" category calculated?
In the "Distribution Including Unknowns" charts, the Unknown category is calculated as:
Unknown = Total Enrollment - Sum(All Known Categories)
This ensures the chart always sums to exactly 100% of total enrollment, providing a complete picture of data completeness.
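A small worked example of this calculation in Python is shown below. The function name and the error handling are assumptions; in particular, the FAQ does not say what happens when the known categories add up to more than total enrollment, so this sketch simply raises an error in that case.

```python
def distribution_with_unknown(total_enrollment, known_counts):
    """Return percentages of total enrollment, including the Unknown remainder.

    `known_counts` maps category name -> participant count.
    """
    known_total = sum(known_counts.values())
    if total_enrollment <= 0 or known_total > total_enrollment:
        # Not covered by the FAQ; treated as invalid input here (assumption).
        raise ValueError("known counts must not exceed a positive total enrollment")

    # Unknown = Total Enrollment - Sum(All Known Categories)
    counts = dict(known_counts)
    counts["Unknown/Not Reported"] = total_enrollment - known_total

    return {name: 100 * n / total_enrollment for name, n in counts.items()}


shares = distribution_with_unknown(
    200, {"White": 100, "Black or African American": 40, "Asian": 20}
)
# 160 of 200 participants are in known categories, so the remaining 40
# participants (20%) are assigned to "Unknown/Not Reported".
```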
What about searching for specific trials and summarizing the information in other ways?
This project is currently focused on the demographics of clinical trial participants. Other tools do a great job of searching unstructured data from clinicaltrials.gov. For example, there is a great connector for Claude Code built by the company deepsense.ai. More information on that connector is here.
How do you count trial sponsors?
We take a broad approach to capturing trial involvement. In our "Sponsor" filter, we count an organization if they are listed as either the Lead Sponsor or a Collaborator in the trial record. This allows us to capture the full ecosystem of organizations supporting a trial, rather than just the primary administrative entity.
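In code, the filter logic amounts to checking both roles. The sketch below assumes a trial record is a dictionary with hypothetical `lead_sponsor` and `collaborators` keys and that names are compared case-insensitively; the dashboard's actual data model may differ.

```python
def matches_sponsor(trial, organization):
    """Return True if `organization` is the Lead Sponsor or any Collaborator.

    `trial` is a hypothetical record such as:
    {"lead_sponsor": "Duke University", "collaborators": ["NIH"]}
    """
    target = organization.casefold()
    names = [trial.get("lead_sponsor")] + list(trial.get("collaborators") or [])
    return any(bool(name) and name.casefold() == target for name in names)
```

Counting collaborators this way means a single trial can appear under several organizations in the "Sponsor" filter.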
Civic Sample is a tool created to investigate the demographic characteristics of participants in clinical trials. It is centered on the ideal that diversity and representation in trials are important for preventing sample bias and improving study generalizability. This mission starts with understanding who is involved in studies, so we can move toward developing strategies to improve representation.
Who built this
Maryam Aziz
Maryam Aziz is a Ph.D. candidate in Population Health Sciences at Duke University School of Medicine and holds an M.S. in Computer Science from Columbia University. Her research focuses on human-centered development and evaluation of AI in healthcare, particularly for women's health, with an emphasis on equity, transparency, and clinical impact.
Michael D. Green, Ph.D.
Michael is a Postdoctoral Researcher in the Department of Health, Behavior, and Society at the Johns Hopkins School of Public Health. He earned his Ph.D. in Population Health Sciences at the Duke University School of Medicine and a B.A. in Anthropology with honors from Dartmouth College. Michael's research focuses on unequal treatment in healthcare, specifically discrimination faced in healthcare settings.
Both hope to advance this work in two ways: first, by establishing a clear platform for accountability and transparency around the state of diversity in clinical trials, and second, by assisting trial sponsors, investigators, and companies with approaches to diversifying their trial populations so that trials are more representative.