
FDA AI/ML Compliance Guide for Medical Device Manufacturers

Navigate FDA regulatory pathways, marketing submission requirements, and compliance strategies for AI/ML medical devices.

By Hashi S.

Medical device manufacturers incorporating artificial intelligence and machine learning face a rapidly evolving regulatory landscape. Beyond FDA-specific requirements, manufacturers must also navigate broader federal AI compliance requirements that apply across government agencies and regulated industries.

Between 2019 and 2025, the FDA published multiple guidance documents that establish frameworks for AI/ML-enabled Software as a Medical Device and continue to refine expectations for marketing submissions, lifecycle management, and post-market surveillance.

Understanding these requirements determines whether your innovation reaches patients quickly or languishes in regulatory review. The challenge is not simply meeting current FDA requirements; it's building AI/ML development processes that remain compliant as algorithms evolve, data distributions shift, and clinical practices change.

For manufacturers serving international markets, understanding the EU AI Act requirements for medical devices is equally critical. With Predetermined Change Control Plans, Good Machine Learning Practice principles, and new lifecycle management guidance, manufacturers need clear strategies.

Key Takeaways

  • Three regulatory pathways: 510(k) for substantial equivalence, De Novo for novel low-to-moderate-risk devices, PMA for high-risk innovations
  • Good Machine Learning Practice (GMLP) provides 10 guiding principles for AI/ML development across the entire lifecycle
  • Predetermined Change Control Plans (PCCPs) enable continuous learning without new submissions for planned modifications
  • Marketing submissions require comprehensive documentation of algorithms, training data, validation studies, and clinical performance
  • Post-market surveillance demands ongoing performance monitoring, bias detection, and real-world evidence collection

What Are the FDA Regulatory Pathways for AI/ML Medical Devices?

The FDA reviews AI/ML-enabled medical devices through the same premarket pathways used for traditional devices: 510(k) Premarket Notification, De Novo Classification, and Premarket Approval (PMA).

The pathway you use depends on your device's risk classification and whether an appropriate predicate device exists. Understanding pathway selection is critical because it determines the evidence requirements, review timeline, and cost of market entry.

**The 510(k) pathway** represents the most common route for AI/ML medical devices. This pathway requires demonstrating that your device is substantially equivalent to a legally marketed predicate device.

For AI/ML devices, substantial equivalence means your device has the same intended use as the predicate and comparable technological characteristics, including the AI/ML approach. The challenge lies in finding appropriate predicates, as the AI/ML medical device landscape continues to evolve rapidly.

The FDA maintains an AI-Enabled Medical Device List that catalogs cleared and approved AI/ML devices. This provides a starting point for predicate research.

**The De Novo pathway** serves novel devices (those without a predicate) that present low-to-moderate risk. This pathway allows you to establish a new device classification and can position your device as a predicate for future 510(k) submissions by other manufacturers.

De Novo submissions require more comprehensive evidence than 510(k)s but less than PMAs.

Many innovative AI/ML applications use De Novo classification to establish new device categories. This is particularly true for novel clinical applications or unique AI/ML approaches.

**Premarket Approval** applies to high-risk Class III devices where general controls and special controls are insufficient to assure safety and effectiveness. PMA requires the most extensive clinical evidence, including well-controlled clinical investigations.

While less common for AI/ML Software as a Medical Device, PMA may be necessary for AI/ML systems that make autonomous high-risk clinical decisions or are incorporated into Class III hardware devices.

What Is Good Machine Learning Practice and Why Does It Matter?

Good Machine Learning Practice represents a set of guiding principles developed by international regulators. These principles promote safe, effective, and high-quality AI/ML medical devices.

In January 2025, the International Medical Device Regulators Forum published ten GMLP principles, building on the guiding principles jointly issued in 2021 by the FDA, Health Canada, and the UK's Medicines and Healthcare products Regulatory Agency.

These principles inform FDA expectations for AI/ML development processes and marketing submissions.

**The first principle emphasizes multi-disciplinary expertise.** AI/ML medical device development requires collaboration between data scientists, clinical experts, regulatory specialists, and quality professionals.

Manufacturers that treat AI/ML development as purely a software engineering problem miss critical clinical and regulatory considerations. Effective teams include clinicians who understand the intended use context, data scientists with expertise in appropriate AI/ML techniques, regulatory professionals who understand FDA expectations, and quality specialists who ensure processes meet medical device standards.

**Data quality forms the foundation of GMLP.** Training, validation, and test datasets must be relevant to the intended use, representative of the target population, appropriately labeled or annotated, and of sufficient quality and quantity.

Poor data quality represents the most common source of AI/ML device failures. Manufacturers must document data sources, collection methods, demographic composition, labeling processes, and quality control measures.

The FDA expects manufacturers to demonstrate that training data reflects the diversity of patients who will use the device. This includes relevant demographic factors and clinical variations.

**Model development and validation require rigorous processes.** Appropriate model architecture must match the clinical task and available data. Training processes must include proper validation techniques to prevent overfitting.

Comprehensive testing must evaluate performance across relevant populations, use cases, and edge cases. The FDA expects manufacturers to document model design decisions, training approaches, hyperparameter selection, and validation results.

Performance metrics should include not just overall accuracy but also sensitivity, specificity, positive predictive value, and negative predictive value. Subgroup analysis for relevant demographic categories is essential.
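As a concrete illustration, here is a minimal sketch that computes those four metrics for each demographic subgroup of a labeled evaluation set. The DataFrame column names (y_true, y_pred, and the grouping column) are hypothetical, not a prescribed format.

```python
# Minimal sketch: per-subgroup sensitivity, specificity, PPV, and NPV.
# Column names ("y_true", "y_pred", and the group column) are hypothetical.
import pandas as pd

def subgroup_metrics(df: pd.DataFrame, group_col: str) -> pd.DataFrame:
    rows = []
    for group, g in df.groupby(group_col):
        tp = int(((g["y_true"] == 1) & (g["y_pred"] == 1)).sum())
        tn = int(((g["y_true"] == 0) & (g["y_pred"] == 0)).sum())
        fp = int(((g["y_true"] == 0) & (g["y_pred"] == 1)).sum())
        fn = int(((g["y_true"] == 1) & (g["y_pred"] == 0)).sum())
        rows.append({
            group_col: group,
            "n": len(g),
            "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
            "specificity": tn / (tn + fp) if tn + fp else float("nan"),
            "ppv": tp / (tp + fp) if tp + fp else float("nan"),
            "npv": tn / (tn + fn) if tn + fn else float("nan"),
        })
    return pd.DataFrame(rows)

# Usage: subgroup_metrics(eval_df, "sex"), where eval_df is assumed to exist.
```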

**Transparency and interpretability requirements** depend on the device's role in clinical decision-making. Devices that provide recommendations requiring clinician interpretation may need less explainability than devices making autonomous decisions.

However, all AI/ML medical devices require sufficient transparency for clinicians to understand when to trust device outputs and when to apply clinical judgment. Manufacturers should document the level of interpretability provided, the techniques used to achieve it, and how users will understand device limitations.

Implementing Good Machine Learning Practice requires expertise in both AI/ML development and FDA regulatory requirements. DigiForm helps medical device manufacturers build GMLP-compliant development processes from initial data collection through validation and documentation.

How Do Predetermined Change Control Plans Work?

Predetermined Change Control Plans represent one of the FDA's most significant innovations for AI/ML medical devices. PCCPs allow manufacturers to make planned modifications without submitting new 510(k)s or PMAs.

This enables continuous learning and improvement while maintaining regulatory oversight. Understanding when and how to use PCCPs can dramatically reduce regulatory burden for adaptive AI/ML systems.

**A PCCP describes specific modifications** you plan to make to your device, how those modifications will be implemented, how you'll assess their impact, and what verification and validation you'll perform before deployment.

The PCCP is submitted with your initial marketing application—510(k), De Novo, or PMA—and reviewed as part of that submission. Once FDA clears or approves your device including the PCCP, you can implement the planned modifications according to the approved plan.

**PCCPs work best for modifications that improve device performance** within the approved intended use. Examples include algorithm updates that improve accuracy, training data refreshes that maintain performance as clinical practices evolve, and performance improvements for underrepresented populations.

They also work well for adaptations to new input data formats or sources. PCCPs are less appropriate for changes that alter intended use, add new indications, change fundamental algorithm approaches, or significantly modify the clinical decision-making role.

**A well-designed PCCP includes four key components.** The modification protocol describes what changes will be made, including specific parameters that may be adjusted, data sources that may be updated, and performance improvements targeted.

The impact assessment explains how modifications affect safety and effectiveness, including worst-case scenarios and mitigation strategies. The verification and validation protocol specifies testing that will be performed before deploying modifications.

The implementation plan describes how and when modifications will be deployed, including user notification and training.
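To make these four components concrete, here is a minimal sketch of how a manufacturer might track them internally as a structured record. The field names and example values are our own illustration; the FDA does not prescribe a machine-readable PCCP format.

```python
# Illustrative internal record of the four PCCP components; the field names
# are our own convention, not an FDA-prescribed format.
from dataclasses import dataclass, field

@dataclass
class PCCPRecord:
    modification_protocol: str  # what may change: parameters, data sources, targets
    impact_assessment: str      # safety/effectiveness impact, worst cases, mitigations
    vv_protocol: str            # verification and validation gating each deployment
    implementation_plan: str    # rollout, user notification, and training
    modification_log: list = field(default_factory=list)  # evidence for annual reports

pccp = PCCPRecord(
    modification_protocol="Quarterly retraining on refreshed data; threshold tunable 0.40-0.60",
    impact_assessment="Sensitivity may not fall below 0.90 in any prespecified subgroup",
    vv_protocol="Re-run locked test set and subgroup analysis before each release",
    implementation_plan="Staged rollout with release notes delivered to clinical users",
)
```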

**Manufacturers implementing modifications under an approved PCCP** must maintain detailed records and report annually to FDA. Annual reports should summarize modifications made, verification and validation results, any issues encountered, and current device performance.

This reporting allows FDA to monitor whether PCCPs are working as intended and whether additional oversight is needed. Manufacturers should treat PCCP implementation as a quality system activity, with appropriate documentation, review, and approval processes.

What Documentation Does FDA Expect in Marketing Submissions?

FDA marketing submissions for AI/ML medical devices require comprehensive documentation addressing device description, algorithm design, data characteristics, performance testing, risk management, and labeling.

The January 2025 draft guidance on AI-Enabled Device Software Functions provides detailed recommendations for submission content. It builds on earlier guidance documents and GMLP principles.

**Device description should clearly articulate** intended use, target users, use environment, and the AI/ML system's role in clinical decision-making. The FDA distinguishes between devices that provide information to support clinical decisions and devices that make autonomous decisions.

This distinction affects evidence requirements and risk classification. Manufacturers should describe the clinical workflow, how clinicians will interact with the device, what information the device provides, and what decisions clinicians make based on device outputs.

**Algorithm description must explain the AI/ML approach** at an appropriate level of detail. This includes model type and architecture, input features and their clinical significance, output format and interpretation, training approach and hyperparameters, and performance metrics.

For complex models like deep neural networks, manufacturers should provide architecture diagrams, layer specifications, and activation functions. For ensemble methods, describe individual models and combination approaches.

The goal is enabling FDA reviewers to understand how the algorithm works and assess whether the approach is appropriate for the intended use.

**Data description represents a critical component** of AI/ML submissions. Manufacturers must document training, validation, and test datasets separately, including data sources and collection methods, demographic composition, sample sizes, labeling or annotation processes, data quality measures, and any data augmentation or preprocessing.

The FDA expects manufacturers to demonstrate that datasets are representative of the intended use population. This requires demographic analysis showing age, sex, race, ethnicity, and clinically relevant factors.

For datasets with demographic imbalances, manufacturers should explain the imbalance and describe mitigation strategies.
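The sketch below shows one way to generate the demographic breakdowns this documentation calls for, assuming a pandas DataFrame with hypothetical split, age_band, sex, and race columns.

```python
# Minimal sketch: percent demographic composition per dataset split.
# Columns ("split", "age_band", "sex", "race") are hypothetical names.
import pandas as pd

def composition_table(df: pd.DataFrame, factor: str) -> pd.DataFrame:
    counts = df.groupby(["split", factor]).size().unstack(fill_value=0)
    return counts.div(counts.sum(axis=1), axis=0).mul(100).round(1)

for factor in ["age_band", "sex", "race"]:
    print(composition_table(dataset_df, factor))  # dataset_df is assumed to exist
```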

**Performance testing must demonstrate** that the AI/ML device performs safely and effectively across the intended use population. This includes standalone performance testing showing the algorithm performs as designed, clinical validation demonstrating performance in the intended use environment, and subgroup analysis evaluating performance across demographic categories.

Manufacturers should present performance metrics with confidence intervals, confusion matrices showing true positives, false positives, true negatives, and false negatives, and receiver operating characteristic curves for classification tasks. Subgroup analysis should reveal any performance disparities that might affect clinical use.
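A minimal sketch of that reporting, assuming numpy arrays y_true (labels) and y_score (model probabilities) already exist, might compute the confusion matrix, headline metrics, and a bootstrap confidence interval for sensitivity:

```python
# Minimal sketch: headline metrics plus a 95% bootstrap CI for sensitivity.
# Assumes numpy arrays y_true (0/1 labels) and y_score (probabilities) exist.
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

y_pred = (y_score >= 0.5).astype(int)  # illustrative fixed decision threshold
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"sensitivity={tp / (tp + fn):.3f}  specificity={tn / (tn + fp):.3f}  "
      f"AUC={roc_auc_score(y_true, y_score):.3f}")

rng = np.random.default_rng(0)
sens = []
for _ in range(2000):  # 2,000 bootstrap resamples
    b = rng.integers(0, len(y_true), size=len(y_true))
    yt, yp = y_true[b], y_pred[b]
    positives = int((yt == 1).sum())
    if positives:
        sens.append(int(((yt == 1) & (yp == 1)).sum()) / positives)
print("sensitivity 95% CI:", np.percentile(sens, [2.5, 97.5]).round(3))
```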

Preparing FDA-compliant marketing submissions for AI/ML devices requires deep understanding of regulatory expectations and technical AI/ML documentation. DigiForm assists medical device manufacturers in developing submission-ready documentation that satisfies FDA requirements while clearly communicating device capabilities and limitations.

How Do I Demonstrate My Algorithm Doesn't Have Bias?

Algorithmic bias represents one of the FDA's primary concerns for AI/ML medical devices. Bias occurs when an algorithm performs differently across demographic groups, potentially leading to disparate health outcomes.

Demonstrating fairness requires systematic evaluation, transparent reporting, and proactive mitigation strategies. The FDA expects manufacturers to address bias throughout the development lifecycle, not as an afterthought before submission.

**Bias assessment begins with data analysis.** Examine the demographic composition of training, validation, and test datasets. Document representation across age, sex, race, ethnicity, and clinically relevant factors like disease severity, comorbidities, or treatment history.

Identify underrepresented groups and assess whether underrepresentation might affect algorithm performance. For datasets with demographic imbalances, consider whether the imbalance reflects true clinical prevalence or data collection limitations.

**Performance evaluation must include subgroup analysis.** Calculate performance metrics separately for each demographic group. Compare sensitivity, specificity, positive predictive value, and negative predictive value across groups.

Statistical testing should assess whether performance differences are significant or could occur by chance. Clinically meaningful disparities—even if not statistically significant—warrant investigation and potential mitigation.

The FDA expects manufacturers to establish acceptable performance thresholds for each subgroup, not just overall performance.
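As one reasonable choice of statistical test, the sketch below applies a two-proportion z-test to compare sensitivity between two subgroups; exact tests suit small samples better, and the counts shown are hypothetical.

```python
# Minimal sketch: two-proportion z-test on sensitivity across two subgroups.
# Counts are hypothetical; exact tests are better suited to small samples.
from statsmodels.stats.proportion import proportions_ztest

tp_a, pos_a = 182, 200  # subgroup A: 182 true positives of 200 diseased cases
tp_b, pos_b = 131, 155  # subgroup B: 131 true positives of 155 diseased cases

stat, p_value = proportions_ztest([tp_a, tp_b], [pos_a, pos_b])
print(f"z={stat:.2f}, p={p_value:.4f}")  # small p suggests a real sensitivity gap
```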

**Bias mitigation strategies depend on the source and magnitude of bias.** For bias stemming from underrepresented training data, consider collecting additional data, applying balanced sampling techniques, or using synthetic data generation.

For bias in model predictions, explore fairness constraints during training, post-processing calibration, or threshold adjustments for different groups. For bias in feature selection, evaluate whether input features encode demographic information inappropriately.

Consider whether alternative features could reduce bias while maintaining performance.
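As a concrete example of post-processing mitigation, this sketch selects a per-subgroup decision threshold so that each group meets a target sensitivity. It illustrates one option only; any group-specific thresholding requires clinical justification and regulatory review.

```python
# Minimal sketch: pick a per-subgroup threshold meeting a target sensitivity.
# Assumes numpy arrays y_true, y_score, groups exist and each group has cases.
import numpy as np

def threshold_for_sensitivity(y_true, y_score, target=0.90):
    pos_scores = np.sort(y_score[y_true == 1])         # scores of diseased cases
    k = int(np.floor((1 - target) * len(pos_scores)))  # cases allowed below cutoff
    return pos_scores[k]

thresholds = {
    g: threshold_for_sensitivity(y_true[groups == g], y_score[groups == g])
    for g in np.unique(groups)
}
```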

**Documentation should transparently report** bias assessment results and mitigation efforts. Describe the demographic composition of datasets, present subgroup performance analysis, explain any performance disparities identified, and document mitigation strategies implemented.

If residual performance differences remain after mitigation, explain why they persist and whether they pose clinical concerns. The FDA appreciates transparent reporting of limitations more than claims of perfect fairness that cannot be substantiated.

What Post-Market Surveillance Is Required for AI/ML Devices?

Post-market surveillance for AI/ML medical devices extends beyond traditional device monitoring. It addresses AI/ML-specific concerns like model drift, performance degradation, and real-world fairness.

The FDA expects manufacturers to implement ongoing performance monitoring, detect and respond to issues promptly, and report significant problems through established channels.

**Performance monitoring should track key metrics** in real-world use. Establish baseline performance from premarket validation, then monitor whether real-world performance remains within acceptable bounds.

Track input data distribution to detect shifts that might indicate drift. Monitor output distributions to identify unusual patterns. Collect user feedback about device performance, usability issues, or unexpected behaviors.

The monitoring approach should be proportional to device risk and the likelihood of performance changes.

**Model drift detection requires comparing** real-world performance to validation performance. Drift can occur when the patient population changes, when clinical practices evolve, when input data characteristics shift, or when the relationship between inputs and outputs changes.

Manufacturers should establish drift detection thresholds that trigger investigation before performance degradation affects patient care. Statistical process control techniques can help identify when performance trends exceed normal variation.
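One widely used drift statistic is the Population Stability Index (PSI). The sketch below computes it for a single continuous input feature, comparing premarket validation data to recent real-world data; the usual rule-of-thumb thresholds (investigate above roughly 0.1, act above roughly 0.25) are heuristics, not FDA requirements.

```python
# Minimal sketch: Population Stability Index for one continuous input feature.
# baseline = premarket validation data; current = recent real-world data.
import numpy as np

def psi(baseline: np.ndarray, current: np.ndarray, bins: int = 10) -> float:
    edges = np.quantile(baseline, np.linspace(0, 1, bins + 1))
    # Clip both samples into the baseline range so every value lands in a bin.
    b = np.histogram(np.clip(baseline, edges[0], edges[-1]), edges)[0] / len(baseline)
    c = np.histogram(np.clip(current, edges[0], edges[-1]), edges)[0] / len(current)
    b, c = np.clip(b, 1e-6, None), np.clip(c, 1e-6, None)  # avoid log(0)
    return float(np.sum((c - b) * np.log(c / b)))

# Example: if psi(validation_ages, field_ages) > 0.25, trigger an investigation.
```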

**Incident reporting follows standard medical device** adverse event reporting requirements. Manufacturers must report deaths, serious injuries, and malfunctions that could cause death or serious injury.

For AI/ML devices, reportable events might include algorithm failures that lead to incorrect diagnoses, performance degradation that affects clinical decisions, cybersecurity incidents that compromise device integrity, or bias-related issues that cause disparate outcomes.

The FDA expects timely reporting and thorough investigation of incidents.

**Corrective actions for AI/ML devices** might include software updates to address performance issues, algorithm retraining to restore performance, labeling updates to clarify limitations, or user training to address misuse patterns.

In severe cases, device recalls may be necessary. If you have an approved PCCP covering the issue, you can implement planned modifications. Without a PCCP, significant changes may require new marketing submissions.

The key is detecting and addressing issues before they cause widespread harm.

Frequently Asked Questions

Which FDA pathway should I use for my AI/ML medical device?

The appropriate pathway depends on your device's risk classification and whether a predicate exists. Most AI/ML medical devices use the 510(k) pathway if you can demonstrate substantial equivalence to a legally marketed predicate device.

If no appropriate predicate exists but your device is low-to-moderate risk, the De Novo pathway establishes a new device category that can serve as a predicate for future devices.

High-risk Class III devices require Premarket Approval (PMA), though this is less common for AI/ML Software as a Medical Device. The key decision factors are device risk level, availability of predicates, and the amount of clinical evidence you can generate.

What is a Predetermined Change Control Plan and do I need one?

A Predetermined Change Control Plan (PCCP) is documentation submitted with your initial marketing application. It describes planned modifications to your AI/ML device and how you'll assess those modifications.

PCCPs allow you to make approved changes—such as algorithm updates, training data refreshes, or performance improvements—without submitting new 510(k)s or PMAs.

You should consider a PCCP if your AI/ML device will benefit from continuous learning, if you plan regular algorithm improvements based on real-world data, or if your device operates in a rapidly evolving clinical environment. PCCPs are optional but provide significant regulatory efficiency for adaptive AI/ML systems.

How do I demonstrate that my AI/ML algorithm doesn't have bias?

Demonstrating fairness requires systematic evaluation across relevant demographic subgroups. First, document the demographic composition of your training, validation, and test datasets, including age, sex, race, ethnicity, and clinically relevant factors.

Second, conduct subgroup analysis showing performance metrics (sensitivity, specificity, accuracy) for each demographic group.

Third, identify and address performance disparities through techniques like balanced sampling, fairness constraints, or post-processing calibration. Fourth, document your bias assessment methodology and mitigation strategies in your marketing submission.

The FDA expects manufacturers to proactively identify and address potential biases rather than waiting for post-market problems to emerge.

What clinical evidence does FDA expect for AI/ML medical devices?

Clinical evidence requirements depend on your regulatory pathway and device risk. For 510(k) submissions, you typically need standalone performance testing showing your device performs comparably to the predicate.

You also need clinical validation demonstrating performance in the intended use environment. This might include retrospective studies using clinical data, prospective observational studies, or in some cases randomized controlled trials.

For De Novo and PMA pathways, FDA expects more robust clinical evidence including prospective studies and real-world validation. The key is demonstrating that your AI/ML device performs safely and effectively across the intended use population, including relevant demographic subgroups.

How often do I need to retrain my AI/ML model?

Retraining frequency depends on your device's performance monitoring results and your PCCP if you have one. You should establish performance monitoring that tracks key metrics in real-world use.

Watch for model drift, performance degradation, or changes in the input data distribution. Retrain when monitoring indicates performance has declined below acceptable thresholds.

Also retrain when you've accumulated sufficient new data to improve performance, when the clinical environment has changed significantly, or according to the schedule in your approved PCCP.

If you don't have a PCCP, significant retraining that changes device performance may require a new marketing submission.

Can I use real-world data for validation instead of clinical trials?

Yes, FDA increasingly accepts real-world data (RWD) and real-world evidence (RWE) for medical device validation, particularly for AI/ML devices.

Real-world data from electronic health records, claims databases, patient registries, or post-market surveillance can support both initial marketing submissions and post-market performance monitoring.

However, you must demonstrate that your real-world data is fit for purpose. It must be representative of the intended use population, of sufficient quality and completeness, properly labeled or annotated, and analyzed using appropriate methods.

The FDA evaluates real-world evidence on a case-by-case basis, considering the specific device, intended use, and available data sources.

What happens if my algorithm drifts in real-world use?

Algorithm drift—when model performance degrades due to changes in input data distribution—requires proactive monitoring and response.

First, your post-market surveillance should detect drift through continuous performance monitoring. Second, assess whether drift affects safety or effectiveness. Minor drift within acceptable performance bounds may not require action.

Significant drift requires investigation and remediation. This might include retraining with recent data, adjusting decision thresholds, or updating the algorithm.

If you have an approved PCCP covering drift scenarios, you can implement planned modifications. Without a PCCP, significant changes may require a new marketing submission. In severe cases affecting patient safety, you may need to issue field corrections or recalls.

How do I find an appropriate predicate device for 510(k)?

Finding AI/ML predicates requires systematic research. Start with FDA's AI-Enabled Medical Device List, which catalogs cleared and approved AI/ML devices.

Search the 510(k) database for devices with similar intended use, technological characteristics, and clinical applications. Look for predicates that use similar AI/ML approaches (machine learning, deep learning, specific algorithms) for comparable clinical tasks.

The predicate must be legally marketed and have the same intended use. For AI/ML devices, technological characteristics include input data types, algorithm approach, output format, and clinical decision-making role.

If you cannot find an appropriate predicate, consider the De Novo pathway, which can establish your device as a predicate for future innovations.


About the Author

Hashi S.

AI Governance & Digital Transformation Consultant at DigiForm. Expert in federal AI compliance, enterprise AI strategy, and regulated industries. Led 60+ AI projects with zero compliance incidents across government agencies and Fortune 500 companies.


Ready to navigate FDA AI/ML compliance? Contact DigiForm to learn how we help medical device manufacturers build compliant AI/ML development processes and prepare FDA-ready marketing submissions.