AI-enabled SaMD and Digital Therapeutic Solutions: Good Practices for Commercialization
Artificial intelligence (AI) has opened up new possibilities in the life science and healthcare industries for developing solutions to help physicians make decisions and deliver customized treatment to patients. On the one hand, analyzing large amounts of data quickly can improve operational efficiency, and on the other, creating an integrated view of available diverse data can allow for more accurate diagnosis.
Real-world examples include algorithms that use image analysis to enhance skin cancer diagnosis and smart sensor devices that estimate the likelihood of a heart attack. However, along with the benefits, the technology has introduced new risks. AI models derive algorithm parameters by learning from training data, so the accuracy and generality of the algorithm are governed by characteristics of the training data set, such as the age, gender, and ethnicity of the patients it covers. Any bias in the data will be propagated to the algorithm, impacting the outcome under real-world clinical situations. Therefore, good practices must be adopted when developing such medical solutions in order to ensure patient safety.
As AI evolves in the field, it improves performance in target clinical settings and opens up new possibilities for advanced solutions. These ‘adaptive’ AI solutions pose further risks compared to ‘locked’ algorithms, which do not change their behavior once they are deployed. Controlling real-world training data for adaptive solutions is extremely difficult, which increases the risk of bias and overfitting. Protecting field-implemented models from evasion attacks therefore becomes more critical. Consequently, active research is underway to leverage AI's ability to learn from real-world feedback and improve accuracy while simultaneously managing these risks.
AI/ML-enabled medical devices require additional controls throughout the product life cycle to ensure patient safety and solution reliability. Although there are no established standards for certifying such solutions, regulatory organizations have issued guidance to assure the effectiveness and repeatability of results in the clinical environment in accordance with the intended use of the solution. The FDA has approved over 350 AI/ML-based medical devices as of 2021, and all of these solutions deploy locked algorithms.
Current standards and guidelines
Software as a Medical Device (SaMD) and Digital Therapeutics (DTx) solutions are software-only solutions that can assist in the diagnosis, assessment, and treatment of medical disorders. Unlike hardware-based medical devices, SaMD can be updated quickly to address malfunctions and minimize adverse events. The FDA's Software Precertification (Pre-Cert) Program therefore offers a pathway to ensure that health care professionals and patients can benefit from software upgrades as quickly as possible while still maintaining a high level of safety and security during the process. The program is based on a culture of quality and organizational excellence (CQOE). It offers total product life cycle (TPLC) management, in which the evaluation is spread over the entire life cycle of the product, reducing the evaluation time for small modifications needed to address malfunctions once the solution is in production.
The TPLC model effectively addresses the quality evaluation and auditing requirements of the software development lifecycle (SDLC). However, due to the unique capabilities of AI solutions, the TPLC approach falls short in limiting the risks brought by dataset training, data preprocessing, and the influence of real-world datasets on model output. Studies have shown that algorithms may perform differently on patients of different ethnicities if the training data set is not broad enough.
For example, an algorithm designed for skin lesion classification and trained on a data set of patients with light skin has been found to underperform for patients with dark skin. Real-world training data sets also raise the risk of adversarial attacks, which may introduce bias. A group of Twitter users, for example, led a chatbot to make racially biased comments by feeding it targeted messages.
The US FDA discussion paper proposes a regulatory framework that extends the TPLC approach by overlaying it on the AI/ML workflow represented in Figure 2. The new SaMD pre-specifications (SPS) and Algorithm Change Protocol (ACP) concepts manage algorithm modifications caused by continuous learning and define mechanisms to mitigate the associated risks.
Furthermore, the FDA, Health Canada, and the MHRA have identified ten guiding principles (refer to Table 1) that will serve as a foundation for developing Good Machine Learning Practice (GMLP). GMLP recommends best practices for various stages of TPLC and proposes tailored procedures for the application of AI and ML technologies in the health care and life sciences domains. Adoption of these recommended practices should help manufacturers demonstrate the safety, effectiveness, and reliability of AI-based SaMD and DTx solutions.
These guidelines provide checkpoints for auditors and manufacturers to ensure the safety and reliability of medical devices. The key points of interest for auditors for AI/ML-enabled solutions will revolve around reference data sets, benefit-risk analysis of using AI models, data privacy, and clinical performance. The next section shows how the GMLP guiding principles relate to good practices in the software development life cycle, processes, and standard operating procedures (SOPs) for developing AI-enabled medical devices.
Good practices for the development of AI/ML-based medical solutions
Intended Use: The intended use of the medical device should justify the benefits of applying AI against the associated risks. Stating this trade-off transparently makes a more convincing case, because decisions made by AI algorithms are difficult to explain logically. The identification of specific tasks for AI applications, as well as the integration plan with the clinical workflow, must demonstrate an in-depth understanding of the objectives. If the medical device is intended for a global population, its applicability must be further characterized based on the training data sets, for example the ethnicities of the patients represented in them.
Dataset Integrity: Data collection methodologies are audited to ensure that the best methods for collecting or generating data are used. In addition to preserving the anonymity of data, its origins should be traceable to provide evidence of its legitimacy and integrity. A traceability template for systematic documentation of data statistics, procedures, consents, timeframe, location, and population will help auditors validate the data sets, which are central to judging whether AI algorithms are useful, accurate, and reliable. Additionally, the data sets need to be under version control at all times.
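As an illustration of the traceability idea above, the following is a minimal sketch of one entry in a data set traceability log. The field names, the `DatasetRecord` class, and the sample values are all hypothetical and are not taken from any regulatory template; the only assumption is that a content hash is recorded alongside the metadata so that a given version can later be shown to be unchanged.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class DatasetRecord:
    """One entry in a hypothetical data set traceability log."""
    name: str
    version: str
    source_site: str          # where the data was collected
    collection_period: str    # timeframe of collection
    consent_reference: str    # pointer to the consent procedure used
    population_summary: str   # short description of the cohort
    sample_count: int
    content_sha256: str       # fingerprint of the exact data content


def fingerprint(chunks):
    """Return a SHA-256 hex digest over an iterable of bytes chunks."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def make_record(name, version, raw_chunks, **details):
    """Build a record whose hash ties the metadata to the data content."""
    return DatasetRecord(name=name, version=version,
                         content_sha256=fingerprint(raw_chunks), **details)


# Illustrative values only; real entries would reference actual files and SOPs.
record = make_record(
    "skin-lesion-train", "1.0.0", [b"image-bytes-1", b"image-bytes-2"],
    source_site="Site A",
    collection_period="2021-01 to 2021-06",
    consent_reference="SOP-CONSENT-001",
    population_summary="adults, mixed skin tones",
    sample_count=2,
)
print(json.dumps(asdict(record), indent=2))
```

Any change to the underlying data produces a different hash, so a mismatch between the stored and recomputed fingerprint signals that the data set no longer corresponds to the documented version.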
Justification of Datasets: The plan for training and validation data sets must document the number of data samples and why it is sufficient, the patient inclusion and exclusion criteria, and the justification for why the algorithm outcome can be generalized to the intended patient population. The data samples should evenly cover patient demographics and clinical history, including age, gender, and ethnicity. The algorithm results may be skewed if a demographic is underrepresented in the training data set.
In addition to the aforementioned information, it is also important to discuss the limitations of planned data sets, the risks posed by bias in data sets, the identification of circumstances when models can underperform, and the mitigation of risks such as overfitting in order to demonstrate a comprehensive understanding of clinically relevant safety and effectiveness goals.
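One simple way to surface underrepresentation early is to compute each group's share of the training samples and flag those below a chosen floor. The sketch below assumes samples are dictionaries with a demographic attribute; the function name, the `skin_tone` attribute, and the 20% floor are illustrative assumptions, not recommended values.

```python
from collections import Counter


def underrepresented_groups(samples, attribute, min_share):
    """Return {group: share} for groups whose share of samples is below min_share."""
    counts = Counter(sample[attribute] for sample in samples)
    total = sum(counts.values())
    return {group: count / total
            for group, count in counts.items()
            if count / total < min_share}


# Toy data set: 8 light, 1 medium, 1 dark skin tone sample.
samples = (
    [{"skin_tone": "light"}] * 8
    + [{"skin_tone": "medium"}] * 1
    + [{"skin_tone": "dark"}] * 1
)
flagged = underrepresented_groups(samples, "skin_tone", min_share=0.2)
print(sorted(flagged))
```

Note that this check only sees groups that appear at least once; a group that is missing from the data entirely would have to be caught by comparing against a list of expected groups for the intended population.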
Consent and Privacy: A manufacturer must specify policies on patient consent for data use, data confidentiality, and authorized access to data. Disclosure of sensitive patient information violates privacy regulations such as the General Data Protection Regulation (GDPR) and the Health Insurance Portability and Accountability Act (HIPAA).
Models: Validation criteria for the training pipeline differ from those for the inference pipeline, as the inference models are often the only ones deployed in production. Given the availability of proven AI/ML libraries such as TensorFlow, PyTorch, and Keras, developers often reuse them to expedite development. For inference models, solution developers need to verify these software of unknown provenance (SOUP) components as per IEC 62304 requirements, research their known anomalies, and justify accepting them.
Libraries and code used purely for pre-processing the data or training the model do not constitute part of the medical device; nevertheless, they need to be validated under ISO 13485.
Auditors may not be concerned with library selection. What matters most is demonstrating that the overall solution performs as intended and delivers the expected output for a specified input.
The pre-processing steps must be documented to ensure that the original data is not altered in a way that impacts the model's outcome.
The procedures and checks for data labeling must be specified for models that use supervised learning. It is advised that subject matter experts (SMEs) train the annotation team and audit the labeled data, because the competency of annotation teams is essential to training an accurate model.
Validation: The performance of the model on test data sets demonstrates the model's reliability. There should be no overlap or correlation between the training and test data sets, and ideally they should come from independent sources. Both data sets should contain enough variation to avoid skew and to be representative of the intended population.
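A basic safeguard against overlap is to confirm that no patient contributes records to both splits. The sketch below assumes each record carries a `patient_id` field; that field name and the sample identifiers are hypothetical.

```python
def leaked_patients(train_rows, test_rows, key="patient_id"):
    """Return the set of patient identifiers present in both splits."""
    train_ids = {row[key] for row in train_rows}
    test_ids = {row[key] for row in test_rows}
    return train_ids & test_ids


train = [{"patient_id": "P001"}, {"patient_id": "P002"}, {"patient_id": "P002"}]
test = [{"patient_id": "P002"}, {"patient_id": "P003"}]

overlap = leaked_patients(train, test)
if overlap:
    print("Patients in both splits:", sorted(overlap))
```

This only detects identical identifiers. Correlation between splits, such as images from the same device or site, needs separate review.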
A clinical evaluation must be conducted on a population that is comparable to the target population for the solution, with as many participants as possible. The performance achieved during the acceptance study provides the evidence that the model generalizes in the clinical context.
Adaptive Algorithms: Adaptive algorithms pose unique challenges as the medical device in production can change its output based on new learnings obtained in the field. The learning must be controlled in such a way that the risks of modification are minimized. The discussion paper from the US FDA proposes a pre-determined change control plan.
Developers of SaMD or DTx solutions can record the modifications they anticipate from learning, relative to the intended use, in the SaMD pre-specifications (SPS). Prospective algorithmic changes are thus identified and recorded at the initial release of the solution, allowing risks to be mitigated through an appropriate review strategy. Under the proposed framework, such changes would not require separate modification approval from the regulatory bodies.
The Algorithm Change Protocol (ACP) specifies strategies for mitigating the risks of anticipated modifications. Data management, re-training objectives, performance evaluation, and revised processes for algorithm change control are all covered by the ACP. There may be changes to the solution beyond the approved SPS and ACP, and the approval pathways for such cases will depend on the class of the device and the kind of modifications made.
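To make the idea of pre-agreed change limits concrete, the sketch below checks a retrained model's metrics against bounds fixed in advance. This is purely illustrative: the `ChangeBounds` fields, the metric names, and the numeric limits are assumptions, and the FDA discussion paper does not prescribe this format or these thresholds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeBounds:
    """Hypothetical limits agreed in advance for a retrained model."""
    min_sensitivity: float
    min_specificity: float
    max_sensitivity_drop: float  # allowed drop relative to the released model


def plan_violations(released, candidate, bounds):
    """Return a list of violations; an empty list means the update fits the plan."""
    violations = []
    if candidate["sensitivity"] < bounds.min_sensitivity:
        violations.append("sensitivity below floor")
    if candidate["specificity"] < bounds.min_specificity:
        violations.append("specificity below floor")
    if released["sensitivity"] - candidate["sensitivity"] > bounds.max_sensitivity_drop:
        violations.append("sensitivity regressed beyond allowed drop")
    return violations


bounds = ChangeBounds(min_sensitivity=0.90, min_specificity=0.80,
                      max_sensitivity_drop=0.02)
released = {"sensitivity": 0.94, "specificity": 0.85}
candidate = {"sensitivity": 0.91, "specificity": 0.86}

violations = plan_violations(released, candidate, bounds)
print(violations or "update is within the pre-specified bounds")
```

In this toy case the candidate clears both floors but loses more sensitivity than the plan allows, so it would be held back for review rather than deployed automatically.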
Real-world Performance Monitoring: Monitoring SaMD in real-life clinical settings and with the intended population is essential for evaluating the trained model’s performance.
Moreover, the post-market surveillance (PMS) plan must identify the key data to be collected and mechanisms to calculate KPIs in clinical conditions and report any potentially hazardous situations. Incorporating the appropriate modules for automated data collection, analysis, and periodic reporting into the solution ensures prompt data gathering and effective algorithmic adaptation in real-world clinical settings.
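A KPI of this kind could be tracked with a small rolling monitor. The sketch below follows sensitivity over the most recent confirmed-positive cases and flags when it falls below a review threshold. The class name, window size, and threshold are hypothetical, and it assumes that ground truth for each case eventually becomes available from clinical follow-up.

```python
from collections import deque


class SensitivityMonitor:
    """Track sensitivity over the most recent confirmed-positive cases."""

    def __init__(self, window, alert_below):
        # Each entry is True if a confirmed-positive case was detected.
        self.outcomes = deque(maxlen=window)
        self.alert_below = alert_below

    def record(self, predicted_positive, actually_positive):
        """Store one case; only confirmed-positive cases affect sensitivity."""
        if actually_positive:
            self.outcomes.append(bool(predicted_positive))

    def sensitivity(self):
        """Return windowed sensitivity, or None if no positive cases were seen."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_review(self):
        """Return True when sensitivity has dropped below the alert level."""
        value = self.sensitivity()
        return value is not None and value < self.alert_below


monitor = SensitivityMonitor(window=4, alert_below=0.75)
cases = [(True, True), (True, True), (False, True), (False, True), (True, False)]
for predicted, actual in cases:
    monitor.record(predicted, actual)

print(monitor.sensitivity(), monitor.needs_review())
```

A production system would also log each case for the periodic report and track further KPIs such as specificity; this sketch covers only the alerting logic.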
The good practices discussed above have been summarised and depicted in Figure 3 below.
For AI-enabled medical devices to be approved quickly, the developers must offer a compelling case to the auditors. It is also critical to demonstrate the models' reliability by meeting quantitative performance requirements such as specificity and sensitivity for classification problems and mean absolute error and mean squared error for regression tasks.
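For reference, the metrics named above can be computed as follows. This is a minimal sketch using plain Python with toy data; it assumes binary labels encoded as 1 (positive) and 0 (negative) and that both classes are present in the test set.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels 1 and 0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Classification example: 4 positive and 4 negative cases.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
predictions = [1, 1, 1, 0, 0, 0, 1, 1]
sens, spec = sensitivity_specificity(labels, predictions)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")

# Regression example.
targets = [2.0, 4.0, 6.0]
estimates = [2.5, 4.0, 5.0]
print(mean_absolute_error(targets, estimates), mean_squared_error(targets, estimates))
```

Sensitivity is the fraction of true positive cases that the model detects, and specificity is the fraction of true negative cases that it correctly rules out.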
Furthermore, while AI/ML models may not claim 100% accuracy, the solutions should address the risks of inaccurate outputs by documenting the potential outcomes and procedures to be followed to limit the risks associated with inaccuracies in prediction functions. Adjusting the model to minimize missed detections is preferable, even at the expense of false positives. Similarly, the risks associated with wrong annotation and model bias need to be addressed during the development cycle. Procedures to identify such incidents and corrective actions must be documented.
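The trade-off between missed detections and false positives is usually controlled through the decision threshold. The sketch below picks the highest threshold that still reaches a target sensitivity on a labeled calibration set and then counts the resulting false positives. The function names, scores, and targets are illustrative assumptions; a case is treated as positive when its score is at or above the threshold.

```python
import math


def threshold_for_sensitivity(scores, labels, target):
    """Return the highest threshold whose sensitivity is at least target."""
    positives = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    if not positives:
        raise ValueError("no positive cases to calibrate against")
    # Number of positive cases that must score at or above the threshold.
    needed = max(math.ceil(target * len(positives)), 1)
    return positives[needed - 1]


def false_positives(scores, labels, threshold):
    """Count negative cases that would be flagged at this threshold."""
    return sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)


scores = [0.95, 0.80, 0.60, 0.30, 0.70, 0.40, 0.20, 0.10]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

for target in (0.75, 1.0):
    threshold = threshold_for_sensitivity(scores, labels, target)
    print(target, threshold, false_positives(scores, labels, threshold))
```

On this toy data, raising the sensitivity target from 0.75 to 1.0 lowers the threshold from 0.60 to 0.30 and increases the false positives from one to two, which is the cost the paragraph above describes accepting in exchange for fewer missed detections.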
AI algorithms are generally regarded as black-box solutions where the logic of the model is difficult to explain. To build confidence, developers must ensure that consumers are aware of the risks associated with AI, such as inaccurate predictions, learning over time, and model modification. Moreover, open communication of details such as the data used for training and the level of confidence in the outcome is critical to establishing trust. In the case of adaptive algorithms, the user should be notified of any changes, and the SOP for reverting the change should be provided.
For AI-based medical devices, these practices can help present a convincing case to auditors and accelerate market entry.
While standards that specifically address the application of AI in medical devices have not yet been rolled out, solution developers should adopt the guidelines early in the development cycle. It is easier for auditors to understand the risks and benefits of the solution if the decisions, risk-based designs, testing strategies, and evidence of performance are well justified. Researching available libraries and their anomalies allows developers to reuse existing libraries instead of reinventing the wheel, reducing overall delivery time and effort. Lastly, team competency is crucial for a robust and safe solution.
Tata Elxsi as a commercialization partner
Tata Elxsi has updated its quality management systems to counter the challenges of developing safe and reliable AI/ML-enabled solutions. To assist our delivery teams, we have created a variety of resources such as data set traceability templates, design document templates, standards of practice for taking consent and collecting data, delivery checklists, etc. Furthermore, our solutions ensure critical and timely regulatory compliance, thus allowing developers to focus on other important aspects such as accessing new markets, expanding product lines, and improving customer experience and satisfaction.