Authors: Istiak Ahmed, Kazi Shahriar Sanjid, Md. Tanzim Hossain, Galib Ahmed, Sheikh Anisul Haque, Md. Arifur Rahman, Nasir M. Uddin, and M. Monir Uddin
Institutions: North South University, Friedrich-Alexander University, Bangladesh Specialized Hospitals Limited, University of South Florida
Breast cancer is a major cause of mortality among women globally, making early and precise detection crucial for effective treatment. Mammography is the standard tool for breast cancer screening; however, interpreting mammograms is challenging due to subtle abnormalities and high inter-observer variability. Existing research often focuses on either segmentation of breast structures or classification of clinical features but rarely integrates both in a multimodal framework. Furthermore, the interpretability and accessibility of AI tools in rural areas and medical camps remain underexplored. This study addresses these gaps by developing an integrated system that combines multimodal early and late fusion techniques for predicting clinical features (e.g., mass presence, mass definition, mass density, calcification type, ACR breast density, BI-RADS category) and segmenting key structures (e.g., breast tissue, axilla findings, mass, calcification). The system also incorporates explainable AI (XAI) tools such as Grad-CAM and saliency maps to enhance transparency.
Our methodology integrates advanced deep learning architectures and multimodal fusion strategies to address both segmentation and clinical feature prediction. For segmentation, we employed a U-Net-based architecture with attention mechanisms to delineate key breast structures such as breast tissue, axilla findings, masses, and calcifications. The model was trained with pixel-wise cross-entropy, Dice, Focal, and combined losses, and evaluated using Intersection over Union (IoU), Dice Similarity Coefficient (DSC), precision, recall, F1 score, Average Surface Distance (ASD), Normalized Surface Distance (NSD), Hausdorff Distance (HD), and Relative Absolute Volume Difference (RAVD). To improve robustness, we augmented the dataset with rotations, flips, and intensity adjustments.

For clinical feature prediction, we used a multimodal fusion approach that combines image embeddings extracted by a customized multi-layer perceptron (MLP) with tabular features (e.g., patient demographics and imaging metadata). Early fusion concatenates the image embeddings with the tabular features before feeding them into a fully connected network, whereas late fusion processes each modality independently and combines the resulting predictions by weighted averaging. These models were evaluated using accuracy, precision, recall, F1 score, Area Under the Curve (AUC), and Receiver Operating Characteristic (ROC) curves.

To enhance interpretability, we incorporated explainable AI (XAI) tools such as Gradient-weighted Class Activation Mapping (Grad-CAM) and saliency maps. These tools provide visual explanations for both segmentation outputs and clinical feature predictions, enabling radiologists and medical students to understand the rationale behind model decisions.
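To make the two fusion strategies concrete, the following PyTorch sketch contrasts early fusion (concatenating image embeddings with tabular features before a shared classification head) with late fusion (separate per-modality branches whose predictions are combined by weighted averaging). This is a minimal illustration, not our exact implementation: the embedding dimension, layer widths, number of classes, and fusion weight are hypothetical placeholders.

```python
# Minimal sketch of early vs. late multimodal fusion (illustrative values only).
import torch
import torch.nn as nn


class EarlyFusionClassifier(nn.Module):
    """Early fusion: concatenate image embeddings and tabular features, then classify."""

    def __init__(self, img_dim: int = 512, tab_dim: int = 16, n_classes: int = 6):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(img_dim + tab_dim, 256),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(256, n_classes),
        )

    def forward(self, img_emb: torch.Tensor, tab_feat: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([img_emb, tab_feat], dim=1)  # join modalities before the network
        return self.head(fused)


class LateFusionClassifier(nn.Module):
    """Late fusion: process each modality independently, then average the predictions."""

    def __init__(self, img_dim: int = 512, tab_dim: int = 16, n_classes: int = 6,
                 img_weight: float = 0.7):
        super().__init__()
        self.img_branch = nn.Sequential(nn.Linear(img_dim, 128), nn.ReLU(),
                                        nn.Linear(128, n_classes))
        self.tab_branch = nn.Sequential(nn.Linear(tab_dim, 32), nn.ReLU(),
                                        nn.Linear(32, n_classes))
        self.img_weight = img_weight  # weight for the imaging branch in the average

    def forward(self, img_emb: torch.Tensor, tab_feat: torch.Tensor) -> torch.Tensor:
        img_logits = self.img_branch(img_emb)
        tab_logits = self.tab_branch(tab_feat)
        # Weighted averaging of per-modality predictions.
        return self.img_weight * img_logits + (1.0 - self.img_weight) * tab_logits


if __name__ == "__main__":
    img_emb = torch.randn(8, 512)   # image embeddings from the imaging backbone
    tab_feat = torch.randn(8, 16)   # tabular features (demographics, metadata)
    print(EarlyFusionClassifier()(img_emb, tab_feat).shape)  # torch.Size([8, 6])
    print(LateFusionClassifier()(img_emb, tab_feat).shape)   # torch.Size([8, 6])
```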
The proposed system performed strongly across all evaluation metrics. For segmentation, the model achieved Intersection over Union (IoU) scores exceeding 0.85 and Dice Similarity Coefficient (DSC) values above 0.90 for major structures, including masses, calcifications, and breast tissue. Clinical feature prediction reached accuracies surpassing 90% for most categories, with strong precision, recall, and F1 scores. XAI tools such as Grad-CAM and saliency maps provided interpretable insights that supported the model's decisions and enhanced trust. These results underscore the system's robustness and reliability in both segmentation and classification tasks, establishing its potential as a powerful tool for breast cancer diagnostics.
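For reference, the IoU and DSC values reported above are the standard overlap measures between a predicted segmentation mask and the ground-truth mask. The snippet below shows one minimal way to compute them for binary masks; the smoothing constant and toy masks are illustrative, not taken from our evaluation pipeline.

```python
# Minimal IoU and Dice computation for binary segmentation masks (illustrative).
import torch


def iou_and_dice(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7):
    """Compute IoU and Dice for binary masks of identical shape (values in {0, 1})."""
    pred = pred.float().flatten()
    target = target.float().flatten()
    intersection = (pred * target).sum()
    union = pred.sum() + target.sum() - intersection
    iou = (intersection + eps) / (union + eps)
    dice = (2 * intersection + eps) / (pred.sum() + target.sum() + eps)
    return iou.item(), dice.item()


# Toy example: predicted vs. ground-truth mask.
pred = torch.tensor([[1, 1, 0], [0, 1, 0]])
target = torch.tensor([[1, 0, 0], [0, 1, 1]])
print(iou_and_dice(pred, target))  # IoU = 2/4 = 0.5, Dice = 4/6 ≈ 0.667
```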
This research bridges the gaps in breast cancer diagnostics by integrating advanced segmentation, clinical feature prediction, and explainability within a single accessible platform. By providing accurate and interpretable insights, the system empowers radiologists and medical professionals, particularly in underserved regions, to make informed decisions, thereby reducing diagnostic errors and improving patient outcomes globally. The incorporation of XAI tools ensures transparency, fostering trust and adoption among healthcare providers. Furthermore, the comprehensive approach of combining multimodal data with interactive visualization enhances diagnostic clarity. Future work will focus on expanding datasets, validating the system across diverse populations, and refining its applicability in real-world clinical workflows to maximize its impact on global healthcare.