Show simple item record

dc.contributor.advisor: Kampffmeyer, Michael
dc.contributor.author: Gautam, Srishti
dc.date.accessioned: 2024-03-08T12:35:57Z
dc.date.available: 2024-03-08T12:35:57Z
dc.date.embargoEndDate: 2029-03-15
dc.date.issued: 2024-03-15
dc.description.abstract: <p>The field of artificial intelligence has recently witnessed remarkable growth, leading to the development of complex deep learning models that perform exceptionally well across various domains. However, these developments bring forth critical issues. Deep learning models are vulnerable to inheriting and potentially exacerbating biases present in their training data. Moreover, the complexity of these models leads to a lack of transparency, which can allow biases to go undetected. This can ultimately hinder the adoption of these models due to a lack of trust. It is therefore crucial to foster the creation of artificial intelligence systems that are inherently transparent, trustworthy, and fair. <p>This thesis contributes to this line of research by exploring the interpretability of deep learning through self-explainable models. These models represent a shift towards more transparent systems, offering explanations that are integral to the model's architecture and yield insight into its decision-making process. This inherent transparency enhances our understanding and thereby provides a mechanism to address the inadvertent learning of biases. <p>To advance the development of self-explainable models, this thesis undertakes a comprehensive analysis of current methodologies. It introduces a novel algorithm designed to enhance the explanation quality of one of the state-of-the-art models. In addition, this work proposes a novel self-explainable model that surpasses existing methods by generating explanations through a learned decoder, facilitating end-to-end training, and addressing the prevalent trade-off between explainability and performance. Furthermore, to enhance the accessibility and sustainability of these models, this thesis also introduces a universal methodology to transform any pre-trained black-box model into a self-explainable one without the need for re-training. <p>Through the proposed methodology, this research identifies and counteracts the learning of artifacts -- spurious correlations -- from the data, further emphasizing the need for transparent models. Additionally, this thesis expands its scope to encompass the dimension of fairness for large language models, demonstrating the tendency of these models to reinforce social biases. <p>The results of this research highlight the efficacy of the proposed methodologies, thereby paving the way for artificial intelligence systems that are not only accurate but also transparent, fair, and reliable, facilitating widespread adoption of, and trust in, artificial intelligence technologies. (en_US)
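The abstract describes prototype-based self-explainable models that make predictions from learned prototypes and render their explanations through a learned decoder. The sketch below is a minimal conceptual illustration of that general idea, assuming PyTorch; it is not the architecture proposed in the thesis (it is neither ProtoVAE nor the prototypical relevance propagation algorithm), and every class name, layer size, and the MLP encoder/decoder are illustrative assumptions. Classification is driven by similarities between an encoded input and learned prototype vectors, while the decoder maps those prototypes back to input space so they can be inspected as explanations.

# Minimal, illustrative sketch of a prototype-based self-explainable classifier.
# NOT the thesis's method: all names, layer sizes, and the MLP encoder/decoder
# are assumptions made purely for illustration.
import torch
import torch.nn as nn


class PrototypeClassifier(nn.Module):
    def __init__(self, in_dim=784, latent_dim=32, n_prototypes=10, n_classes=10):
        super().__init__()
        # Encoder: maps an input vector to a latent code.
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, latent_dim)
        )
        # Learned prototypes living in the same latent space as the codes.
        self.prototypes = nn.Parameter(torch.randn(n_prototypes, latent_dim))
        # Linear head turns prototype similarities into class logits.
        self.classifier = nn.Linear(n_prototypes, n_classes)
        # Decoder: maps latent codes (and prototypes) back to input space,
        # which is what makes the prototypes human-inspectable.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, in_dim)
        )

    def forward(self, x):
        z = self.encoder(x)
        # Similarity = negative squared Euclidean distance to each prototype.
        similarities = -(torch.cdist(z, self.prototypes) ** 2)
        logits = self.classifier(similarities)
        return logits, similarities, self.decoder(z)

    def explain_prototypes(self):
        # Decode the prototypes themselves to visualize what each one encodes.
        return self.decoder(self.prototypes)


if __name__ == "__main__":
    model = PrototypeClassifier()
    x = torch.randn(4, 784)                       # toy batch of flattened 28x28 inputs
    logits, sims, recon = model(x)
    print(logits.shape, sims.shape, recon.shape)  # (4, 10) (4, 10) (4, 784)
    print(model.explain_prototypes().shape)       # (10, 784)

Using negative squared distances as similarities keeps the link between prototypes and class logits linear, so each prototype's contribution to a prediction can be read off directly; the papers listed under dc.relation.haspart develop this idea rigorously.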
dc.description.doctoraltype: ph.d. (en_US)
dc.description.popularabstract: The field of artificial intelligence has recently witnessed remarkable growth, leading to the development of complex deep learning models that perform exceptionally well across various domains. However, these developments bring forth critical issues. These models are vulnerable to inheriting and potentially exacerbating biases present in their training data. Moreover, the complexity of these models leads to a lack of transparency, which can allow biases to go undetected. This can ultimately hinder the adoption of these models due to a lack of trust. This thesis contributes to this line of research by exploring the inherent interpretability of deep learning models and the issue of bias detection. Across five papers, we present a series of methodological advances that yield novel insights. Our contributions constitute significant advancements in deep learning, thus paving the way for artificial intelligence systems that are not only accurate but also transparent, fair, and reliable. (en_US)
dc.description.sponsorship: Research Council of Norway, grant numbers: 315029, 309439, and 303514 (en_US)
dc.identifier.isbn: 978-82-8236-568-0 (printed)
dc.identifier.isbn: 978-82-8236-569-7 (PDF)
dc.identifier.uri: https://hdl.handle.net/10037/33143
dc.language.iso: eng (en_US)
dc.publisher: UiT Norges arktiske universitet (en_US)
dc.publisher: UiT The Arctic University of Norway (en_US)
dc.relation.haspart: <p>Paper I: Gautam, S., Höhne, M.M.C., Hansen, S., Jenssen, R. & Kampffmeyer, M. (2023). This looks more like that: Enhancing self-explaining models by prototypical relevance propagation. <i>Pattern Recognition, 136</i>, 109172. Also available in Munin at <a href=https://hdl.handle.net/10037/27611>https://hdl.handle.net/10037/27611</a>. <p>Paper II: Gautam, S., Höhne, M.M.C., Hansen, S., Jenssen, R. & Kampffmeyer, M. (2022). Demonstrating the risk of imbalanced datasets in chest x-ray image-based diagnostics by prototypical relevance propagation. <i>2022 IEEE 19th International Symposium on Biomedical Imaging (ISBI), Kolkata, India</i>. Not available in Munin due to publisher’s restrictions. Published version available at <a href=https://doi.org/10.1109/ISBI52829.2022.9761651>https://doi.org/10.1109/ISBI52829.2022.9761651</a>. <p>Paper III: Gautam, S., Boubekki, A., Hansen, S., Salahuddin, S., Jenssen, R., Höhne, M.M.C. & Kampffmeyer, M. (2022). ProtoVAE: A trustworthy self-explainable prototypical variational model. <i>Advances in Neural Information Processing Systems, 35</i>, 17940–17952. Also available at <a href=https://proceedings.neurips.cc/paper_files/paper/2022/hash/722f3f9298a961d2639eadd3f14a2816-Abstract-Conference.html>https://proceedings.neurips.cc/paper_files/paper/2022/hash/722f3f9298a961d2639eadd3f14a2816-Abstract-Conference.html</a>. <p>Paper IV: Gautam, S., Boubekki, A., Höhne, M.M.C. & Kampffmeyer, M. Prototypical Self-Explainable Models Without Re-training. (Manuscript under review). Also available on arXiv at <a href=https://doi.org/10.48550/arXiv.2312.07822>https://doi.org/10.48550/arXiv.2312.07822</a>. <p>Paper V: Liu, Y., Gautam, S., Ma, J. & Lakkaraju, H. Investigating the Fairness of Large Language Models for Predictions on Tabular Data. (Manuscript under review). Also available on arXiv at <a href=https://doi.org/10.48550/arXiv.2310.14607>https://doi.org/10.48550/arXiv.2310.14607</a>. (en_US)
dc.relation.isbasedon: MNIST: Deng, L. (2012). The MNIST database of handwritten digit images for machine learning research. <i>IEEE Signal Processing Magazine, 29</i>(6), 141–142, available at <a href=https://doi.org/10.1109/MSP.2012.2211477>https://doi.org/10.1109/MSP.2012.2211477</a>. (en_US)
dc.relation.isbasedon: Fashion-MNIST: Xiao, H., Rasul, K. & Vollgraf, R. (2017). Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms. Available on arXiv at <a href=https://doi.org/10.48550/arXiv.1708.07747>https://doi.org/10.48550/arXiv.1708.07747</a> and on GitHub at <a href=https://github.com/zalandoresearch/fashion-mnist>https://github.com/zalandoresearch/fashion-mnist</a>. (en_US)
dc.relation.isbasedon: SVHN: Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B. & Ng, A.Y. (2011). Reading Digits in Natural Images with Unsupervised Feature Learning. <i>NIPS Workshop on Deep Learning and Unsupervised Feature Learning 2011</i>. Available at <a href=http://ufldl.stanford.edu/housenumbers/>http://ufldl.stanford.edu/housenumbers/</a>. (en_US)
dc.relation.isbasedon: STL-10: Coates, A., Lee, H. & Ng, A.Y. (2011). An Analysis of Single Layer Networks in Unsupervised Feature Learning. <i>AISTATS, 2011</i>. Available at <a href=https://cs.stanford.edu/~acoates/stl10/>https://cs.stanford.edu/~acoates/stl10/</a>. (en_US)
dc.relation.isbasedon: CIFAR-10: Krizhevsky, A. (2009). Learning Multiple Layers of Features from Tiny Images. Available at <a href=https://www.cs.toronto.edu/~kriz/cifar.html>https://www.cs.toronto.edu/~kriz/cifar.html</a>. (en_US)
dc.relation.isbasedon: CelebA: Liu, Z., Luo, P., Wang, X. & Tang, X. (2015). Deep Learning Face Attributes in the Wild. <i>Proceedings of International Conference on Computer Vision (ICCV), December, 2015</i>. Available at <a href=https://mmlab.ie.cuhk.edu.hk/projects/CelebA.html>https://mmlab.ie.cuhk.edu.hk/projects/CelebA.html</a>. (en_US)
dc.relation.isbasedon: CUB-200: Wah, C., Branson, S., Welinder, P., Perona, P. & Belongie, S. (2011). Caltech-UCSD Birds-200-2011 (CUB-200-2011). California Institute of Technology, 2011. Available at <a href=https://www.vision.caltech.edu/datasets/cub_200_2011/>https://www.vision.caltech.edu/datasets/cub_200_2011/</a>. (en_US)
dc.relation.isbasedon: LISA Traffic Sign Dataset: Møgelmose, A., Trivedi, M.M. & Moeslund, T.B. (2012). Vision based Traffic Sign Detection and Analysis for Intelligent Driver Assistance Systems: Perspectives and Survey. <i>IEEE Transactions on Intelligent Transportation Systems</i>, 2012. Available at <a href=http://cvrr-nas.ucsd.edu/LISA/lisa-traffic-sign-dataset.html>http://cvrr-nas.ucsd.edu/LISA/lisa-traffic-sign-dataset.html</a>. (en_US)
dc.relation.isbasedon: UCI Adult Data: Becker, B. & Kohavi, R. (1996). Adult. <i>UCI Machine Learning Repository</i>. Available at <a href=https://doi.org/10.24432/C5XW20>https://doi.org/10.24432/C5XW20</a>. (en_US)
dc.relation.isbasedon: UCI German Credit Data: Hofmann, H. (1994). Statlog (German Credit Data). <i>UCI Machine Learning Repository</i>. Available at <a href=https://doi.org/10.24432/C5NC77>https://doi.org/10.24432/C5NC77</a>. (en_US)
dc.relation.isbasedon: COMPAS Recidivism Risk Score Data and Analysis: <i>ProPublica</i>. Available at <a href=https://www.propublica.org/datastore/dataset/compas-recidivism-risk-score-data-and-analysis>https://www.propublica.org/datastore/dataset/compas-recidivism-risk-score-data-and-analysis</a>. (en_US)
dc.rights.accessRights: embargoedAccess (en_US)
dc.rights.holder: Copyright 2024 The Author(s)
dc.subject.courseID: DOKTOR-004
dc.subject: Deep learning (en_US)
dc.subject: Explainable AI (en_US)
dc.subject: Self-Explainable Models (en_US)
dc.subject: Artifact Detection (en_US)
dc.subject: Fairness in LLMs (en_US)
dc.title: Towards Interpretable, Trustworthy and Reliable AI (en_US)
dc.type: Doctoral thesis (en_US)
dc.type: Doktorgradsavhandling (en_US)


File(s) in this item


This item appears in the following collection(s)
