
dc.contributor.advisor: Jenssen, Robert
dc.contributor.author: Andrade Mancisidor, Rogelio
dc.date.accessioned: 2021-01-23T09:48:44Z
dc.date.available: 2021-01-23T09:48:44Z
dc.date.issued: 2021-02-05
dc.description.abstract: Banks need to develop effective credit scoring models to better understand the relationship between customer information and the customer's ability to repay a loan. The output of such a model is called the default probability and is used to rank loan applications in terms of their creditworthiness. The focus of this thesis is to develop novel credit scoring methodologies that solve well-known problems in the field and that bridge the gap between simple neural networks and advanced deep learning methodologies applied to credit scoring. We propose a new methodology to learn useful data representations of bank customers by introducing a supervision stage into the Variational Autoencoder framework, in which the input data are grouped using the Weight of Evidence transformation. Our proposed method learns data representations that capture the customers' creditworthiness in a well-defined clustering structure and that are well suited for marketing campaigns and credit risk assessment. Further, we develop two novel Deep Generative Models that use probabilistic theory to infer the unknown creditworthiness of customers whose loan applications were rejected. Grounding this inference in probabilistic theory is a clear advantage over traditional approaches. Adding rejected applications improves the classification accuracy of our proposed models and potentially solves the selection bias problem. We parametrize a Gaussian mixture model with neural networks to further improve the latent representation of customer information. Finally, we address credit scoring as a multi-modal learning problem: banks have multiple measurement modalities that provide complementary information about customers. Hence, we develop a novel Deep Generative Model that learns shared data representations and maximizes the mutual information between future credit data and its shared representation.
Our proposed model can generate future credit data from application data. The generated data can support bank activities other than credit scoring. [en_US]
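The abstract mentions grouping the input data with the Weight of Evidence (WoE) transformation. As a rough illustration only, the sketch below computes one common textbook form of WoE for a single binned numeric feature: WoE of bin i = ln(share of non-defaulters in bin i / share of defaulters in bin i). The cut points, the label coding (1 = default), the smoothing constant `eps` and the function name `woe_transform` are hypothetical choices for this example, not the thesis's implementation. Sign conventions for WoE also differ between sources.

```python
import numpy as np


def woe_transform(x, y, cuts, eps=0.5):
    """Replace each value of x by the Weight of Evidence of its bin.

    x    : 1-D array with one numeric feature
    y    : 1-D array of labels, 1 = default ("bad"), 0 = non-default ("good")
    cuts : increasing cut points; len(cuts) + 1 bins (hypothetical choice)
    eps  : additive smoothing (assumption) so empty cells do not give log(0)
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=int)
    n_bins = len(cuts) + 1

    # Bin index for every observation: 0 .. len(cuts)
    idx = np.digitize(x, cuts)

    # Count goods and bads per bin
    good = np.bincount(idx[y == 0], minlength=n_bins).astype(float)
    bad = np.bincount(idx[y == 1], minlength=n_bins).astype(float)

    # Smoothed distribution of goods and bads over the bins
    dist_good = (good + eps) / (good.sum() + eps * n_bins)
    dist_bad = (bad + eps) / (bad.sum() + eps * n_bins)

    # WoE per bin, then mapped back to every observation
    woe = np.log(dist_good / dist_bad)
    return woe[idx], woe


if __name__ == "__main__":
    x = np.array([1, 2, 3, 4, 5, 6])
    y = np.array([0, 0, 0, 1, 1, 1])
    transformed, woe = woe_transform(x, y, cuts=[3.5])
    print("WoE per bin:", woe)
    print("transformed feature:", transformed)
```

Under this convention, a positive WoE marks a bin where non-defaulters are over-represented relative to defaulters. Replacing raw values with bin-level WoE puts every grouped feature on a common log-odds scale, which is one reason the transformation is popular in credit scoring.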
dc.description.doctoraltype: ph.d. [en_US]
dc.description.popularabstract: Banks need to develop effective credit scoring models to better understand the relationship between customer information and the customer's ability to repay a loan. The output of such a model is called the default probability and is used to rank loan applications in terms of their creditworthiness. The focus of this thesis is to develop novel credit scoring methodologies that solve well-known problems in the field and that bridge the gap between simple neural networks and advanced deep learning methodologies applied to credit scoring. Deep learning uses systems built from a cascade of trainable modules. All modules are trained simultaneously, and each module adjusts itself to produce the right answer, which in credit scoring is an accurate estimate of creditworthiness. We propose a new methodology that learns to transform customer information into a new data representation capable of capturing customer creditworthiness in a well-defined clustering structure. These clusters are unknown a priori and impossible to identify using the original customer information. Further, the clusters identified with our proposed method are suitable for marketing campaigns and credit risk assessment. Banks do not know the creditworthiness of people who apply for loans and are rejected. Therefore, we develop models that use probabilistic theory to infer the unknown creditworthiness of customers whose applications were rejected. Our experiments show that adding rejected applications improves the creditworthiness estimation for all loan applications. Banks have access to multiple sources of information about their customers, for example, historical information obtained from application forms and behavioral data collected during the loan period. These different sources of information are called data modalities. We develop a novel methodology that can learn a shared data representation for these two data modalities.
Furthermore, our proposed method can generate a missing modality from the information in the available one. That is, we can generate data on a person's future behavior from the information collected during the application process. [en_US]
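The popular abstract describes deep learning as a cascade of trainable modules that are trained simultaneously. The minimal NumPy sketch below illustrates that idea with a generic two-module network: a tanh hidden layer followed by a logistic output that estimates a default probability. Both modules are updated by plain gradient descent on synthetic data. The architecture, learning rate, data and function names are hypothetical choices for this example and are unrelated to the Deep Generative Models in the thesis.

```python
import numpy as np


def forward(params, X):
    """Run the cascade: module 1 (hidden layer) feeds module 2 (output)."""
    W1, b1, W2, b2 = params
    h = np.tanh(X @ W1 + b1)                   # module 1: hidden representation
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))   # module 2: default probability
    return h, p


def train_two_module_net(X, y, hidden=8, lr=0.5, epochs=500, seed=0):
    """Train both modules end to end on mean binary cross-entropy."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.5, (d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, hidden)
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        h, p = forward((W1, b1, W2, b2), X)
        loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
        losses.append(loss)

        # Backpropagation: d(loss)/d(logit) = (p - y) / n for sigmoid + BCE
        g_logit = (p - y) / n
        gW2 = h.T @ g_logit
        gb2 = g_logit.sum()
        g_h = np.outer(g_logit, W2) * (1.0 - h ** 2)   # tanh derivative
        gW1 = X.T @ g_h
        gb1 = g_h.sum(axis=0)

        # Both modules are adjusted in the same step (simultaneous training)
        W1 = W1 - lr * gW1
        b1 = b1 - lr * gb1
        W2 = W2 - lr * gW2
        b2 = b2 - lr * gb2
    return (W1, b1, W2, b2), losses


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 2))
    y = (X[:, 0] + X[:, 1] > 0).astype(float)   # synthetic "default" labels
    params, losses = train_two_module_net(X, y)
    print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Both gradients are derived from the same output loss, so the hidden module learns whatever representation helps the output module estimate the default probability. This is the sense in which the modules adjust themselves jointly.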
dc.description.sponsorship: Nærings-phd (grant 260205), Santander Consumer Bank - Nordics, SkatteFUNN (grant 276428) [en_US]
dc.identifier.uri: https://hdl.handle.net/10037/20407
dc.language.iso: eng [en_US]
dc.publisher: UiT Norges arktiske universitet [en_US]
dc.publisher: UiT The Arctic University of Norway [en_US]
dc.relation.haspart: <p>Paper I: Mancisidor, R.A., Kampffmeyer, M., Aas, K. & Jenssen, R. (2021). Learning latent representations of bank customers with the Variational Autoencoder. <i>Expert Systems with Applications, 164</i>, 114020. Also available at <a href= https://doi.org/10.1016/j.eswa.2020.114020> https://doi.org/10.1016/j.eswa.2020.114020</a>. <p>Paper II: Mancisidor, R.A., Kampffmeyer, M., Aas, K. & Jenssen, R. (2020). Deep generative models for reject inference in credit scoring. <i>Knowledge-Based Systems, 196</i>, 105758. Also available at <a href=https://doi.org/10.1016/j.knosys.2020.105758> https://doi.org/10.1016/j.knosys.2020.105758</a>. <p>Paper III: Mancisidor, R.A., Kampffmeyer, M., Aas, K. & Jenssen, R. Generating Customer’s Credit Behavior with Deep Generative Models. (Manuscript). [en_US]
dc.relation.isbasedon: <p>Three real data sets provided by Santander Consumer Bank - Nordics. <ol> <li>Give me some credit data set, available at <a href=https://www.kaggle.com/c/GiveMeSomeCredit>https://www.kaggle.com/c/GiveMeSomeCredit</a>.</li> <li>Lending Club data set, available at <a href=https://github.com/nateGeorge/preprocess_lending_club_data>https://github.com/nateGeorge/preprocess_lending_club_data</a>.</li> <li>Public data for Santander Bank, available at <a href=https://www.kaggle.com/c/santander-customer-transaction-prediction/data>https://www.kaggle.com/c/santander-customer-transaction-prediction/data</a>.</li> </ol> [en_US]
dc.rights.accessRights: openAccess [en_US]
dc.rights.holder: Copyright 2021 The Author(s)
dc.rights.uri: https://creativecommons.org/licenses/by-nc-sa/4.0 [en_US]
dc.rights: Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) [en_US]
dc.subject: VDP::Mathematics and natural science: 400::Mathematics: 410::Statistics: 412 [en_US]
dc.subject: VDP::Mathematics and natural science: 400::Mathematics: 410::Applied mathematics: 413 [en_US]
dc.subject: VDP::Mathematics and natural science: 400::Information and communication science: 420::Mathematical modeling and numerical methods: 427 [en_US]
dc.title: Deep Generative Models in Credit Scoring [en_US]
dc.type: Doctoral thesis [en_US]


Except where otherwise noted, this item's license is described as Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)