Deep Generative Models in Credit Scoring

Andrade Mancisidor, Rogelio

dc.contributor.advisor	Jenssen, Robert
dc.contributor.author	Andrade Mancisidor, Rogelio
dc.date.accessioned	2021-01-23T09:48:44Z
dc.date.available	2021-01-23T09:48:44Z
dc.date.issued	2021-02-05
dc.description.abstract	Banks need to develop effective credit scoring models to better understand the relationship between customer information and the customer's ability to repay a loan. The output of such a model is called the default probability and is used to rank loan applications in terms of their creditworthiness. The focus of this thesis is to develop novel credit scoring methodologies that solve well-known problems in the field and that bridge the gap between simple neural networks and advanced methodologies in deep learning applied to credit scoring. We propose a new methodology for learning useful data representations of bank customers by introducing a supervision stage into the Variational Autoencoder framework, in which the input data are grouped using the Weight of Evidence transformation. Our proposed method learns data representations that capture the customers' creditworthiness in a well-defined clustering structure and that are well suited for marketing campaigns and credit risk assessment. Further, we develop two novel Deep Generative Models that use probability theory to infer the unknown creditworthiness of customers whose loan applications were rejected, which is a clear advantage over traditional approaches. Adding rejected applications improves the classification accuracy of our proposed models and potentially solves the selection bias problem. We parametrize a Gaussian mixture model with neural networks to further improve the latent representation of customer information. Finally, we address credit scoring as a multi-modal learning problem: banks have multiple measurement modalities that provide complementary information about customers. Hence, we develop a novel Deep Generative Model that learns shared data representations and maximizes the mutual information between future credit data and its shared representation. 
Our proposed model is able to generate future credit data from application data, and these generated data can be used to support bank activities other than credit scoring.	en_US
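The Weight of Evidence (WoE) transformation mentioned in the abstract replaces each group (bin) of an input variable with the log ratio between the share of non-defaulting customers and the share of defaulting customers that fall in that bin. A minimal Python sketch follows, not the thesis implementation. It assumes the inputs are already binned, labels are coded 1 = default and 0 = repaid, the common "good over bad" sign convention, and an additive smoothing constant `eps`; the function names `weight_of_evidence` and `transform` are hypothetical.

```python
import math
from collections import Counter


def weight_of_evidence(bins, labels, eps=0.5):
    """Return a dict mapping each bin to its Weight of Evidence (WoE).

    Assumptions (illustrative, not taken from the thesis):
      * `bins` already holds a group label per customer (e.g. an income band);
      * `labels` uses 1 = defaulted ("bad") and 0 = repaid ("good");
      * WoE = ln(share of goods in bin / share of bads in bin);
      * `eps` is additive smoothing so an empty cell never gives log(0).
    """
    if len(bins) != len(labels):
        raise ValueError("bins and labels must have the same length")
    categories = list(dict.fromkeys(bins))  # unique bins, first-seen order
    good = Counter(b for b, y in zip(bins, labels) if y == 0)
    bad = Counter(b for b, y in zip(bins, labels) if y == 1)
    # Smoothed totals, so the per-bin shares still sum to 1.
    total_good = sum(good.values()) + eps * len(categories)
    total_bad = sum(bad.values()) + eps * len(categories)
    woe = {}
    for c in categories:
        share_good = (good[c] + eps) / total_good
        share_bad = (bad[c] + eps) / total_bad
        woe[c] = math.log(share_good / share_bad)
    return woe


def transform(bins, woe):
    """Replace each customer's bin label with that bin's WoE value."""
    return [woe[b] for b in bins]


# Toy example with made-up data (illustration only).
bins = ["low", "low", "low", "mid", "mid", "high", "high", "high"]
labels = [1, 1, 0, 1, 0, 0, 0, 0]
woe = weight_of_evidence(bins, labels)
print(transform(["low", "high"], woe))
```

Under this convention, a negative WoE flags a bin in which defaulters are over-represented; in the toy data, `low` receives a negative value and `high` a positive one. Some practitioners use the inverse "bad over good" ratio, which simply flips the sign.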
dc.description.doctoraltype	ph.d.	en_US
dc.description.popularabstract	Banks need to develop effective credit scoring models to better understand the relationship between customer information and the customer's ability to repay a loan. The output of such a model is called the default probability and is used to rank loan applications in terms of their creditworthiness. The focus of this thesis is to develop novel credit scoring methodologies that solve well-known problems in the field and that bridge the gap between simple neural networks and advanced methodologies in deep learning applied to credit scoring. Deep learning builds a system from a cascade of trainable modules, where all modules are trained simultaneously and each module adjusts itself to produce the right answer, which in the case of credit scoring is an accurate estimate of creditworthiness. We propose a new methodology that learns a useful way to transform customer information into a new data representation capable of capturing customer creditworthiness in a well-defined clustering structure. These clusters are unknown a priori and are impossible to identify using the original customer information. Further, the clusters identified with our proposed method are suitable for marketing campaigns and credit risk assessment. Banks do not know the creditworthiness of people who apply for loans and are rejected. Therefore, we develop models that use probability theory to infer the unknown creditworthiness of rejected applicants. Our experiments show that adding rejected applications improves the creditworthiness estimation for all loan applications. Banks have access to multiple sources of information about their customers, for example, historical information obtained from application forms and behavioral data collected during the loan period. These different sources of information are called data modalities. We develop a novel methodology that can learn a shared data representation for these two data modalities. 
Furthermore, our proposed method is capable of generating a missing modality using the information in the available modality. That is, we can generate data on a person's future behavior using the information collected during the application process.	en_US
dc.description.sponsorship	Nærings-phd (grant 260205), Santander Consumer Bank - Nordics, SkatteFUNN (grant 276428)	en_US
dc.identifier.uri	https://hdl.handle.net/10037/20407
dc.language.iso	eng	en_US
dc.publisher	UiT Norges arktiske universitet	en_US
dc.publisher	UiT The Arctic University of Norway	en_US
dc.relation.haspart	<p>Paper I: Mancisidor, R.A., Kampffmeyer, M., Aas, K. & Jenssen, R. (2021). Learning latent representations of bank customers with the Variational Autoencoder. <i>Expert Systems with Applications, 164</i>, 114020. Also available at <a href= https://doi.org/10.1016/j.eswa.2020.114020> https://doi.org/10.1016/j.eswa.2020.114020</a>. <p>Paper II: Mancisidor, R.A., Kampffmeyer, M., Aas, K. & Jenssen, R. (2020). Deep generative models for reject inference in credit scoring. <i>Knowledge-Based Systems, 196</i>, 105758. Also available at <a href=https://doi.org/10.1016/j.knosys.2020.105758> https://doi.org/10.1016/j.knosys.2020.105758</a>. <p>Paper III: Mancisidor, R.A., Kampffmeyer, M., Aas, K. & Jenssen, R. Generating Customer’s Credit Behavior with Deep Generative Models. (Manuscript).	en_US
dc.relation.isbasedon	<p>Three real data sets provided by Santander Consumer Bank - Nordics. <ol> <li>Give me some credit data set, available at <a href=https://www.kaggle.com/c/GiveMeSomeCredit>https://www.kaggle.com/c/GiveMeSomeCredit</a>.</li> <li>Lending Club data set, available at <a href=https://github.com/nateGeorge/preprocess_lending_club_data>https://github.com/nateGeorge/preprocess_lending_club_data</a>.</li> <li>Public data for Santander Bank, available at <a href=https://www.kaggle.com/c/santander-customer-transaction-prediction/data>https://www.kaggle.com/c/santander-customer-transaction-prediction/data</a>.</li> </ol>	en_US
dc.rights.accessRights	openAccess	en_US
dc.rights.holder	Copyright 2021 The Author(s)
dc.subject.courseID	DOKTOR-004
dc.subject	VDP::Mathematics and natural science: 400::Mathematics: 410::Statistics: 412	en_US
dc.subject	VDP::Mathematics and natural science: 400::Mathematics: 410::Applied mathematics: 413	en_US
dc.subject	VDP::Mathematics and natural science: 400::Information and communication science: 420::Mathematical modeling and numerical methods: 427	en_US
dc.title	Deep Generative Models in Credit Scoring	en_US
dc.type	Doctoral thesis	en_US

Associated file(s)

Name: thesis.pdf
Size: 6.762 MB
Format: PDF

Name: license.txt
Size: 1.093 KB
Format: Text file

This item appears in the following collection(s)

Doctoral theses (NT-fak)

Related items

Engineering methods for enhancing railway geometry and winter road assessment: A safety and maintenance perspective

Geometric Modeling- and Sensor Technology Applications for Engineering Problems

Iceberg Drift-Trajectory Modelling and Probability Distributions of the Predictions
