Data Science & Fraud Detection
Fraud Detection & Its Importance, How Models Are Developed and Deployed, Rule Based Fraud Detection
This post was created with the assistance of @BowTiedBrothers. He comes from a strong research background, having started in academia and recently transitioned to a full-time data role. I highly suggest checking him out.
He also helped me create the popular Interview preparation post.
Premise For This Post:
We’ll be discussing how AI models are developed and used to combat fraud, and some advantages and disadvantages of using these methods.
Table of Contents:
Conclusion
Fraud Detection & Its Importance
How Models Are Developed
How Models Are Used
Rule Based Fraud Detection
Some Pros & Cons
1 - Conclusion
Consumer preferences keep shifting toward online transactions, which means there will be more chances for fraud to occur. AI models are going to become essential to combat this fraud.
What remains to be seen is how human roles in the fraud detection process will continue to change, or whether they will be eliminated altogether.
2 - Fraud Detection & Its Importance
Fraud detection is a process that identifies scams and prevents assets from being stolen. It can be done in a wide variety of businesses, but the best examples are within banking and retail.
There are 2 types of fraud:
Monetary: Includes things like payment or transaction fraud
Non-monetary: Things like password reset or change of address request
Traditionally, fraud detection has worked by following basic sets of heuristics or rules. For example: a price limit that denies transactions, or blocking IP addresses. These rules only provide basic protection, which means the bulk of the responsibility falls on the consumer. The customer needs to check whether a fraudulent charge has been made in their name and, if it has, alert the business or bank.
With the massive increase in available customer and transactional data, there has been a push to move away from manual review by individual agents and toward ML models.
While these models aren’t a complete replacement for humans at this point, they do help expedite the detection process. This is essential for keeping up with the increase in online activity.
3 - How Models Are Developed
There are many machine learning models that can be used in fraud detection. Here are a few:
Suggested Models For Unsupervised:
PCA (Principal Component Analysis)
SVM (Support-Vector Machines, typically the one-class variant when used unsupervised)
Suggested Models For Supervised:
Decision Trees (including gradient-boosted tree ensembles such as XGBoost)
Logistic Regression
Random Forest
KNN (K-Nearest Neighbors)
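To make one of the supervised options above concrete, here is a minimal sketch of KNN in plain Python. Everything in it is an assumption for illustration: the three features (amount, hour of day, new-location flag), the tiny made-up training set, and the choice of k=3. A real system would use a library and far more data.

```python
import math
from collections import Counter

# Toy, made-up training data: (amount_usd, hour_of_day, is_new_location) -> label.
# Label 1 = fraud, 0 = legitimate. None of this is real data.
TRAIN = [
    ((20.0, 14, 0), 0),
    ((35.0, 10, 0), 0),
    ((18.0, 19, 0), 0),
    ((50.0, 12, 0), 0),
    ((4200.0, 3, 1), 1),
    ((3900.0, 2, 1), 1),
    ((4500.0, 4, 1), 1),
]


def column_ranges(rows):
    """Return (min, max) per feature so all features can share one scale."""
    return [(min(col), max(col)) for col in zip(*rows)]


def scale(row, ranges):
    """Min-max scale each feature to roughly [0, 1]; KNN is sensitive to scale."""
    return tuple(
        (value - lo) / (hi - lo) if hi > lo else 0.0
        for value, (lo, hi) in zip(row, ranges)
    )


def knn_predict(train, query, k=3):
    """Predict a label by majority vote among the k closest training points."""
    ranges = column_ranges([features for features, _ in train])
    scaled_query = scale(query, ranges)
    # Sort training points by Euclidean distance to the query.
    # Note: treating hour_of_day as a plain number ignores that it wraps at midnight.
    neighbors = sorted(
        train,
        key=lambda item: math.dist(scale(item[0], ranges), scaled_query),
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


large_order = knn_predict(TRAIN, (4498.0, 3, 1))
small_order = knn_predict(TRAIN, (20.0, 15, 0))
```

With this toy data, a large 3am order to a new location lands among the fraud examples, while a small afternoon order lands among the legitimate ones.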
Which model gets selected will depend on the company: the data they prefer to use, and how they plan to use the model.
The next step is to actually train the preferred model or models. It is important to train them on large quantities of relevant data. Which data matters most will once again depend on the company and its business. Generally speaking, some common factors that are taken into account include:
Device type
Network
Location of transaction (IP address)
Change payment limits
View balance
Update email
Update password
Transaction amount
Type of transaction (i.e. specific item)
Time of transaction
Profiles of those involved
Once the model is trained, the next challenge is keeping it updated with new, relevant data. Since trends in fraud can change, we need to make sure the model stays up to date with the latest data. This helps it maintain a high level of accuracy while also suppressing false positives.
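One way to keep an eye on accuracy and false positives is to score the model on recent, labeled transactions and compare a few numbers against targets. The sketch below assumes binary labels (1 = fraud, 0 = legitimate). The function names and the thresholds in needs_retraining are hypothetical and would be set by the business.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and false positive rate."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
    }


def needs_retraining(m, min_recall=0.8, max_fpr=0.05):
    """Hypothetical policy: retrain if fraud is being missed or alerts are too noisy."""
    return m["recall"] < min_recall or m["false_positive_rate"] > max_fpr


# Made-up recent results: 2 real fraud cases out of 10 transactions.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
recent = metrics(y_true, y_pred)
retrain = needs_retraining(recent)
```

In this made-up sample, accuracy is 80% even though the model caught only half of the fraud. That is why it helps to track recall and the false positive rate alongside accuracy when fraud is rare.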
4 - How Models Are Used
Once implemented, models assist the company by streamlining the process. What the end result looks like depends on the company. In general, the model will either notify the customer of fraudulent activity, which provides 24/7 protection since it isn’t dependent on a reviewer, or notify the company of suspicious activity, which gives the company control over whether the customer is notified.
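The two outcomes described above can be sketched as a small routing function. This assumes the model outputs a fraud score between 0 and 1. The Action names, the 0.8 threshold, and the customer_facing switch are all hypothetical choices, not something a specific product prescribes.

```python
from enum import Enum


class Action(Enum):
    APPROVE = "approve"
    NOTIFY_CUSTOMER = "notify_customer"  # customer approves/declines directly
    NOTIFY_COMPANY = "notify_company"    # company decides what happens next


def route(score, threshold=0.8, customer_facing=True):
    """Map a fraud score in [0, 1] to an action (threshold is an assumption)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score < threshold:
        return Action.APPROVE
    # Above the threshold, who gets the alert is a business decision.
    return Action.NOTIFY_CUSTOMER if customer_facing else Action.NOTIFY_COMPANY
```

For example, route(0.95) would alert the customer, while route(0.95, customer_facing=False) would send the case to the company instead.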
Good Implementation
Let’s say you only buy about $20 worth of toilet paper, toothpaste, and floss per month on Amazon, and have it shipped to your house in Duluth, Minnesota.
One day someone steals your credit card info and tries to buy a Full Size Commercial Grade Seated Racing Arcade Machine. It costs $4,498, and they plan to have it shipped to New Orleans, Louisiana.
This purchase is:
Different from what you usually buy
At a price much higher than you usually spend
Shipped to a location you've never had anything sent to
So the transaction would immediately be flagged as fraud.
What can then happen is you receive an alert on your phone asking if you want to approve this transaction. Since it isn’t something you bought, you can decline it. This is a good use of fraud protection, where both the customer and the business are protected. It also takes the user experience into account, as the customer can decline or approve at the click of a button.
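The three warning signs in this example can be sketched as a comparison between a new order and the customer's history. This is only an illustration: the History and Order classes, the category labels, and the "10x the average spend" cutoff are assumptions, and a trained model would learn patterns like these from data rather than hard-code them.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class History:
    amounts: list      # past purchase amounts in USD
    categories: set    # item categories bought before
    ship_to: set       # locations previously shipped to


@dataclass
class Order:
    amount: float
    category: str
    ship_to: str


def fraud_signals(history, order, amount_multiple=10.0):
    """Return the reasons an order looks unlike the customer's past behavior."""
    signals = []
    if order.category not in history.categories:
        signals.append("unfamiliar item type")
    if history.amounts and order.amount > amount_multiple * mean(history.amounts):
        signals.append("amount far above usual spend")
    if order.ship_to not in history.ship_to:
        signals.append("new shipping location")
    return signals


# Hypothetical history matching the example: small household orders to Duluth.
history = History(
    amounts=[20.0, 22.0, 18.0],
    categories={"household"},
    ship_to={"Duluth, MN"},
)
arcade = Order(amount=4498.0, category="arcade machine", ship_to="New Orleans, LA")
usual = Order(amount=21.0, category="household", ship_to="Duluth, MN")

arcade_signals = fraud_signals(history, arcade)
usual_signals = fraud_signals(history, usual)
```

Here the arcade machine order trips all three signals, which is what would trigger the approve/decline prompt, while the usual household order trips none.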
Poor Implementation
Let’s say this Saturday you break your fishing rod and want to buy a new one online at Company #1, a smaller retailer. As soon as you submit the transaction it gets denied, and you are told to call and speak to a representative.
You try to give them a call but get an answering machine. The machine says you need to reach them during business hours, Monday through Friday, 9am - 5pm ET. You now have to wait to make that purchase.
Instead, you decide to look elsewhere and find another rod online at Company #2, a larger retailer.
When you place the order, you get a notification from PayPal. It asks if you would like to approve the transaction. You press ‘yes’ and the transaction is approved.
In this scenario, the user experience at Company #1 suffers due to poor implementation, and due to a lack of customer support to speed up the fraud review process. As a result you took your business elsewhere, which wouldn’t be uncommon if the same thing happened to other customers.
Note: I can confirm this is true, I dropped a bank because I got tired of their shit.
Key Takeaway
Because they have more capital and more available transactional and customer data, larger businesses can build more models and offer more support than smaller ones.
If a larger company hasn’t brought its infrastructure into the 21st century yet, providing adequate service in this domain will be a massive undertaking, especially as it migrates away from legacy systems.
Smaller companies have access to other companies that offer fraud detection as a service. This lets them avoid building models in-house, although that comes with its own pros and cons, depending on the service they use and whether they have control over the model.
5 - Rule Based Fraud Detection
Rule based fraud detection is a technique companies use to detect fraudulent behavior. The company defines rules that identify the types of activities or transactions that are likely to be fraud, then uses software to scan all transactions for any that match the rules.
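A minimal sketch of that define-rules-then-scan loop might look like this. The two rules mirror the price limit and IP blocking examples from earlier in the post. The $1,000 limit, the transaction fields, and the IP addresses (taken from ranges reserved for documentation) are made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    name: str
    matches: Callable[[dict], bool]  # returns True if the transaction looks suspicious


# Hypothetical blocklist and rules; real values would come from the business.
BLOCKED_IPS = {"203.0.113.7"}
RULES = (
    Rule("amount_over_limit", lambda tx: tx["amount"] > 1000),
    Rule("blocked_ip", lambda tx: tx["ip"] in BLOCKED_IPS),
)


def scan(transactions, rules=RULES):
    """Return {transaction id: [matched rule names]} for flagged transactions."""
    flagged = {}
    for tx in transactions:
        hits = [rule.name for rule in rules if rule.matches(tx)]
        if hits:
            flagged[tx["id"]] = hits
    return flagged


transactions = [
    {"id": "t1", "amount": 20.0, "ip": "198.51.100.4"},
    {"id": "t2", "amount": 4498.0, "ip": "198.51.100.4"},
    {"id": "t3", "amount": 15.0, "ip": "203.0.113.7"},
]
flagged = scan(transactions)
```

Keeping each rule as a named object means the output says which rule fired, which makes a flagged transaction easier to review.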
This is a more sophisticated approach than going through all the transactions by hand. By using rules, the company can detect much more subtle indicators of fraud and prevent it from happening in the first place.
6 - Some Pros & Cons
Advantages:
Reduced number of verification measures
Improved user experience
Automatic detection
Massive reduction in manual work for human reviewers
Good at finding hidden correlations in data that may not be obvious cases of fraud
Real-time processing
Disadvantages:
The success of the model relies entirely on the quality and quantity of data you train it on:
It will be impossible to fully incorporate the human element in a training set, because there are outside variables that may need to be taken into account but can’t be quantified. This is where a manual review comes in handy to compensate.
If a complex model is used it may not be interpretable, i.e. a “blackbox” model.
This would mean that when mistakes are made, it will be harder to notice them and determine why they are happening.
False positives will dramatically hamper user experience if not properly managed
Especially if it forces people to approve every other transaction, or makes it more difficult to buy things they don’t normally buy.
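On the "blackbox" point above, it can help to see what an interpretable model looks like by contrast. With a linear model such as logistic regression, each feature's contribution to the score can be read off directly. The weights, bias, and feature names below are hypothetical and not fitted to any real data.

```python
import math

# Hypothetical logistic-regression-style weights; not fitted to real data.
WEIGHTS = {"amount_zscore": 1.2, "new_location": 2.0, "new_device": 0.8}
BIAS = -4.0


def explain(features):
    """Return (fraud probability, per-feature contribution to the logit)."""
    contributions = {
        name: weight * features.get(name, 0.0) for name, weight in WEIGHTS.items()
    }
    logit = BIAS + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-logit))  # sigmoid
    return probability, contributions


probability, contributions = explain(
    {"amount_zscore": 3.0, "new_location": 1.0, "new_device": 0.0}
)
top_reason = max(contributions, key=contributions.get)
```

When this kind of model makes a mistake, the contributions show which input pushed the score up. A more complex model generally needs extra tooling to get a comparable explanation.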