Hitachi Vantara Pentaho Community Forums

Thread: Credit Card Fraud Detection out of curiosity

  1. #1
    Join Date
    Dec 2008
    Posts
    22

    Credit Card Fraud Detection out of curiosity

    Hi to my fellow members,

    I am very curious whether machine learning is still applicable in the following scenario.

    Objective:
    Let's say one wanted to develop a Weka model that detects whether a credit card transaction is fraudulent/malicious or valid. We would probably collect around 1,000 sample fraudulent transactions and 20,000 valid transactions. Bingo, we might think. But!

    Challenge:
    The model we developed for the objective above might not be applicable, given that there are millions of credit card transactions. Our sample might therefore not be representative enough to build a model that detects fraud. Furthermore, each credit card holder is unique, so it will be impossible to predict whether a transaction is valid or not. Aggregating the data may not be an option, since we want to detect fraud on a per-transaction basis.

    Question:
    How then do we solve this problem? Does that mean we have probably reached the limit of machine learning?


    Guys, what do you say? I am really serious about learning Weka and fielded applications. I have not found this problem in the Data Mining book, as far as I have read.

  2. #2
    Join Date
    Aug 2006
    Posts
    1,741

    Hi perryrico,

    Credit card fraud is a classic application for machine learning/data mining. As far as I know, it is tackled - from a prediction standpoint - by building up a profile of credit card usage from separate transactions for a given card. So, at some point, the probability of fraudulent usage for a given card will exceed a predetermined threshold. For example, lots of small to medium sized cash withdrawals from a credit card from different locations in a short period of time could indicate a stolen card and fraudulent usage. So, the main task is to define a good set of attributes to describe fraud/non-fraud usage. Some of these will be aggregates over time as in my example.
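
    As a rough illustration of such time-window aggregates, here is a minimal Python sketch. The `Txn` record, its field names, the 24-hour window, and the sample data are all assumptions made up for the example; they are not part of Weka or of any real card system.

```python
from collections import namedtuple
from datetime import datetime, timedelta

# Hypothetical transaction record; field names are assumptions for illustration.
Txn = namedtuple("Txn", ["card_id", "time", "amount", "location", "is_cash"])


def profile_features(history, current, window=timedelta(hours=24)):
    """Summarise one card's activity in the window ending at `current`."""
    start = current.time - window
    recent = [t for t in history
              if t.card_id == current.card_id and start <= t.time < current.time]
    recent.append(current)  # the transaction being scored counts too
    cash = [t for t in recent if t.is_cash]
    return {
        "txn_count": len(recent),
        "cash_count": len(cash),
        "cash_total": sum(t.amount for t in cash),
        "distinct_locations": len({t.location for t in recent}),
    }


# Made-up sample data: card "A" withdraws cash in three places within hours.
history = [
    Txn("A", datetime(2009, 7, 1, 9, 0), 200.0, "Auckland", True),
    Txn("A", datetime(2009, 7, 1, 11, 0), 150.0, "Hamilton", True),
    Txn("B", datetime(2009, 7, 1, 11, 30), 40.0, "Auckland", False),
    Txn("A", datetime(2009, 6, 20, 11, 0), 60.0, "Auckland", False),
]
current = Txn("A", datetime(2009, 7, 1, 13, 0), 300.0, "Wellington", True)
features = profile_features(history, current)
print(features)
```

    Each transaction still gets its own row, so detection stays per transaction; the row simply carries a summary of the card's recent history alongside the transaction itself.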

    Cheers,
    Mark.

  3. #3
    Join Date
    Dec 2008
    Posts
    22

    Quote Originally Posted by Mark

    I follow the picture. But does that mean fraud will only be detected once each card holder has experienced fraud? There will be no fraud class in the dataset for a card holder unless that card holder has experienced fraud. So we cannot build a model unless every card holder has experienced fraud? Pardon me for being so skeptical; I really want to become a pro and an ace in Weka like you.

  4. #4
    Join Date
    Aug 2006
    Posts
    1,741

    I'm not sure I understand. Are you wanting to construct a model for each separate card-holder? If so, this sounds a bit odd. The idea is to generalize - so we take examples of fraudulent usage patterns (or at least the transactions that occurred when a card was being used fraudulently) and non-fraudulent usage patterns. The basic assumption is that there are patterns to fraudulent behavior that hold in general for all cases of fraud (regardless of whether any one card holder has experienced fraud or not). We can then build a model to predict the likelihood of fraud given data about the current set of transactions for a given card. Of course, it is unlikely that the system will detect the very first fraudulent transaction for a card, but further transactions will add to the profile and increase the likelihood of fraud with respect to the model.
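
    To make the generalization point concrete, here is a toy Python sketch. It swaps in a one-attribute decision stump as a stand-in for a real classifier, and the attribute (distinct locations in 24 hours) and the pooled examples are invented for illustration. The model is learned from examples pooled across many cards and then applied to a card that has no fraud history of its own.

```python
def train_stump(examples):
    """Pick the threshold t that minimises training errors for the
    rule 'predict fraud when value >= t'.

    examples: list of (value, is_fraud) pairs pooled across all cards.
    """
    best_t, best_err = None, len(examples) + 1
    for t in sorted({v for v, _ in examples}):
        err = sum((v >= t) != label for v, label in examples)
        if err < best_err:
            best_t, best_err = t, err
    return best_t


def predict(threshold, value):
    """Apply the learned rule to any card, seen before or not."""
    return value >= threshold


# Made-up pooled data: (distinct locations in 24h, was the usage fraudulent?)
pooled = [(1, False), (2, False), (1, False), (3, False), (2, False),
          (5, True), (4, True), (6, True)]
threshold = train_stump(pooled)

# A card that never appeared in the training data can still be scored.
print(threshold, predict(threshold, 5), predict(threshold, 2))
```

    The fraud class comes from whichever cards were defrauded in the past; a new card only needs to supply attribute values, not fraud labels of its own.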

    Cheers,
    Mark.

  5. #5
    Join Date
    Dec 2008
    Posts
    22

    Yes Mark, that was what I was thinking. I get your point. That means we will not be able to detect fraud at the very first transaction, so we have to find patterns in the series of transactions committed as fraud. Earlier I thought it was feasible to detect fraud at the very first transaction. Thanks for your advice.

  6. #6
    Join Date
    Jul 2009
    Posts
    7

    About ten years ago I did work on cell phone fraud detection, which is fairly similar to this. You might be interested in the approach we took. It's described here:

    http://home.comcast.net/~tom.fawcett/public_html/papers/DMKD-97.ps.gz


    I wouldn't say it's impossible to catch fraud at the very first fraudulent transaction. You're building a type of anomaly detection system; there is always a natural trade-off in such systems between speed of detection and number of false positives. You can catch fraud very quickly if the behavior is sufficiently atypical, or if you're willing to tolerate many false positives.
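
    A small Python sketch of that trade-off, using invented model scores (higher means more suspicious): lowering the alarm threshold catches more of the fraud, and catches it on weaker evidence, but it also raises more false alarms on valid transactions. The scores and thresholds are assumptions chosen only to show the effect.

```python
def rates_at(threshold, scored):
    """Return (fraud caught rate, false alarm rate) when every
    transaction scoring >= threshold raises an alarm.

    scored: list of (score, is_fraud) pairs.
    """
    tp = sum(1 for s, y in scored if y and s >= threshold)
    fp = sum(1 for s, y in scored if not y and s >= threshold)
    pos = sum(1 for _, y in scored if y)
    neg = len(scored) - pos
    return tp / pos, fp / neg


# Made-up scores: four fraudulent and five valid transactions.
scored = [(0.9, True), (0.7, True), (0.4, True), (0.3, True),
          (0.6, False), (0.35, False), (0.2, False), (0.1, False), (0.05, False)]

results = {t: rates_at(t, scored) for t in (0.8, 0.5, 0.25)}
for t, (caught, false_alarms) in results.items():
    print(f"threshold {t}: caught {caught:.2f}, false alarms {false_alarms:.2f}")
```

    Where to set the threshold depends on what a missed fraud costs compared with a false alarm, which is a business decision rather than something the model can answer.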

    -Tom
    Last edited by tfawcett; 07-29-2009 at 05:18 PM.

  7. #7
    Join Date
    Dec 2008
    Posts
    22

    Quote Originally Posted by tfawcett

    Hi Tom, the link no longer seems to work. Do you have an alternative link?

  8. #8
    Join Date
    Jul 2009
    Posts
    7

  9. #9
    Join Date
    Dec 2008
    Posts
    22

    Quote Originally Posted by tfawcett

    Thanks buddy.

  10. #10
    Join Date
    Oct 2016
    Posts
    1

    Quote Originally Posted by tfawcett
    Hi Tom, do you have an alternative link?

    Thank you very much.

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.