by Daniele Micci-Barreca

Detecting Fraud Schemes

A Fraud Scheme Scenario

A large online retailer notices an uptick in orders for high-end products from newly established accounts. Management is thrilled, although nobody seems to have an explanation for the improved performance in this niche segment. About thirty days after the increase in sales began, the billing department starts to notice that some of those large-dollar transactions are being disputed by the cardholders and charged back by the banks. Soon, more chargebacks arrive and it becomes clear that something is not right: the spike in sales, which amounted to over $150,000, was due to a sequence of fraudulent purchases. By then, the goods have been shipped and delivered to different locations around the country, and the loss is largely unrecoverable.

A post-mortem of this “fraud scheme” event indicated that the existing fraud detection system, which screens every transaction in real time as it is processed by the ordering system, failed to detect what later appeared to be a fairly evident pattern. Each transaction appeared legitimate because the credit card information matched, none of the customer data matched existing blacklists, and most orders included a single product. The IP address geolocation also matched the delivery area. What the system could not detect was that many of the orders were related. They were placed within the same timeframe, they were all shipped to a few delivery addresses, and the credit cards used, although different, were all pre-paid accounts from a known issuer of reloadable cards. The phone number associated with these new accounts was also the same, even though the customer information differed. Finally, the weblogs revealed that the same ‘cookie’ was associated with most of the fraudulent sessions, so the same computer was likely used to place the string of fraudulent orders.

This example is just one of many in which a business is caught unprepared by a fraud scheme that causes significant damage before it can even be detected. In an era of extreme automation, high transaction volumes, and a highly connected world, threats can hit virtually any business:

  • Insurance companies paying billions of dollars in claims
  • E-commerce operations transacting with customers located anywhere on the globe
  • Financial institutions monitoring activities for potential account compromises
  • Gaming and hospitality organizations monitoring cash movements for anti-money laundering compliance
  • Restaurant chains looking to spot cashier fraud
  • Government agencies managing tax refunds, unemployment claims and disability claims.

These are just a few examples of situations where a business deals with a large volume of transactions and is exposed to what we refer to as bad actors, who can be responsible for a potentially large volume of fraudulent transactions. Individual transactions may be small and often “pass under the radar”, or lack the clues that would make them detectable by transaction-based fraud screening systems. Sometimes such systems identify a portion of the fraudulent scheme, but fail to “connect the dots” and alert the business to the overarching pattern.

From Transactions to Patterns

To protect your business from fraud schemes and bad actors before they can cause significant damage, you need two fundamental capabilities. First, the ability to relate transactions to each other, whether they are claims, tax returns, online purchases, money transfers, or something else. Second, the ability to evaluate which groups of related transactions are indeed suspicious and should trigger further investigation.

Bad actors are unlikely to be detectable through explicit clues, like a customer account or a payment method, because fraudsters are very savvy in avoiding such obvious clues. They can easily create multiple accounts and use hundreds of different financial accounts, both for paying and for receiving money. For example, fraudulent tax refund claims are typically routed to dozens of pre-paid credit card accounts, which are fairly anonymous, readily available and practical. Also in taxation, many agencies are looking for what are commonly referred to as “ghost tax preparers”: licensed or unlicensed tax preparers who avoid “signing” tax returns because they use “aggressive” or outright fraudulent schemes to inflate refund amounts. These bad actors cannot be identified by the name of the preparer on the tax return, which will be absent, but they can be linked by a variety of other pieces of data, like email, IP address, device ID or phone number.

Unfortunately, it is difficult to determine in advance which piece of data will be useful to detect the next scheme, that is, which clues the fraudsters will fail to cover and which will therefore help link the next string of fraudulent orders. So, what is the solution? Creating a multi-link framework that continuously looks for potentially suspicious groups of transactions related by one or more data elements. All link criteria should be evaluated at the same time, to make sure that no potential pattern is missed. For example, the email domain name of the transaction recipient could serve as a clue in certain situations. While many popular domain names are obviously too generic to serve as a clue, on many occasions we have seen schemes where fraudsters were using newly created email accounts from free email platforms, which were easy to exploit by automatically generating hundreds of accounts.

There are dozens of data elements that can be used to create groups of transactions that can lead to the underlying “bad actors”. Some are common to many organizations that transact over the Internet, including:

  • IP address
  • Device ID (e.g. browser signature)
  • E-mail
  • Cookies
  • Phone number
  • Address

Other link criteria may be specific to the domain, such as information about payment or deposit accounts, including credit card, card PIN, bank routing number, and account number. Finally, individual transaction elements that are domain-specific can be used as link criteria themselves: product codes, discount codes, patient identifiers, procedure codes, payment terminal codes, and more. In some cases, information can be derived from a raw data element and used for linking. For example, an IP address can be geo-mapped to a metro area, and the metro area code can be used as a linkage criterion.
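To make the multi-link idea concrete, here is a minimal Python sketch that evaluates several link criteria in a single pass and keeps every group with at least two transactions. The transaction records, the field names, and the `build_link_groups` helper are hypothetical and chosen only for illustration; the email domain stands in for a link value derived from a raw data element.

```python
from collections import defaultdict

# Hypothetical transactions; field names and values are illustrative only.
TRANSACTIONS = [
    {"id": "t1", "ip": "203.0.113.7", "phone": "555-0100",
     "email": "a@example.com", "address": "1 Elm St"},
    {"id": "t2", "ip": "203.0.113.7", "phone": "555-0100",
     "email": "b@example.com", "address": "9 Oak Ave"},
    {"id": "t3", "ip": "198.51.100.4", "phone": "555-0100",
     "email": "c@example.org", "address": "1 Elm St"},
    {"id": "t4", "ip": "192.0.2.55", "phone": "555-0199",
     "email": "d@example.net", "address": "4 Pine Rd"},
]


def email_domain(txn):
    """Derived link value: the domain part of the email address."""
    return txn["email"].rsplit("@", 1)[-1].lower()


# Each link criterion maps a transaction to the value it is grouped by.
LINK_CRITERIA = {
    "ip": lambda t: t["ip"],
    "phone": lambda t: t["phone"],
    "address": lambda t: t["address"],
    "email_domain": email_domain,
}


def build_link_groups(transactions, criteria, min_size=2):
    """Group transaction ids by every (criterion, value) pair at once."""
    groups = defaultdict(list)
    for txn in transactions:
        for name, extract in criteria.items():
            value = extract(txn)
            if value:  # skip missing or empty values
                groups[(name, value)].append(txn["id"])
    # Keep only groups that actually relate two or more transactions.
    return {key: ids for key, ids in groups.items() if len(ids) >= min_size}


link_groups = build_link_groups(TRANSACTIONS, LINK_CRITERIA)
for key, ids in sorted(link_groups.items()):
    print(key, ids)
```

In this toy data the shared phone number ties together three transactions that no single IP address or delivery address would connect on its own, which is the reason for evaluating all criteria side by side.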

Once groups of potentially related transactions have been identified, the challenge is to determine which ones are truly significant, or potentially suspicious. A multi-link process can identify hundreds or even thousands of groups of transactions, or clusters, and the ones worth researching must be singled out automatically. Fortunately, data science can help with this second, and very important, step. Using advanced anomaly detection principles, one can determine, based on the attributes of the transactions in a group, whether there are unusual similarities, or dissimilarities, across the transactions. Without making specific assumptions about what is to be considered “normal” or “abnormal”, one can determine whether a group of related transactions appears to be particularly self-similar compared to the general population: we call this process cluster scoring.

The power of cluster scoring is that it evaluates each individual transaction not on its own characteristics alone, as is done in transactional scoring, but also on the characteristics of related transactions. For example, an individual transaction amount of $59 may not appear suspicious in and of itself. However, let’s assume that we can identify another 25 transactions that came from the very same IP address, and we notice that each and every one of them, or most of them, are also for $59. In this case, the amount of the transaction, when put into the context of the other related transactions, becomes a very important clue that something is unusual about this particular set of transactions. Conversely, another group of 50 transactions coming from the same IP address, but featuring a variety of transaction amounts, would not be deemed interesting. Cluster scoring uses a vast portfolio of advanced metrics to automatically identify anomalous groups of transactions.
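The $59 example can be turned into a small calculation. The sketch below is a simple stand-in for one possible cluster-scoring metric, not the portfolio of metrics mentioned above: it finds the most common amount in a group and asks how likely it would be to see that many matches by chance, given how often the amount occurs in the overall population (a binomial tail probability). The `dominant_value_score` function and all of the data are hypothetical.

```python
from collections import Counter
from math import comb


def dominant_value_score(group_values, population_values):
    """Return (value, count, p_value) for the group's most common value.

    p_value is the binomial probability of seeing at least `count`
    matches among len(group_values) transactions if each one matched
    with the value's frequency in the overall population.
    """
    value, count = Counter(group_values).most_common(1)[0]
    n = len(group_values)
    base_rate = population_values.count(value) / len(population_values)
    p_value = sum(
        comb(n, k) * base_rate**k * (1 - base_rate) ** (n - k)
        for k in range(count, n + 1)
    )
    return value, count, p_value


# Hypothetical data: 26 transactions from one IP, most of them for $59,
# and 50 transactions from another IP with all-different amounts.
suspicious_group = [59] * 24 + [61, 120]
varied_group = [10 + (i * 37) % 190 for i in range(50)]
background = [10 + (i * 7) % 190 for i in range(1000)]
population = background + suspicious_group + varied_group

suspicious_score = dominant_value_score(suspicious_group, population)
varied_score = dominant_value_score(varied_group, population)
print("same-amount group:", suspicious_score)
print("varied group:", varied_score)
```

A very small probability for the first group flags it as unusually self-similar, while the varied group produces an unremarkable value. In practice one would combine several attributes and account for the number of groups being tested; this sketch looks at a single attribute only.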

As stated earlier, bad actors are often identified by implicit clues, like a device ID, an IP address or an email pattern, rather than explicit ones. In other words, we can determine that a fraudulent scheme is behind a set of suspect transactions, and we can stop further transactions based on those clues, but determining “who” is behind the scheme takes more work. For this reason, it is useful to be able to relate groups of suspicious transactions to each other, and so actually “see the forest for the trees”. Suppose we identify a half-dozen bank accounts related to an equivalent number of suspicious clusters. Are these clusters, and bank accounts, related to the same “hand”? Analyzing linkages across these groups of transactions may show, for instance, that the same email address appears in transactions belonging to two different bank account “groups”, revealing that those groups are indeed related. By using automated link analysis algorithms, one can therefore uncover broader schemes and raise the level of visibility, awareness, and preparedness to face the most sophisticated and challenging fraud schemes.
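One simple way to sketch this kind of link analysis is to treat each suspicious cluster as a node and merge any two clusters that share an identifier, which amounts to finding connected components. The Python example below uses a small union-find structure for that purpose. It is an illustrative assumption about how such an algorithm could look, and the cluster names, identifiers, and the `merge_related_clusters` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical suspicious clusters, keyed by bank account, each with the
# identifiers seen in its transactions.
CLUSTERS = {
    "acct_A": {"email:x@example.com", "device:d1"},
    "acct_B": {"email:x@example.com", "device:d2"},
    "acct_C": {"device:d2", "phone:555-0100"},
    "acct_D": {"email:z@example.org"},
}


def merge_related_clusters(clusters):
    """Return lists of cluster names connected by shared identifiers."""
    parent = {name: name for name in clusters}

    def find(x):
        # Follow parent pointers to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Remember the first cluster in which each identifier was seen and
    # merge every later cluster that contains the same identifier.
    first_seen = {}
    for name, identifiers in clusters.items():
        for ident in identifiers:
            if ident in first_seen:
                union(first_seen[ident], name)
            else:
                first_seen[ident] = name

    components = defaultdict(set)
    for name in clusters:
        components[find(name)].add(name)
    return sorted(sorted(members) for members in components.values())


related = merge_related_clusters(CLUSTERS)
print(related)
```

Here the first and third clusters share no identifier directly, yet both are linked to the second one, so all three end up in the same component: a possible sign of a single “hand” behind them.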