Fraud Auditing, Detection, and Prevention Blog

Fraud Data Analytics: The Process of Selecting Transactions for Fraud Testing

May 21, 2019 8:00:27 AM / by Leonard W. Vona

In our first blog in a series on fraud data analytics, we identified a ten-step methodology for conducting a fraud data analytics project. In this blog, we will discuss steps seven and eight:

  • What filtering techniques are necessary to refine the sample selection process?
  • What is the basis of the sample selection?


Fraud data analytics is not about identifying fraud but rather about identifying red flags in transactions that require an auditor to examine them and formulate a decision. The distinction between identifying a transaction and examining it is important to understand. Fraud data analytics is about creating a sample; the audit program is about gathering evidence to support a conclusion regarding the transaction. The final question in the fraud audit process should be: Is there credible evidence that a fraud risk statement is occurring? The next blog will discuss the importance of the fraud audit procedure.

What filtering techniques are necessary to refine the sample selection process?

Creating and filtering homogeneous data files may sound like the same process, but in fact they are very different. Creating the homogeneous data file is the process of normalizing the data file so that all the transactions have a commonality before the data interrogation routines. Filtering is the process of reducing the homogeneous data set based on materiality, overt data errors, and frequency considerations. In the filtering stage, all the transactions in the homogeneous data set meet the testing criteria, but in the fraud auditor's judgment not all of them belong in the final sample for fraud testing.

Filtering based on materiality

It is important to be aware of the consequences of premature data filtering. For instance, if you exclude transactions because of small dollar size, you may miss a pattern and frequency of transactions that in the aggregate are large. I believe that fraud schemes generally occur in certain dollar ranges; for example, false billing schemes using a shell company tend to be in the bottom third of the spend level.
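The aggregation point above can be sketched in a few lines of Python. The records, field names, and dollar thresholds here are assumed for illustration, not taken from the blog; the idea is simply to total small-dollar invoices by vendor before discarding them.

```python
from collections import defaultdict

# Hypothetical invoice records; vendor IDs and amounts are illustrative.
invoices = [
    {"vendor": "V100", "amount": 480.00},
    {"vendor": "V100", "amount": 495.00},
    {"vendor": "V100", "amount": 470.00},
    {"vendor": "V200", "amount": 12000.00},
]

SMALL_DOLLAR = 500.00     # per-invoice filter threshold (assumed)
AGGREGATE_FLAG = 1000.00  # aggregate level worth a second look (assumed)

# Before filtering out small invoices, total them by vendor: a vendor whose
# individually immaterial invoices sum past the flag level stays in scope.
small_totals = defaultdict(float)
for inv in invoices:
    if inv["amount"] < SMALL_DOLLAR:
        small_totals[inv["vendor"]] += inv["amount"]

flagged = {v: t for v, t in small_totals.items() if t >= AGGREGATE_FLAG}
print(flagged)  # V100's three small invoices aggregate to 1445.00
```

A naive filter on `SMALL_DOLLAR` alone would have dropped all three V100 invoices; the aggregate check keeps the vendor in the sample.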

Filtering because the data is not perfect

Data will have errors, and those errors can create a false positive. For instance, consider what can happen when a first-record/last-record search routine is run on the invoice date in an invoice table. If a December 31, 2018 invoice date is entered as December 31, 2017, then the record will be treated as the first record rather than the last record. In the filtering process, the auditor needs to make judgments regarding how to manage false positives due to data integrity.
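A minimal sketch of that first-record/last-record routine, with one mis-keyed date, shows how the error surfaces. The records and the audit period are assumed for illustration; one reasonable judgment call, shown here, is to flag out-of-period dates for review rather than silently filter them out.

```python
from datetime import date

# Illustrative invoice table; one invoice date was keyed as 2017 instead of 2018.
invoices = [
    {"invoice": "A-1", "date": date(2018, 1, 5)},
    {"invoice": "A-2", "date": date(2018, 6, 30)},
    {"invoice": "A-3", "date": date(2017, 12, 31)},  # data-entry error
]

# First/last record search: the mis-keyed record surfaces as the
# "first" record even though it was meant to be the last.
first = min(invoices, key=lambda r: r["date"])
last = max(invoices, key=lambda r: r["date"])

# Flag dates outside the audit period (assumed to be calendar 2018)
# for auditor review instead of dropping them automatically.
PERIOD = (date(2018, 1, 1), date(2018, 12, 31))
suspect = [r for r in invoices if not (PERIOD[0] <= r["date"] <= PERIOD[1])]
```

Here `first` is the mis-keyed invoice A-3, which is exactly the kind of false positive the filtering judgment has to manage.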

Filtering based on frequency linked to materiality

The frequency consideration is dependent on the fraud data analytics strategy and the materiality question. Suppose a vendor has a sequential pattern of invoice numbers, 100-103, but the dollar aggregate is only $250.00. With no other criteria, I would not include the vendor in my sample. However, if I have two invoices with the same date and same line item description, and in the aggregate the two invoices exceed a key control level, then I would include the vendor in my sample.
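The second scenario above can be sketched as a grouping test. The vendor records, descriptions, and control level are assumed for illustration; the point is that two invoices that are individually under the control level get flagged when their aggregate crosses it.

```python
from collections import defaultdict

# Illustrative vendor invoices; numbers and amounts are made up for the sketch.
invoices = [
    {"vendor": "V1", "number": 100, "date": "2019-03-01", "desc": "consulting", "amount": 60.0},
    {"vendor": "V1", "number": 101, "date": "2019-03-08", "desc": "consulting", "amount": 65.0},
    {"vendor": "V2", "number": 500, "date": "2019-04-02", "desc": "parts", "amount": 4900.0},
    {"vendor": "V2", "number": 512, "date": "2019-04-02", "desc": "parts", "amount": 4800.0},
]

CONTROL_LEVEL = 5000.0  # assumed key-control approval threshold

# Group invoices that share vendor, date, and line item description.
groups = defaultdict(list)
for inv in invoices:
    groups[(inv["vendor"], inv["date"], inv["desc"])].append(inv)

# Flag groups whose aggregate crosses the control level even though
# each invoice alone stays under it.
flagged = {k: sum(i["amount"] for i in g)
           for k, g in groups.items()
           if len(g) > 1 and sum(i["amount"] for i in g) >= CONTROL_LEVEL}
```

Vendor V1's small sequential invoices fall out of the sample; vendor V2's two same-day, same-description invoices aggregate past the control level and stay in.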

The filtering process is the final step before selecting a sample. It should be considered before the data interrogation phase to avoid the appearance of sample size shopping.

What is the basis of the sample selection?

The final product of the fraud data analytics process is the selection of transactions or entities for audit examination. As a reminder from the previous blogs, the sample selection is impacted by the fraud data analytics strategy, sophistication of concealment and the precision of matching.

So, step eight is the accumulation of steps one through seven. It is the final step of the fraud data analytics process. It must be understood and considered before the data interrogation. There is an old saying: "Be careful what you wish for; you may receive it." If you get what you ask for, there may be unforeseen and unpleasant consequences. If the sample size is too large, there may be too many false positives and you may miss the fraudulent transactions. To avoid this undesirable outcome, the sample selection criteria must be carefully considered. The most common reasons for an overly large sample size or too many false positives are:

  • An improperly written fraud risk statement.
  • Relying on a single testing criterion.


There once was a CAE who complained about the number of false positives that occurred in a search for ghost employees. The data interrogation searched for duplicate bank account numbers with different names. The sample size was 5,000 employees. So, what happened?

The overriding failure of this test was not understanding that a husband and wife could both work for the corporation, just not in the same department. It appears that the audit team did not understand the business environment. Second, the search routine did not identify the "person committing" component of the fraud risk statement. Were they searching for a person with direct access (the payroll function) or indirect access (a department head)? Let's assume the fraud risk statement focuses on a department head rather than payroll.

There was only one homogeneous data set, and the sample selection criteria did not have enough attributes. If a second criterion had been added, such as same bank account within the same department, the sample size would have been smaller, and there would likely have been fewer false positives.

So, ask yourself: if you were committing a ghost employee scheme, would you want the ghost employee in a department that you managed, or in a department that you do not control? I would have added to the test: duplicate bank account, same or different name, and same cost center.
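The difference between the one-criterion and two-criterion versions of the ghost employee test can be sketched as follows. The payroll records, names, and field labels are hypothetical; the sketch shows why adding cost center shrinks the sample and aligns it with a department-head fraud risk statement.

```python
from collections import defaultdict

# Hypothetical payroll records; all names and fields are illustrative.
employees = [
    {"id": 1, "name": "A. Smith", "bank_acct": "111", "cost_center": "HR"},
    {"id": 2, "name": "B. Smith", "bank_acct": "111", "cost_center": "IT"},   # spouses, different departments
    {"id": 3, "name": "C. Jones", "bank_acct": "222", "cost_center": "OPS"},
    {"id": 4, "name": "D. Ghost", "bank_acct": "222", "cost_center": "OPS"},  # same account AND same cost center
]

by_acct = defaultdict(list)
for emp in employees:
    by_acct[emp["bank_acct"]].append(emp)

# One-criterion test: duplicate account with different names.
# This flags the married couple along with the ghost, inflating the sample.
one_criterion = [g for g in by_acct.values()
                 if len(g) > 1 and len({e["name"] for e in g}) > 1]

# Two-criterion test: duplicate account within the same cost center,
# same or different name, matching the department-head risk statement.
two_criteria = [g for g in by_acct.values()
                if len(g) > 1 and len({e["cost_center"] for e in g}) == 1]
```

On this toy data the one-criterion test returns two groups (the spouses plus the ghost pair), while the two-criterion test returns only the pair sharing a cost center.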

I would be remiss if I did not mention the false negative in sample selection. A false negative is when your fraud data analytics does not find a fraudulent transaction. The number one reason for false negatives is the failure to calibrate your fraud data analytics plan for the sophistication of concealment.

At the risk of repeating myself: Even the best auditor in the world, using the world's best audit program, cannot detect fraud unless their sample includes a fraudulent transaction. This is why fraud data analytics is essential to the auditing profession.

I believe that using the methodology described today and in the previous blogs will provide the fraud auditor the opportunity to detect fraud in core business systems.

Sign up now to have this blog delivered to your inbox and read the rest of the series.


At Fraud Auditing Inc. we have over 39 years of diversified experience. Contact us today if you need help building a comprehensive fraud audit program to detect complex fraud schemes.

Topics: Fraud Data Analytics, Fraud Auditing, Fraud Detection, Fraud Definitions


Written by Leonard W. Vona

Leonard W. Vona has more than 38 years of diversified auditing and forensic accounting experience. His firm, Fraud Auditing, Inc., advises clients in areas of litigation support, financial investigations, and fraud prevention.