In our first blog in a series on fraud data analytics, we identified a ten-step methodology for conducting a fraud data analytics project. In this blog, we will discuss step six: What are the steps to designing a fraud data analytics search routine?
In my third book, I compared fraud data analytics to building a house. The fraud risk statements are the foundation, and the fraud data analytics plan is the blueprint. Step six is the building process based on that blueprint.
In step six, we are providing the programmer with the system design specifications for writing the code to interrogate the data. The following eight steps are necessary to build the data interrogation routines:
- 1. Identify the components of the fraud risk statement—the person committing the fraud, the type of entity, and the action statement.
- 2. Identify the data that relates to the fraud risk statement.
- 3. Select the strategy consistent with the scope of the audit, sophistication of concealment, degree of accuracy (exact, close or related), and the nature of the test.
- 4. Based on the data availability, reliability, and usability, clean the data set for overt errors.
- 5. Identify the logical errors that will occur with the test.
- 6. Create your homogeneous data sets using the inclusion/exclusion theory.
- 7. Establish the primary selection criteria, followed by the remaining selection criteria.
- 8. Create the test using the programming routines to identify all entities or transactions that meet the testing criteria.
Data interrogation steps one to four are discussed in the previous blogs found here and here. Therefore, this blog will focus on data interrogation steps five to eight. I need to stress that jumping directly to step eight is a recipe for disaster.
Step five: Identify Logical Errors
Data is not perfect. Anomalies are caused by many factors; the goal of this step is to anticipate the types of errors that will occur. The plan should determine whether false positives can be minimized through the data interrogation routine or whether the auditor will need to resolve them through document examination.
Logical errors will occur because of such things as input errors, data integrity issues, and differences in how employees input data. Additionally, the day-to-day pressure to process transactions may encourage shortcuts. All of these factors will create false positives. If you do not have a plan for resolving false positives at the programming stage, then the field auditor must allocate time to hunting them down rather than finding evidence of the fraud risk statement.
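As one illustration of handling logical errors at the programming stage rather than in the field, here is a minimal sketch in pandas. The column names (vendor_name, vendor_address) are hypothetical; the point is simply that normalizing free-text fields before any matching test is run prevents input-style differences from surfacing as false positives.

```python
import pandas as pd

def normalize_vendor_fields(vendors: pd.DataFrame) -> pd.DataFrame:
    """Reduce input-style differences (case, punctuation, spacing) that would
    otherwise surface as false positives in a matching test."""
    cleaned = vendors.copy()
    for col in ["vendor_name", "vendor_address"]:  # hypothetical column names
        cleaned[col] = (
            cleaned[col]
            .fillna("")                               # handle missing values explicitly
            .str.upper()                              # "Acme Ltd" matches "ACME LTD"
            .str.replace(r"[^\w\s]", "", regex=True)  # drop punctuation: "ACME, LTD." -> "ACME LTD"
            .str.replace(r"\s+", " ", regex=True)     # collapse repeated spaces
            .str.strip()
        )
    return cleaned
```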
Step six: Create Homogeneous Data Sets
The inclusion/exclusion theory is critical to building the fraud data analytics plan. In this theory, you include the data that are consistent with the fraud data profile and exclude the data that are not. The theory is used to shrink the haystack, so to speak. Whether the fraud auditor actually creates separate files is a matter of style; the inclusion/exclusion concept itself is what makes it possible to identify anomalies.
The importance of the inclusion and exclusion step varies by the nature of the inherent fraud scheme, the fraud data analytics strategy and the size of the data file.
Let’s assume the vendor master file has 50,000 vendors, including 5,000 that are inactive. The first homogeneous data set would include only the active vendors. The fraud risk statement is a shell company created by an internal source. The data interrogation procedure uses missing data as the primary selection criterion. This test identifies 100 vendors meeting the search criteria.
The transaction file contains a million vendor invoices. Should we test all million invoices for shell company attributes, or only the invoices belonging to vendors that met the missing-data test? The inclusion/exclusion theory would select only the transactions for the 100 vendors identified in the missing-data analysis.
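To make the shrinking-haystack idea concrete, here is a hedged sketch of the same logic in pandas. The file names and columns (status, tax_id, telephone, vendor_id) are assumptions for illustration, not fields from any particular system.

```python
import pandas as pd

# Assumed inputs: vendor master file and vendor invoice (transaction) file.
vendors = pd.read_csv("vendor_master.csv")     # ~50,000 rows in the example
invoices = pd.read_csv("vendor_invoices.csv")  # ~1,000,000 rows in the example

# Inclusion/exclusion step: keep only active vendors (drops the 5,000 inactive).
active = vendors[vendors["status"] == "ACTIVE"]

# Primary selection criterion: missing data consistent with a shell company profile.
missing_data = active[active["tax_id"].isna() | active["telephone"].isna()]

# Shrink the invoice haystack to only the vendors flagged by the missing-data test.
flagged_invoices = invoices[invoices["vendor_id"].isin(missing_data["vendor_id"])]

print(len(missing_data), "vendors flagged;", len(flagged_invoices), "invoices to examine")
```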
Step seven: Establish Selection Criteria
In establishing the selection criteria, there are two fundamental strategies. The first is to identify all entities or transactions that meet a single criterion; the purpose of the test is to exclude all data that do not. Since this test operates on one criterion, the sample population tends to be large, although much smaller than the total population. The auditor can then use either random selection or judgment to select the sample. The advantage is that the auditor has improved the odds of selecting a fraudulent transaction.
The second strategy is to select all data that meet the full set of testing criteria, referred to as the fraud data profile. The fraud data analytics strategy you selected is a key factor in how the sample is chosen. Your options are:
- Specific identification. The sample should be the transactions that meet the criteria.
- Control avoidance. The sample should be the transactions that circumvent the internal control.
- Data interpretation. The sample is based on the auditor’s judgment.
- Number anomaly. The sample is based on the number anomaly identified and auditor judgment.
So, what is the difference between the two strategies? The first uses exclusion theory to reduce the population, whereas the second uses inclusion theory as the basis for sample selection. Remember, after identifying all transactions meeting the criteria, data filtering can be used to shrink the population further.
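A hedged sketch of the two strategies, continuing the hypothetical shell-company columns from the earlier example: strategy one keeps everything that meets a single criterion and leaves sample selection to the auditor, while strategy two stacks the remaining criteria of the fraud data profile so that whatever survives is the sample.

```python
import pandas as pd

vendors = pd.read_csv("vendor_master.csv")           # hypothetical file and columns
active = vendors[vendors["status"] == "ACTIVE"]

# Strategy one: a single criterion only; the auditor then samples from the result.
single_criterion = active[active["tax_id"].isna()]
sample_one = single_criterion.sample(n=min(25, len(single_criterion)), random_state=1)

# Strategy two: the full fraud data profile; whatever survives is the sample.
fraud_profile = active[
    active["tax_id"].isna()
    & active["telephone"].isna()
    & active["address"].str.contains("PO BOX", na=False)  # hypothetical criterion
]
sample_two = fraud_profile
```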
Step eight: Start the Programming
It is interesting to see how different individuals program the software to create the data interrogation routines. Since programming is software dependent, I offer the following strategies to avoid faulty logic in the design of the search routine:
- Develop a flowchart of the decision process prior to writing the search routine. The order of the searching criteria will impact the sample selection process.
- Create record counts of excluded data and then reconcile the new control count to the calculated control count. It is easy to reverse the selection criteria, thereby excluding what should have been included; the reconciliation process helps avoid this error (see the sketch after this list).
- Perform a visual review of the output. Ask yourself, does the result seem consistent with your expectations?
- Create reports that can function as a work paper. Remember, there must be enough information to locate the documents. However, reports with too many columns are difficult to read on the screen as well as on paper.
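For the record-count reconciliation in particular, a minimal sketch (again using the hypothetical vendor file and status column) is to count the included and excluded rows separately and confirm they reconcile to the original control count before trusting the output.

```python
import pandas as pd

vendors = pd.read_csv("vendor_master.csv")    # hypothetical input
total_count = len(vendors)

mask = vendors["status"] == "ACTIVE"          # selection criterion
included, excluded = vendors[mask], vendors[~mask]

# Reconcile: included + excluded must equal the original control count,
# otherwise the criterion was reversed or records were dropped or duplicated.
assert len(included) + len(excluded) == total_count, "Control counts do not reconcile"
print(f"Included: {len(included)}  Excluded: {len(excluded)}  Total: {total_count}")
```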
A note of appreciation: I have had the opportunity to work with two of the best programmers, Jill Davies and Carol Ursell from Audimation. Their skill set was critical to the success of my fraud data analytics projects.
Sign up now to have this blog delivered to your inbox and read the rest of the series.
At Fraud Auditing Inc. we have over 38 years of diversified experience. Contact us today if you need help building a comprehensive fraud audit program to detect complex fraud schemes.