“Mr. Chan, rather than directing Sino-Forest’s spending on legitimate business operations, poured hundreds of millions of dollars into fictitious or over-valued lines of business where he engaged in undisclosed related-party transactions and funneled funds to entities that he secretly controlled,” ruled Justice Michael Penny of Ontario Superior Court in March 2018, in one of the largest corporate fraud cases in Canadian history. Allen Chan, co-founder and former CEO of Sino-Forest, was ordered to pay more than C$2.6 billion ($2 billion) in damages, while investors suffered a cumulative C$6 billion loss from the sharp decline in the company’s share price.
In an evolving game of cat and mouse, perpetrators of accounting fraud have grown ever more creative over the past few decades, while regulators and investigators have embraced ever more complex approaches for nabbing the miscreants. In particular, the shock of new accounting frauds after 2000 spawned rules such as Statement on Auditing Standards (SAS) 99, SAS 113 and the Sarbanes–Oxley Act, which, together with the earlier SAS 56, aim to prevent accounting malpractice; auditors are now required to use analytical procedures to verify reported financial statements. Nonetheless, cases like Sino-Forest continue to emerge. Ironically, the growing complexity of generally accepted accounting principles (GAAP) has created more room for manipulations to blossom — and has rendered analysis increasingly difficult. Traditional methods, such as percentage and financial ratio analyses (see Appendix), in which investors look for deviations from past norms, are labor intensive and difficult to deploy on a large scale.
This article will introduce four powerful, if less familiar, techniques that can help identify possible cases of accounting misstatements: the Beneish M-score,1 the Dechow F-score,2 Benford’s law3 and Zipf’s law.4 We will use two well-known cases of accounting fraud to show how these techniques work: drug company Valeant Pharmaceuticals International and fashion retailer SuperGroup. Valeant incorrectly booked $58 million of revenue in 2015; though this was barely a rounding error compared with the company’s then–$10 billion in annual sales, Valeant shares plunged 30 percent in three days, followed by several years of controversy and crisis. As for SuperGroup, its shares fell 38 percent on the day in 2012 that the company issued a profit warning blaming “arithmetic errors.”
We will also explore the frontier of machine learning as a potential solution to widespread accounting malpractice. Through the evolution of Big Data and machine learning, investors are no longer limited to analyzing financial data to detect accounting fraud. Machine learning can process large, diverse datasets of multiple dimensions. Techniques such as neural networks, support vector machines (SVMs) and ensemble methods have been successfully applied to detect fraud, but room remains for further innovations.
A Brief History
In the early 2000s, the financial world was rocked by some of the worst accounting frauds in history, at companies including Enron Corp. and WorldCom. These massive scandals, which resulted in corporate bankruptcies, cost shareholders billions of dollars. This rash of accounting scandals triggered a hue and cry to prevent, or at least limit, material misstatements in financial and asset reporting. In 2002, the board of the American Institute of Certified Public Accountants issued guidance titled “Consideration of Fraud in a Financial Statement Audit” (commonly known as SAS 99,5 see Figure 1), requiring auditors to be reasonably sure that financial statements are free from material misstatements, whether caused by error or by fraud. At about the same time, Congress passed Sarbanes–Oxley, which expanded requirements for all U.S. public company boards, managements and accounting firms in terms of disclosures, audits, the reporting of off-balance-sheet items, internal risk controls, criminal penalties for financial misstatements, whistleblower protection and higher standards of conduct.
Related to SAS 99 is SAS 56, which requires auditors to employ quantitative analytics to ensure that all significant line amounts and fluctuations (quarter-on-quarter changes) in financial statements have been satisfactorily explained. For example, if a sales trend diverges significantly from production capacity, those sales might be fictitious or at least suspicious. Similarly, a trend analysis of monthly sales and product returns might indicate channel stuffing, a deceptive practice in which, to inflate earnings, companies send retailers more products than they can sell.
In recent years, regulators have leveraged cutting-edge technologies to build cases involving accounting fraud, insider trading and market manipulation. With the help of forensic accountants, the Securities and Exchange Commission has designed an econometric model, the Accounting Quality Model, that identifies suspicious trends in accounting choices: it estimates discretionary accruals from total accruals, analyzes them as indicators of incentives to take risk and compares them with those of the company’s industry peers to assess the probability of accounting manipulation.
Meanwhile, four other techniques have emerged to detect the possible presence of fraud.
Four Techniques for Finding Fraud
The Beneish M-score was developed in 1999 by M. Daniel Beneish of Indiana University’s Kelley School of Business. Beneish used Compustat data from 1982 to 1992 to develop the model, which in out-of-sample tests correctly identified 76 percent of frauds (that is, it missed 24 percent of them) while generating a false-alarm rate of 17.5 percent. The M-score model looks at eight key areas that can signal incentives or pressures to commit fraud: the days’ sales in receivables index (DSRI); gross margin index (GMI); asset quality index (AQI); sales growth index (SGI); depreciation index (DEPI); selling, general and administrative expenses index (SGAI); leverage index (LVGI); and the ratio of total accruals to total assets (TATA). An M-score greater than –2.22 indicates a potential manipulator. Beneish recommends a cutoff point of –1.89 to balance the cost of a Type II error (missing the fraud) against that of a Type I error (a false alarm). A Type II error carries a much higher cost than a Type I because the impact of failing to discover a fraud can be devastating, whereas the cost of a false alarm is low, given that investors can allocate their capital to a large number of other stocks. (The M-score model has one limitation: Because financial institutions normally have leveraged capital structures, the M-score, which keys on leverage, cannot be used for them.)
According to Beneish, companies identified as manipulators typically lose about 40 percent of their market value on a risk-adjusted basis in a quarter. Assuming a typical equity gain of 1 to 2 percent per quarter, it would take the gains of 20 to 40 nonmanipulators in the same portfolio to offset this single loss. Therefore, the relative error cost of a Type II compared with a Type I is 20 to 40 times. Beneish derived his cutoff point of –1.89 from the relative error cost of 40 times.
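The scoring arithmetic itself fits in a few lines. Below is a minimal Python sketch using the eight coefficients as published in Beneish’s 1999 paper; the index values fed in are hypothetical (an index of 1.0 means “unchanged versus the prior year”), not figures from any real company.

```python
def beneish_m_score(dsri, gmi, aqi, sgi, depi, sgai, tata, lvgi):
    """Beneish (1999) eight-variable M-score. Coefficients as published;
    a score above -2.22 flags a potential manipulator."""
    return (-4.84 + 0.920 * dsri + 0.528 * gmi + 0.404 * aqi
            + 0.892 * sgi + 0.115 * depi - 0.172 * sgai
            + 4.679 * tata - 0.327 * lvgi)

# Hypothetical "neutral" firm: every index unchanged year over year,
# and zero accruals relative to assets.
m = beneish_m_score(dsri=1.0, gmi=1.0, aqi=1.0, sgi=1.0,
                    depi=1.0, sgai=1.0, tata=0.0, lvgi=1.0)
print(f"M-score: {m:.2f}, flagged: {m > -2.22}")
```

A firm with no year-over-year changes scores about –2.48, safely below the –2.22 threshold; inflating the sales growth or accrual inputs pushes the score up toward the flagging region.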
Before the companies’ accounting problems, the M-scores of Valeant and SuperGroup were –1.976 and –1.586, respectively — both above the –2.22 threshold. When we compare metrics with the mean values of manipulators, we see that the likely sources of the misstatement from Valeant were sales growth and depreciation, while SuperGroup’s source was likely sales growth (see Figure 2).
The Dechow F-score is a more recent variation of the Beneish M-score and was developed in 2011 by Patricia Dechow and Richard Sloan of the University of California, Berkeley; Weili Ge of the University of Washington; and Chad Larson of Washington University in St. Louis. The model uses data from 1982 to 2005 and compares metrics from misstating companies with those from the same companies during periods when there was no misstatement, and with those of nonmanipulating companies. The mathematical model evaluates a company in five areas: accrual quality, financial performance, nonfinancial measures, off-balance-sheet activities and market-based measures.
The F-score analyzes seven variables that could suggest motivations for fraudulent activities: change in noncash net operating assets (rsst_acc), change in receivables (ch_rec), change in inventories (ch_inv), soft assets (soft_assets), change in cash sales (ch_cs), change in return on assets (ch_roa) and debt or equity issuance (issue).
This is a scaled logistic model: the probability of misstatement implied by the logistic regression is divided by the unconditional probability of misstatement in the sample, so a score of 1.0 corresponds to average risk. A score greater than 1.0 indicates an above-normal risk (73 percent), a score greater than 1.85 indicates a high risk (86 percent), and a score greater than 2.45 indicates a very high risk of accounting manipulation (92 percent).
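A minimal sketch of the scoring arithmetic, assuming the first-model coefficients and the roughly 0.0037 unconditional misstatement rate commonly cited from Dechow et al. (2011); the firm-year inputs below are hypothetical, chosen only to exercise the formula.

```python
import math

def dechow_f_score(rsst_acc, ch_rec, ch_inv, soft_assets,
                   ch_cs, ch_roa, issue):
    """Dechow et al. (2011) F-score, Model 1 coefficients as commonly
    cited. The logistic probability is scaled by the unconditional
    misstatement rate (about 0.0037), so F = 1.0 means average risk."""
    value = (-7.893 + 0.790 * rsst_acc + 2.518 * ch_rec + 1.191 * ch_inv
             + 1.979 * soft_assets + 0.171 * ch_cs - 0.932 * ch_roa
             + 1.029 * issue)
    prob = math.exp(value) / (1 + math.exp(value))
    return prob / 0.0037

# Hypothetical firm-year: modest accrual and receivables growth,
# half the balance sheet in soft assets, a securities issuance.
f = dechow_f_score(rsst_acc=0.05, ch_rec=0.02, ch_inv=0.02,
                   soft_assets=0.5, ch_cs=0.1, ch_roa=0.0, issue=1)
print(f"F-score: {f:.2f}, above-normal risk: {f > 1.0}")
```

Raising the accrual or receivables inputs moves the score past the 1.0, 1.85 and 2.45 bands described above.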
As Figure 3 shows, in the years before the companies’ accounting misstatements, Valeant’s F-score was 2.41 (92 percent, or high risk) and SuperGroup’s was 3.95 (98 percent, or very high risk).
Benford’s law takes a different approach from the M-score and F-score models, which work from financial ratios: it concerns the frequency distribution of first digits in numerical datasets. Mathematically, the law states that the first significant digit d of numbers drawn from a wide range of sources follows the logarithmic distribution P(d) = log10(1 + 1/d). Hence, the numeral 1 appears as the first significant digit most frequently (about 30 percent of the time), and 9 least frequently (under 5 percent).
This law was first observed by Simon Newcomb in 1881 and was rediscovered by Frank Benford in 1938. It appears to apply not only to numbers generated from mathematical expressions but also to a variety of social and natural data coming from demographics, accounting and geography, among other sources.6
For example, take a country whose per capita income is $100,000 and grows at a rate of 5 percent annually. It would take 15 years for that country to push per capita income above $200,000, but hitting $300,000 at the same 5 percent growth rate would require only eight more years. The additional years required to attain per capita income of $400,000, $500,000 and so on would keep decreasing until the figure reached $1 million; at that point, the country would again take 15 years to double, to $2 million. Reviewing the country’s data over 100 years, numbers starting with 1 would occur most frequently (34 percent of the time), followed by those starting with 2 (16 percent), and so on down to 9 (4 percent).
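The compounding example above can be checked directly. This short sketch simulates the 100 years of income figures, tallies first digits and compares the counts with the Benford expectation P(d) = log10(1 + 1/d):

```python
import math
from collections import Counter

# Per capita income compounding at 5% per year for 100 years,
# starting at $100,000 (the worked example in the text).
incomes = [100_000 * 1.05 ** t for t in range(100)]

# Leading (first significant) digit of each yearly figure.
first_digits = [int(str(int(x))[0]) for x in incomes]
observed = Counter(first_digits)

# Benford's law predicts P(d) = log10(1 + 1/d) for d = 1..9.
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

for d in range(1, 10):
    print(f"digit {d}: observed {observed[d] / 100:.0%}, "
          f"Benford {benford[d]:.1%}")
```

Running it reproduces the 34-16-4 percent pattern quoted above: digit 1 leads 34 of the 100 yearly figures, digit 2 leads 16 and digit 9 only 4, close to the theoretical 30.1, 17.6 and 4.6 percent.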
Benford’s law can be used to detect fraud in accounting statements because manipulated numbers tend to deviate significantly from the anticipated frequencies. Figure 4 reviews the numbers that Valeant and SuperGroup reported in their financial statements during the years they were involved in accounting fraud. (Additionally, a 95 percent confidence interval was calculated for each first digit. A confidence interval gives a band around the observed frequency; if the theoretical Benford value falls within that band, we can be 95 percent confident that the observed frequency is not statistically different from it.)
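One simple way to build such a band is a normal-approximation confidence interval around each observed digit frequency. A sketch, using hypothetical first-digit counts rather than either company’s actual figures:

```python
import math

def benford_interval(digit_counts, d, z=1.96):
    """95% normal-approximation confidence interval for the observed
    frequency of first digit d, to compare against Benford's value."""
    n = sum(digit_counts.values())
    p_hat = digit_counts.get(d, 0) / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# Hypothetical first-digit counts from 100 financial-statement
# line items (illustrative only).
counts = {1: 28, 2: 17, 3: 13, 4: 9, 5: 8, 6: 7, 7: 6, 8: 6, 9: 6}

expected_1 = math.log10(1 + 1 / 1)       # about 0.301 for digit 1
low, high = benford_interval(counts, 1)
print(low <= expected_1 <= high)          # digit 1 is flagged if False
```

With 28 ones out of 100 numbers, the band runs from roughly 19 to 37 percent, so the theoretical 30.1 percent sits comfortably inside it; a heavily manipulated digit would push the band away from the Benford value.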
During 2013 and 2014, we can detect some violations of Benford’s law in Valeant’s numbers — in particular in the former year, when the frequencies of 4 and 7 as first digits were abnormally high. For SuperGroup, we observed more deviations from the expected frequencies: In 2010, digits 3, 4 and 9 deviated significantly from Benford’s law, while in 2011 the differences were larger for digits 6, 7 and 9.
Zipf’s law is similar to Benford’s law, but it applies to natural language rather than to first digits. Harvard University linguist and philologist George Kingsley Zipf observed that “strings” (that is, collections of characters) appear in texts with a frequency that follows a specific functional form. According to Zipf’s law, the frequency of a word in a collection of text in any natural language is inversely proportional to the frequency rank of that word within the collection. Hence, if the most commonly used word (ranked first) has a frequency of f1, then the word ranked second will have a frequency of f1/2, the word ranked third a frequency of f1/3, and so on.
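The rank-frequency relationship is easy to illustrate. A toy sketch on a made-up snippet of text (a real analysis would use a full document corpus, such as a company’s filings):

```python
from collections import Counter

def zipf_expected(f1, rank):
    """Zipf's law: the word at a given frequency rank is expected to
    appear about f1 / rank times, where f1 is the top word's count."""
    return f1 / rank

# Toy corpus (invented for illustration).
text = ("the cat sat on the mat and the dog saw the cat "
        "and the dog ran after the cat on the mat")
counts = Counter(text.split())
ranked = counts.most_common()      # [(word, count), ...] by frequency
f1 = ranked[0][1]                  # frequency of the top-ranked word

for rank, (word, freq) in enumerate(ranked[:5], start=1):
    print(f"rank {rank}: {word!r} observed {freq}, "
          f"Zipf predicts {zipf_expected(f1, rank):.1f}")
```

Even in this tiny sample, “the” dominates (7 occurrences) and the second-ranked word comes in near the predicted f1/2; in a large genuine corpus the fit is far tighter, and systematic deviations in a company’s disclosures are what a Zipf-based triage flags.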
As Adeola Odueke and George Weir point out, an advantage of Zipf’s law over Benford’s law is that it can be applied to a wider range of datasets because it is not limited to numerical attributes.7 Thus, fraudulent behavior can be assessed by using sets of alternative data instead of only the most conventional ones related to financial statements. This flexibility makes Zipf’s law more appealing for future research on forensic accounting as various types of data become more accessible.
A New Horizon: Big Data and Machine Learning
Detecting accounting fraud is a complex and evolving task, but new developments in Big Data and machine learning can be exploited to increase the accuracy of the algorithms. The Big Data revolution has added variety to the information that can be fed to fraud-detection algorithms in addition to numerical and financial statement data; it includes texts, social media content, conference reports, interviews and other types of unstructured data. Provided that companies and regulators can efficiently collect, store and process this kind of data, fraud-detection capability can be boosted significantly.
With an increasing amount of data, model training will naturally improve as more fraud cases are included in the datasets. The relative scarcity of data on fraudulent companies has long been an issue in forensic accounting, given its negative effect on Type I and Type II error rates; although some practitioners have found ways around it (Dechow in her F-score, for example), the best solution is to let the models learn from new information on accounting misbehavior.
Big Data is not the only new tool for assessing accounting fraud; machine learning has also made a number of recent advances. The detection of fraud can be defined as a classification problem, and leveraging machine learning methods can improve accuracy and reduce Type II errors, which tend to be costly for investors and regulators.
The Beneish M-score and Dechow F-score models can be seen as approximations to machine learning. Their approach is a classical supervised classification problem — developing a model by working out a relationship between input variables and the output — using probit and logit models on relevant financial ratios. Probit maps a linear combination of the inputs to a probability of manipulation through the cumulative normal distribution, while logit uses the logistic function, modeling the logarithm of the odds as a linear function of the inputs. Some of the most recent machine learning applications have followed this line of thought but made use of other machine learning algorithms for training the parameters, such as neural networks, decision trees, ensemble methods, support vector machines, fuzzy logic and other statistical models.
Let’s take a closer look at these techniques.
Neural networks are an attempt to mimic the structure of a human brain by using a collection of artificial neurons and the connections among them. These networks are trained by applying a backpropagation algorithm, but there is no underlying theory for building these models for specific tasks, let alone for accounting fraud.
However, the flexibility of neural networks in modeling complex and nonlinear relationships in data has been of great use in detecting accounting fraud. Efstathios Kirkos and his colleagues, for example, found that a neural network with six input nodes, one hidden layer and five hidden nodes successfully classified 80 percent of the validation data, with a Type II error of 17 percent.8
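To make the architecture concrete, here is a bare-bones forward pass through a network of the shape Kirkos and his colleagues describe (six inputs, one hidden layer of five nodes, one output). The weights below are random placeholders purely to show the mechanics; a real model would be fit by backpropagation on labeled fraud data.

```python
import math
import random

random.seed(0)

# Randomly initialized 6-5-1 network (illustrative weights only).
W1 = [[random.uniform(-1, 1) for _ in range(6)] for _ in range(5)]
b1 = [random.uniform(-1, 1) for _ in range(5)]
W2 = [random.uniform(-1, 1) for _ in range(5)]
b2 = random.uniform(-1, 1)

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def predict_fraud_prob(ratios):
    """Forward pass: 6 financial ratios -> 5 hidden nodes -> fraud score."""
    hidden = [sigmoid(sum(w * x for w, x in zip(row, ratios)) + b)
              for row, b in zip(W1, b1)]
    return sigmoid(sum(w * h for w, h in zip(W2, hidden)) + b2)

# Hypothetical vector of six financial ratios for one firm-year.
p = predict_fraud_prob([0.2, 1.1, 0.4, 0.9, 0.05, 1.3])
print(f"fraud score: {p:.3f}")   # a classifier would threshold this at 0.5
```

The sigmoid activations at each layer are what let the network capture the nonlinear interactions among ratios that a plain logit model misses.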
New advances in the study of neural networks, such as deep learning, have yet to be used in fraud detection. Deep learning refers to more-complex structures of neural networks, usually with a larger number of neuron layers. Training these networks demands a great amount of computing power, but as capacity improves, more-complex neural networks can be trained and their ability to detect fraud will be enhanced.
Unlike neural networks, decision trees offer a logical and straightforward process for classifying data, but they do so at the expense of accuracy. Furthermore, in most applications, decision trees tend to underperform other classifying methods because they are sensitive to small changes in the training sample, a symptom of overfitting: a modeling error in which the classifier learns deceptive patterns from a limited amount of data. Kirkos and his colleagues note that the accuracy of a decision tree is lower (73.6 percent) and its Type II error rate higher (25 percent) than those of a neural network.
Ensemble methods were developed as a solution to the overfitting issue. They include AdaBoost and random forests, which consist of a weighted aggregation of a set of weak classifiers, such as decision trees. Bin Li and his co-authors applied classifiers for accounting-fraud detection based on ensemble methods and obtained an accuracy increase of 18 percent and a Type II error reduction of 7 percent with respect to classifiers based on logistic regression.9
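The aggregation idea can be sketched with bootstrap-sampled decision stumps (depth-one trees) and a majority vote, the mechanism behind bagging and random forests. The toy dataset, its two features and the stump rule below are all hypothetical, chosen only to show the voting mechanics.

```python
import random

random.seed(1)

# Toy firm-years: (accrual_ratio, leverage) -> fraud label (hypothetical).
data = [((0.9, 0.2), 1), ((0.8, 0.3), 1), ((0.7, 0.6), 1),
        ((0.1, 0.2), 0), ((0.2, 0.1), 0), ((0.15, 0.4), 0)]

def train_stump(sample):
    """Weak learner: pick the threshold on feature 0 that best
    separates the bootstrap sample (a stand-in for a depth-1 tree)."""
    best = None
    for (x, _), _lbl in sample:
        thr = x
        acc = sum((feat[0] > thr) == bool(lbl) for feat, lbl in sample)
        if best is None or acc > best[1]:
            best = (thr, acc)
    return best[0]

def bagged_predict(stumps, features):
    """Majority vote across stumps, as in bagging/random forests."""
    votes = sum(features[0] > thr for thr in stumps)
    return int(votes > len(stumps) / 2)

# Train 25 stumps, each on a bootstrap resample of the data.
stumps = [train_stump(random.choices(data, k=len(data))) for _ in range(25)]

flag_high = bagged_predict(stumps, (0.95, 0.3))   # extreme accruals
flag_low = bagged_predict(stumps, (0.05, 0.3))    # benign accruals
print(flag_high, flag_low)
```

No single stump is reliable, but the vote smooths out the instability that makes lone decision trees overfit, which is exactly the effect the accuracy gains cited above exploit.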
Support vector machines make up another family of classifiers. They rest on stronger theoretical foundations than decision trees or neural networks and tend to preserve their accuracy out of sample.
The robustness of SVM classifiers allows them to be modified and adapted to suit different kinds of problems. In the accounting-fraud literature, for example, there have been some interesting applications, such as that of Li and his co-authors, who pair nonlinear kernels with an SVM to achieve accuracy 4 percent better than that of the benchmark logistic model.
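The nonlinear-kernel idea can be illustrated with a Gaussian (RBF) kernel inside an SVM-style decision function. The support vectors, coefficients and bias below are hand-picked for illustration, not fit to data as a real SVM trainer would do.

```python
import math

def rbf_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel: an implicit nonlinear feature map,
    the kind of kernel paired with an SVM for nonlinear boundaries."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def svm_decision(x, support_vectors, alphas, labels, bias, gamma=1.0):
    """SVM decision function: sign of a kernel-weighted sum over
    the support vectors."""
    score = sum(a * y * rbf_kernel(x, sv, gamma)
                for sv, a, y in zip(support_vectors, alphas, labels)) + bias
    return 1 if score >= 0 else -1

# Two hypothetical support vectors: one fraud (+1), one clean (-1),
# in a two-ratio feature space.
svs = [(0.9, 0.8), (0.1, 0.2)]
alphas = [1.0, 1.0]
labels = [1, -1]

print(svm_decision((0.85, 0.75), svs, alphas, labels, bias=0.0))
```

A point near the fraud-side support vector gets the +1 label and one near the clean side gets −1; because the kernel works in an implicit feature space, the resulting boundary in the original ratio space can be arbitrarily curved.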
Fuzzy logic takes a different approach from classical logic, in which statements are either true or false. Fuzzy logic algorithms allow data points to have different degrees of belonging to a certain class. Using fuzzy logic results in imprecise inference rules that are particularly helpful when there is a lack of information — a main feature of accounting-fraud applications.
Mary Jane Lenard and her colleagues, for example, applied fuzzy logic to publicly available financial and nonfinancial data to detect fraud, achieving an accuracy rate of 76.7 percent.10
Statistical models are used to solve simple classification problems and rely on the assumption of a type of probability distribution. Examples of these models include hidden Markov, linear discriminant analysis, logistic regression, naive Bayes and Bayesian belief networks.
Although logistic regression has been widely used in accounting-fraud research (for example, in the Beneish M-score and Dechow F-score), there have been other applications, such as those explored by Kirkos and his colleagues, in which the probability of a company engaging in fraudulent accounting behavior is modeled as a Bayesian belief network. In this method, the dependencies among the different attributes of the data are represented in a directed acyclic graph — a set of nodes connected in such a way that no sequence of connections eventually loops back to the starting node. In the Kirkos study, this methodology exhibited the best results, outperforming decision trees and neural networks.
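As a minimal instance of this statistical-model family, here is a Gaussian naive Bayes classifier, the simplest of the Bayesian approaches listed above (a full belief network would relax its independence assumption). The class priors and per-feature parameters below are hypothetical.

```python
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def naive_bayes_fraud_prob(x, priors, params):
    """Gaussian naive Bayes: class-conditional densities multiply
    across features (the 'naive' independence assumption).
    params[c] holds (mean, variance) per feature; priors[c] the prior."""
    joint = {}
    for c in priors:
        likelihood = 1.0
        for xi, (mean, var) in zip(x, params[c]):
            likelihood *= gaussian_pdf(xi, mean, var)
        joint[c] = priors[c] * likelihood
    total = sum(joint.values())
    return joint["fraud"] / total

# Hypothetical parameters: fraud firms show much higher mean accruals,
# but fraud is rare (1 percent prior).
priors = {"fraud": 0.01, "clean": 0.99}
params = {"fraud": [(0.8, 0.05)], "clean": [(0.1, 0.05)]}

print(round(naive_bayes_fraud_prob([0.75], priors, params), 3))
```

Note how the tiny prior tempers even a strongly fraud-like observation, the same base-rate effect that makes rare-event classification hard throughout this literature.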
Machine Learning in Detecting Misappropriation of Assets
The exponential increase in online transactions has made payment processing a fertile ground for perpetrating fraud. Credit card transactions and automated clearing house (ACH) transactions have been particularly prone to misappropriation of assets, often carried out via phishing or malware attacks. ACH fraud can be abetted by inadequate internal controls, such as missing debit blocks, filters and multiple-authorization requirements.
Machine learning models are already used in credit card authorization to identify potentially fraudulent transactions in real time. This is typically done by scoring a transaction based on the trustworthiness of the vendor and the cardholder’s purchasing behavior, as well as time and location data. The number of false alarms, while substantial in the initial phase, can be slowly narrowed down as more data comes online and more-robust profiles of cardholders are created.
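A toy version of such a score might combine the deviation of a transaction amount from the cardholder’s typical spend with vendor and timing penalties. Everything below, from the feature weights to the cutoff, is invented for illustration; production systems calibrate such scores on large labeled histories.

```python
import math

def transaction_score(amount, history, vendor_risk, odd_hour):
    """Toy real-time risk score (hypothetical weights): z-score of the
    amount against the cardholder's recent spend, plus penalties for a
    risky vendor and an unusual time of day."""
    mean = sum(history) / len(history)
    std = math.sqrt(sum((a - mean) ** 2 for a in history) / len(history)) or 1.0
    z = abs(amount - mean) / std
    return z + 2.0 * vendor_risk + (1.0 if odd_hour else 0.0)

history = [25.0, 40.0, 31.0, 28.0, 36.0]   # cardholder's recent purchases
score = transaction_score(900.0, history, vendor_risk=0.8, odd_hour=True)
print(score > 5.0)   # simple cutoff; real systems learn this from data
```

A $900 charge against a $32 average spend, at an odd hour and through a risky vendor, blows past the cutoff, whereas an in-pattern purchase scores well below it; richer cardholder profiles shrink the false-alarm rate over time, as the text notes.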
The Beneish M-score and Dechow F-score models investigate the likely sources of aggressive accounting choices, while Benford’s law and Zipf’s law look into the natural distributions of numbers and words (see Figure 5). By using these approaches together, an investor can build safeguards against potential manipulators.
However, these models can still miss clues to an unfolding fraud that may appear in the footnotes of financial statements, and they cannot account for subjective qualities like the credibility of management. This is where machine learning can really shine, because it can analyze enormous datasets of all kinds of information, not only financial data. Perhaps with further advances in machine learning, the day may soon come when the cat no longer has to chase the mouse.
Michael Kozlov is Senior Executive Research Director at WorldQuant, LLC, and has a Ph.D. in theoretical particle physics from Tel Aviv University.
Jorge Hurtado-Guarin is a Research Intern at WorldQuant, LLC, and a Master of Finance candidate at MIT Sloan School of Management.
Parin Trakulhoon is a Research Intern at WorldQuant, LLC, and an MBA Finance Track candidate at MIT Sloan School of Management.
1. Messod D. Beneish. “The Detection of Earnings Manipulation.” Financial Analysts Journal 55, no. 5 (1999): 24-36.
2. Patricia M. Dechow, Weili Ge, Chad R. Larson and Richard G. Sloan. “Predicting Material Accounting Misstatements.” Contemporary Accounting Research 28, no. 1 (2011): 17-82.
3. Frank Benford. “The Law of Anomalous Numbers.” Proceedings of the American Philosophical Society 78, no. 4 (1938): 551-572.
4. George Kingsley Zipf. Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology. Boston: Addison-Wesley Publishing, 1949.
5. American Institute of Certified Public Accountants. “Consideration of Fraud in a Financial Statement Audit,” 2002.
6. Tamás Lolbert. “Digital Analysis: Theory and Applications in Auditing.” Hungarian Statistical Review 84 (2007).
7. Adeola Odueke and George Weir. “Triage in Forensic Accounting Using Zipf's Law.” Issues in Cybercrime, Security and Digital Forensics (2012): 33-43.
8. Efstathios Kirkos, Charalambos Spathis and Yanis Manolopoulos. “Data Mining Techniques for the Detection of Fraudulent Financial Statements.” Expert Systems with Applications 32, no. 4 (2007): 995-1003.
9. Bin Li, Julia Yu, Jie Zhang and Bin Ke. “Detecting Accounting Frauds in Publicly Traded U.S. Firms: A Machine Learning Approach.” Asian Conference on Machine Learning (2016): 173-188.
10. Mary Jane Lenard, Ann L. Watkins and Pervaiz Alam. “Effective Use of Integrated Decision Making: An Advanced Technology Model for Evaluating Fraud in Service-Based Computer and Technology Firms.” Journal of Emerging Technologies in Accounting 4, no. 1 (2007): 123-137.
11. Zabihollah Rezaee. Financial Statement Fraud: Prevention and Detection. New York: John Wiley & Sons, 2002.
Thought Leadership articles are prepared by and are the property of WorldQuant, LLC, and are being made available for informational and educational purposes only. This article is not intended to relate to any specific investment strategy or product, nor does this article constitute investment advice or convey an offer to sell, or the solicitation of an offer to buy, any securities or other financial products. In addition, the information contained in any article is not intended to provide, and should not be relied upon for, investment, accounting, legal or tax advice. WorldQuant makes no warranties or representations, express or implied, regarding the accuracy or adequacy of any information, and you accept all risks in relying on such information. The views expressed herein are solely those of WorldQuant as of the date of this article and are subject to change without notice. No assurances can be given that any aims, assumptions, expectations and/or goals described in this article will be realized or that the activities described in the article did or will continue at all or in the same manner as they were conducted during the period covered by this article. WorldQuant does not undertake to advise you of any changes in the views expressed herein. WorldQuant and its affiliates are involved in a wide range of securities trading and investment activities, and may have a significant financial interest in one or more securities or financial products discussed in the articles.