The California Franchise Tax Board (FTB) implemented a broad tax modernization initiative. FTB’s mission is to help individual and business income taxpayers file tax returns timely and accurately and pay the correct amount to fund services important to Californians. One of the objectives of this initiative was to better identify fraudulent tax preparers, and that is where FTB turned to Aviana for help.

The Project

The Enterprise Data to Revenue (EDR) project included analytics as one of the primary methods to generate revenue for this benefit-funded project. Aviana provided artificial intelligence (AI) solutions using predictive and outlier-based analytic models. More than 50 AI models served the major functional areas of the EDR organization, including:

  • Filing Enforcement Scoring & Ranking
  • Debt Collection Risk Management
  • Audit Selection
  • Fraud Detection

The Approach

Aviana provided full life-cycle services: concept definition, requirements analysis, detailed design, model development, model validation, integration and deployment, and post-deployment maintenance. All AI model development and deployment was accomplished using the IBM SPSS Modeler and Collaboration and Deployment Services (C&DS) platform, on which Aviana is a recognized expert. The modeling data mart was over two terabytes in size and included data from two other California state agencies and the IRS. Implementing models at enterprise scale required Aviana to implement a robust methodology, formal configuration management of modeling artifacts, and a release management procedure.

Another important factor in tackling a project of this scale was Aviana’s approach to involving the business stakeholders throughout the life cycle of model development. This led to the creation of a team of “Citizen Data Scientists.”

Our ability to translate statistical findings and model results into a form easily consumed by business stakeholders was key to getting buy-in and sign-off on the models so they could move on to production deployment on schedule.

Deployment Challenges

These projects met challenging deployment requirements. During peak periods, the debt collection models had to score up to 100,000 new cases per day using an automated integration with FTB’s mainframe computer system. Aviana’s deployment on the IBM SPSS platform accomplished this using file- and message-based triggers. The system has never missed its service-level agreement due to performance issues.

Another challenging deployment requirement was to provide real-time scoring as incoming refund tax returns are processed. Aviana’s deployment on the IBM SPSS platform accomplished this using a web-service scoring engine supported by pre-computed historical taxpayer statistics maintained by periodic batch models. This system has likewise never missed its service-level agreement due to performance issues, handling as many as 30,000 returns per hour during peak filing season.
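
The split between periodic batch statistics and real-time scoring can be illustrated with a minimal Python sketch. It is not the IBM SPSS deployment described above; the field names, the neutral score for taxpayers with no history, and the logistic coefficients are all hypothetical and chosen only to show the pattern: the expensive aggregation runs in batch, and the per-return scoring path is a single lookup plus a small calculation.

```python
from dataclasses import dataclass
from math import exp


@dataclass
class TaxpayerStats:
    """Historical features refreshed by a periodic batch job (hypothetical fields)."""
    mean_refund: float
    std_refund: float
    prior_flags: int


def rebuild_stats(history):
    """Batch step: recompute per-taxpayer statistics from past refund amounts.

    `history` maps taxpayer ID -> (list of past refunds, count of prior flags).
    """
    stats = {}
    for taxpayer_id, (refunds, prior_flags) in history.items():
        n = len(refunds)
        mean = sum(refunds) / n
        variance = sum((r - mean) ** 2 for r in refunds) / n
        stats[taxpayer_id] = TaxpayerStats(mean, variance ** 0.5, prior_flags)
    return stats


def score_return(taxpayer_id, refund_claimed, stats):
    """Real-time step: one dictionary lookup plus a logistic score in [0, 1]."""
    s = stats.get(taxpayer_id)
    if s is None:
        return 0.5  # no history available: neutral placeholder score (assumption)
    # How far the claimed refund sits from this taxpayer's own history.
    deviation = (refund_claimed - s.mean_refund) / max(s.std_refund, 1.0)
    # Made-up coefficients for illustration only.
    logit = -3.0 + 0.8 * deviation + 1.2 * s.prior_flags
    logit = max(-30.0, min(30.0, logit))  # clamp to avoid overflow in exp()
    return 1.0 / (1.0 + exp(-logit))
```

In this sketch, `rebuild_stats` would run on a schedule, while `score_return` is the only code on the request path, which is what keeps per-return latency low at peak volume.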


Today, the implemented models score:

  • 3M+ new debts per year (up to 100K/day during peak season)
  • 17M+ taxpayers for audit selection
  • 10M+ returns per year for refund fraud and identity theft (up to 30K per hour during peak)
  • 1M+ delinquent non-filers
  • 38K tax preparers for fraudulent filing

The FTB implemented a rigorous method to track benefits attributed to the various initiatives within the EDR project. This was necessary because the vendors were paid out of those benefits. As of December 2016, the monetary benefits attributed to Aviana models included:

  • Improved Audit Selection: $14M
  • Improved Filing Enforcement: $385M
  • Improved Refund Fraud Detection: $136M

The analyses and resultant models from this project were developed using the following AI techniques: Data Mining, Micro and Macro Trending, Machine Learning, Geospatial Modeling, Multivariate Cluster/Outlier Analyses, Principal Component Analysis, Linear Regression, Logistic Regression, Model Segmentation, Feature Selection, Classification and Regression Trees, and CHAID.
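
As a rough illustration of the outlier-based idea, the Python sketch below flags entities whose value on a single metric sits far above the rest of the population, using a modified z-score built from the median and the median absolute deviation. It is a simplified, univariate stand-in, not the multivariate models built for the project; the metric, the preparer IDs, and the 3.5 threshold are hypothetical.

```python
from statistics import median


def robust_outliers(rates, threshold=3.5):
    """Return IDs whose modified z-score exceeds the threshold (high side only).

    `rates` maps an ID to one numeric metric, e.g. a hypothetical share of a
    preparer's returns that claim some credit.
    """
    values = list(rates.values())
    med = median(values)
    # Median absolute deviation: a spread estimate that resists outliers.
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread in the data, so nothing can be ranked as unusual
    return sorted(
        pid for pid, v in rates.items()
        if 0.6745 * (v - med) / mad > threshold
    )


# Hypothetical data: five preparers near 10% and one at 65%.
example = {"P1": 0.10, "P2": 0.12, "P3": 0.11, "P4": 0.09, "P5": 0.13, "P6": 0.65}
flagged = robust_outliers(example)
```

With this made-up data, only `P6` is flagged. Using the median rather than the mean keeps one extreme value from inflating the spread estimate and hiding itself.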