
Instructions

1. An individual written paper on your paper or video/webinar: (this should be about 2 pages

of written text, it can be more depending on your paper and whether or not you do a

demonstration of the technique)

Section #1: will briefly summarize and describe the paper’s objective.

Section #2: will briefly describe the SAS procedures/techniques (or SAS code) used, provide a

description of any examples/applications and perhaps illustrate your own demo of the

procedures. If you feel the method is beyond your current skill level you can just describe what

they did. Some use of the SAS documentation may be required if the syntax is not well

explained in the paper.

Section #3: will discuss the questions you were asked and your answers to those questions.

Section #4: will briefly describe why the paper is of interest to you and provide commentary on

the paper. The commentary should include some ways the paper could be improved, and some

sentences on whether you think this paper and/or technique will be useful to you or a general

SAS user.

The grade will be based on:

-Was the paper well written and grammatically correct?

-Were all sections present?

-Were written explanations clear?

-Was a demonstration done if appropriate?

Section 3:

- Can you brief me about the industry profiling analysis in this paper?

Page number 18 in the paper

- What approach did you follow to remove negative values while forecasting the number of cases
per hour?

Page number 11 in the paper

UNCLASSIFIED

1

Paper 1047-2021

SAS® Time Series Analysis & Forecasting

(TSAF) at the Canada Revenue Agency
(CRA), with COVID impacts

Jason A. Oliver, Senior Compliance Analyst, Canada Revenue Agency (CRA)

ABSTRACT

It may well be a recurring theme of this year’s SAS Global Forum that we are faced with
more pressure to use flexible thinking – not just critical thinking – and when it comes to

time series analysis and forecasting (TSAF) in SAS, it’s all about “rethinking the curve”.

At the Canada Revenue Agency (CRA) Compliance Programs Branch (CPB), we have
grappled with reliable forecasting for macro-level tax variables on a month-to-month
basis, even before the COVID-19 pandemic hit. But now we face a particularly difficult

challenge. As with many large organizations, it is not easy to foretell what the fallout
may be from such a cataclysm.

In setting up SAS to right the trajectory, we must be extra cautious about some of the
fallacies in applying TSAF in this context: the lagged effect for tax revenues realized
based on audits of the previous tax year, the need to differentiate average tax recovery

per case from sum of tax recovery (month-to-month), realizing that industry sectors
are not “one size fits all”, and accounting for relatively temporary effects of staffing
re-orientation in the conversion to a virtual workplace versus the more enduring effects of
business disruptions. With SAS Enterprise Miner’s abilities to continuously adjust
forecasts, sub-categorize datapoints by tax office or industry sector, and apply lagged

cross-correlation analysis, we are suitably equipped with the right tools and this can
provide abstract learnings for other large organizations.

INTRODUCTION

The Canada Revenue Agency (CRA) is Canada’s federal tax administration. As with all tax

jurisdictions, the CRA has been challenged to keep pace with COVID-19 shocks and

manifestations, which began in March 2020 (the last month of our fiscal year).

Fortunately, SAS® Enterprise Miner™ has been an invaluable aid in gauging these impacts.

Enterprise Miner™ includes a highly versatile set of functional nodes for configuring and

processing time series data. It can decompose time series components such as seasonality

and trend, show trend lines and expected forecast within configurable prediction intervals,

and demonstrate complex correlation analyses.

While this has been of great benefit to the CRA in gauging the trajectory of macro-variables

related to tax revenues and auditor performance, the findings of this research paper could

conceivably be applied in the abstract to large organizations with process-oriented

functions, and not just to other foreign tax jurisdictions.

Let us provide a Glossary of terms to set the stage:

• TSAF: Time Series Analysis & Forecasting.

• TEBA: tax earned by audit, which is the amount of tax collectible that is agreed
upon in the course of a taxpayer audit. It is in NPV (Net Present Value).

• TAR: the tax-at-risk, which is the amount that CRA risk assessors arrive at as
the precursor to auditing activity.

• C/AR ratio: the ratio of [audit] cases completed, to action requests [submitted]
for assistance. It is a tentative measure of auditor productivity.

• Integras: the tool used by CRA auditors to process cases.

TIME SERIES FUNCTIONAL NODES & SETUP

In SAS® Enterprise Miner™, you have six TSAF nodes in the “Time Series” ribbon; but we’re

only going to use four of them. Below is the Time Series ribbon with the functional nodes in

question:

Figure 1. Time Series Functional Nodes

• TS Data Preparation: this node allows you to specify basic time series properties
including interval, cycle, start/end time, and accumulation (i.e. by total, min or max,
mean, etc.)

  o Below, the interval is “automatic”, so we specify “Month” as the interval.

  o We can leave the seasonal cycle and start/end time as “Default”, as SAS®
    Enterprise Miner™ will auto-determine these parts from the data.

  o In our case, the data was pre-accumulated in SAS® Enterprise Guide® row-by-row
    on a per-month basis, so we can leave Accumulation = “Total” (else, we would
    have to set it to “Average”).

Figure 2. TS Data Preparation node – basic properties
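As a rough illustration outside of Enterprise Miner, the difference between accumulating by “Total” and by “Average” can be sketched in a few lines of Python. The per-case records and the `accumulate` helper below are hypothetical and exist only for this example; they are not CRA data, nor the node's internal logic.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-case records: (month, TEBA amount). Values are made up.
records = [
    ("2018-01", 40_000.0),
    ("2018-01", 60_000.0),
    ("2018-02", 30_000.0),
    ("2018-02", 50_000.0),
    ("2018-02", 70_000.0),
]

def accumulate(rows, how="total"):
    """Collapse raw rows into one value per month, by total or by average."""
    buckets = defaultdict(list)
    for month, value in rows:
        buckets[month].append(value)
    reducer = sum if how == "total" else mean
    return {month: reducer(values) for month, values in sorted(buckets.items())}

monthly_total = accumulate(records, "total")      # one SUM per month
monthly_average = accumulate(records, "average")  # one AVERAGE per month
```

Once the data holds a single row per month, as in our pre-accumulated dataset, “Total” simply passes each monthly value through unchanged.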

• TS Decomposition: this node allows you to specify similar basic settings to that of
the TS Data Prep node, but the Number of Periods can be configured, and moreover,
you can configure which Export Components you want to display.

  o By default, it will only display the “Trend-Cycle” component (=Yes), which is
    generally regarded as the most salient one.

  o However, in our case, we want to view ALL Components, so we would set that
    value to “Yes”.

Figure 3. TS Decomposition node – properties

TS Correlation: this node allows you to set up your TSA for autocorrelation analysis, or

alternatively for CCA (Cross-correlation analysis). When you select one of those methods,

the other one’s properties will be greyed out.

Figure 4. TS Correlation node – properties

Both the TS Correlation and TS Decomposition nodes must be preceded by a TS Data

Preparation node (which occurs right after the source data node).

TS Exponential Smoothing: this node allows you to conduct forecasting based on your

known data; as such, you would connect it to a TS Data Preparation node, not directly to

your source data node.

• The interval is automatic (which will be month in the case of our pre-accumulated
data), and the accumulation defaults to “Total” (which is OK in our case, for the
same reason).

• SAS will pick what it deems to be the best forecasting method.

• The default selection criterion is MSE, or Mean Squared Error.

• We will see more on the Forecast lead, back, and significance level parameters
during the forecast demonstration in this paper.

Figure 5. TS Exponential Smoothing node – properties

For our initial workspace setup, we can focus on the C/AR (Case to Action Request)
ratio, which as per our glossary is a tentative measure of tax auditor performance. The
initial diagram workspace is called “Aggreg_Integras_27mths”, which runs from January
2018 to March 2020. It is arranged this way for a reason: it ends on the month
of the COVID shutdown.

Our dataset name is “TSA_AGGREG_SINGLE_LINE_27MTHS”.

So, when I bring this in, I need to set all variables to Role = “Rejected” except a) C/AR ratio

and b) my MONTH (Time ID) variable.

Figure 6. Variable Role selection from data source

You would set your variables once you bring the data source to your diagram (workspace).

Figure 7. TS Data Source to Diagram flow

NOTE: I do not cover the mechanics behind bringing in a data source, as the principal focus
is on conducting TSAF in SAS® Enterprise Miner™. All we need to be concerned with is that
as Data Sources become available in the top-left menu, we can drag-and-drop them to our
diagram workspace (diagrams are also created by right-clicking “Diagrams” in the left panel).

In examining the TS Data Preparation node, it is fairly simple: we see the known trajectory
of the C/AR variable, simply by right-clicking the node → Run → Results.

Figure 8. Time Series Plot, for C/AR ratio variable

We can see that the C/AR ratio has fallen off as of mid-2018, and continued on a very
gradual downward path. This means that case auditors are completing disproportionately
fewer cases relative to the action requests they submit for help, albeit with a seasonal
factor and some rebounding of the trend-line in March 2020.

So, we can examine the more specific components of the time series line by using a TS

Decomposition node.

DECOMPOSITION OF TIME SERIES

In running our TS Decomposition node, and viewing the results, the first one to examine

is the Seasonal Component Plot. When it comes to the C/AR ratio, the seasonal index range

is between a high of about 1.3 down to about 0.75.

Figure 9. Seasonal Component Plot, for C/AR ratio variable

During the months of March and December, we see fairly high seasonality. This is normal

for the time, since the push to complete cases is higher at the end of the CRA fiscal year

(March), and ostensibly at the end of the calendar year, also. Auditors are completing

proportionally more cases vs. the number of action requests they submit to the service

desk. So it is likely that they are fulfilling cases that do not require as many interventions

during those months. Even in March 2020, C/AR still remained high – it was

resilient to the initial COVID effects, due to being a ratio variable and not an absolute

sum variable.
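To make the notion of a seasonal index concrete, here is a minimal Python sketch that takes the ratio of each season's mean to the overall mean. This is a simplification I am assuming for illustration: it does not remove trend first, and it is not necessarily how the TS Decomposition node computes its seasonal component. The `seasonal_indices` helper and the sample series are hypothetical.

```python
from statistics import mean

def seasonal_indices(values, period=12):
    """Ratio of each season's mean to the overall mean (a multiplicative-style index)."""
    overall = mean(values)
    return [mean(values[s::period]) / overall for s in range(period)]

# Hypothetical two-year quarterly series (period=4) to keep the example small.
series = [8.0, 10.0, 12.0, 10.0, 8.0, 10.0, 12.0, 10.0]
idx = seasonal_indices(series, period=4)
```

An index above 1.0 marks a season that runs higher than the series average (as March and December do for C/AR), and an index below 1.0 marks a season that runs lower.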

In the decomposed results, we can also examine combinatory components; for instance, the

Trend-Cycle Component Plot:

Figure 10. Trend-Cycle Component Plot, for C/AR ratio variable

This tells us what we had surmised from the initial data preparation, that the series has

been on a steadily downwards trajectory. Now when it comes to tax-related time series

data, there is no real cycle per se; at best, it is an inherited cycle from world economy

fluctuations. The proper definition of cycle in a TSA context is not the entity’s operational

lifecycle; rather, it refers to the boom-and-bust business cycles which are largely

unpredictable. Ergo, we are mainly concerned about trend here.

Now, if we substitute the Average TEBA (tax earned by audit) variable for C/AR [using the

Data Source node shown in figure 6 earlier], we can see what emerges in our decomposed

time series results.

Figure 11. Paneled Component Plots, TS Decomp. for Avg. TEBA

This time, as per the panel graph at bottom-left, we see that our seasonality index is

broader than that of C/AR ratio; it goes from a high of about 1.8 to a low of ~0.7. This is

largely attributable to the heightened pressures towards fiscal year-end to increase

realization of TEBA, which we see in Feb.-March. At the opposite end, we see rather low

seasonality for May, August, and November.

For the original series plot, bottom-right, the trend continues gradually upwards with

seasonality readily apparent. In the trend-cycle component plot, at top-left, we see that the

trend (with cycle, such as it is) is rising steadily upwards but then reaches a virtual plateau.

The key challenge then, has been to resolve and reconcile the expected forecast as of March

2020 with the new COVID-19 realities.

FORECASTING MACRO TAX VARIABLES

AVERAGE TEBA

We can proceed to evaluate the expected trajectory of the AVG. TEBA variable, on a

monthly interval. Recall that this variable is pre-accumulated at data source.

When we conduct our forecast, we use the TS Exponential Smoothing node.

Figure 12. TS Exponential Smoothing node in the TSAF diagram

We let SAS® pick the best forecasting method, as well as selection criterion (forecast

measure). In this case, the latter value is the MSE [Mean Squared Error] as you can see at

the bottom of the properties of the node.

Figure 13. Properties of the TS Exponential Smoothing node

For our Significance Level, we set this to 0.5; it governs the blue bracket around the
forecast line, a.k.a. the prediction interval. So it is a confidence band of sorts. This
figure is the “alpha” value, which runs in the opposite direction to the confidence level
that some of us might know from frequentist confidence intervals; that is, the lower the
“alpha” value, the wider the band (prediction interval), so an “alpha” of 0.01 would
produce a very wide band, and an “alpha” value = 0.99 would be virtually limited to just
the forecast line itself. So we aim in the middle (which actually is closer to the outline
of the trend line, as this figure is more “log-like” in its manifestation).
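The relationship between “alpha” and band width can be sketched as follows, assuming normally distributed forecast errors with a fixed standard error. That assumption is mine; the node's exact interval calculation is not described here, and in practice the band also widens with the forecast horizon. The `half_width` helper is hypothetical.

```python
from statistics import NormalDist

def half_width(alpha, std_error=1.0):
    """Half-width of a two-sided (1 - alpha) prediction interval under normal errors."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * std_error

wide = half_width(0.01)    # 99% interval: widest band
middle = half_width(0.50)  # 50% interval: the setting used in this paper
narrow = half_width(0.99)  # 1% interval: nearly collapses onto the forecast line
```

Under these assumptions the half-width is a nonlinear function of alpha, which is consistent with the “log-like” behaviour noted above: moving alpha from 0.01 to 0.5 shrinks the band far more than moving it from 0.5 to 0.99.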

Figure 14. TEBA_NPV_Mean: forecast line from trend

SAS logically expects the trend will continue upwards (while maintaining seasonality, of
course) due to “series momentum”. Had we begun our time series at, say, January 2016
rather than Jan. 2018, that momentum might have been more pronounced. The clichés of

“future behavior is governed by past behavior” and “you can’t know where you’re going,
unless you know where you’ve been” have never been truer. However, enter COVID-19,

and that is a whole new wrench in the gears of the tax-auditing apparatus.

As for the selection of “Best” Forecasting Method: you could try to experiment with
different models – there are eight in all, as per fundamental TSAF science – but I can tell
from the shape of the forecast line that it’s based, appropriately, on the Additive Winters
method¹. I ascertained this by running the node with this method selected, and the
resulting graph was identical to that of the “best” method. Unlike the Multiplicative Winters
method, this forecast line is predicated on fairly consistent seasonal “inverted V” shapes
in the curve. If those inverted V shapes became noticeably larger (or smaller), then
Multiplicative Winters would likely be the “best” method that SAS would auto-select.
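For readers who want to see the mechanics, below is a minimal textbook-style sketch of additive Winters smoothing in Python. It is not the Enterprise Miner implementation: the smoothing weights are fixed, arbitrary values (SAS would optimize them against the selection criterion), the initialization is deliberately simple, and the function name and sample series are hypothetical.

```python
def additive_winters(y, period, alpha=0.3, beta=0.1, gamma=0.2, horizon=4):
    """Additive Holt-Winters smoothing; returns `horizon` forecasts past the end of y."""
    if len(y) < 2 * period:
        raise ValueError("need at least two full seasonal cycles")
    first, second = y[:period], y[period:2 * period]
    level = sum(first) / period                       # starting level: mean of cycle 1
    trend = (sum(second) - sum(first)) / period ** 2  # average per-period change
    season = [v - level for v in first]               # additive seasonal offsets
    for t in range(period, len(y)):
        s = season[t % period]
        prev_level = level
        level = alpha * (y[t] - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season[t % period] = gamma * (y[t] - level) + (1 - gamma) * s
    n = len(y)
    return [level + (h + 1) * trend + season[(n + h) % period] for h in range(horizon)]

# Hypothetical series with a constant seasonal swing and no trend.
history = [10.0, 20.0, 10.0, 20.0, 10.0, 20.0]
forecast = additive_winters(history, period=2, horizon=4)
```

Because the seasonal offsets are added to the level rather than multiplied by it, the “inverted V” shapes keep a constant height in the forecast; a multiplicative variant would scale them with the level instead.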

Figure 15. Available Forecasting Methods, properties of TS Exp. Smoothing node

We see that in the resulting forecast, it predicts ahead exactly 12 months. This is the
difference between the figures of “Forecast Lead” and “Forecast Back” in the properties. We
saw earlier that the “Forecast Back” = 6; this acts as our validation partition,
using the last six months of known data (i.e. Oct. 2019 to March 2020). So this gets
subtracted from the “Forecast Lead” value of 18 to arrive at 12 periods out. Ideally, you
want your “back” [validation] period to be between 20-25% of your known data, which it is
out of 27 months (6/27 ≈ 22%); even when we increase the known months to 30, it will still
be 20% of this.
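The arithmetic behind the forecast horizon and the validation share can be checked in a few lines of Python. The variable names are illustrative only; the lead of 18 is inferred from the 12-month horizon and the “Forecast Back” of 6 quoted above.

```python
forecast_lead = 18   # periods forecast, counted from the start of the holdout
forecast_back = 6    # holdout (validation) periods at the end of the known data
known_months = 27    # January 2018 to March 2020

periods_beyond_data = forecast_lead - forecast_back   # months forecast past March 2020
holdout_share = forecast_back / known_months          # share of known data held back
holdout_share_30 = forecast_back / 30                 # same holdout with 30 known months
```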

SUM OF TEBA

When we run a TSAF experiment on the SUM of TEBA – as opposed to its average – we
realize a drastic difference in the scale. Because TEBA is a sum value, not a ratio (i.e.
C/AR, or [Average] TEBA/case), it is simply not as resilient to sudden shocks like COVID-19
– as we will later see when adjusting the forecast based on incremental months (April, May,
June) of known values.

1 The essence of the Winters method is to combine discernible trend with seasonality.

Figure 16. TEBA SUM Forecast (post-March 2020)

Note that the MSE selection criterion (default) graphs a trend line around the known values

(which are represented by the red dots here). The SUM TEBA for Feb. 2020 is nearly double

what it was for March 2020, as you can see by the relatively large separation of the red dots

from the blue dots (on trendline) for those two months. Yet SAS® “thinks” that the trend

will continue positively, as it is “COVID-agnostic”.

What may also seem shocking to the reader is that the lower limit of the prediction interval

for April 2020 (at ~$674.5M) actually exceeds the actual value for April 2019, which was

slightly below $500 million. It is not until the fall that we see the midpoint of actual
2019 data approximate the LCL (lower confidence limit) of the forecasted band for Sept.
2020. This is ostensibly due to the “positive momentum” of the time series that I alluded to

earlier.

C/AR RATIO

Next, we switch out the SUM of TEBA for the C/AR ratio, once again. In forecasting a

relatively low continuous ratio variable such as C/AR, the prediction interval can be less

reliable. We have to examine the midpoint distribution. While the midpoint post-March

2020 tends to be at or above the 10.0 line, this is rare for 2019 datapoints.

Figure 17. C/AR ratio Forecast

I used the Mean Relative Abs. Error as the forecast metric (selection criterion), which I

found to be more appropriate. Regardless, what we see in the actuals for the spring of 2020

is a very low C/AR ratio, telling us that case throughput has suffered as a result of the

pandemic AND that Action Requests for help did not decline proportionally; there was still

an apparent high need for action requests.

FORECASTING AVG. HOURS PER CASE

For forecasting average hours per [audit] case, I determined that the more ideal Selection

Criterion was “Median Relative Abs. Error”. No matter what Selection Criterion I used (or

Significance Level), the prediction interval still dipped into the negative range. Sometimes,

this is unavoidable. But then the prediction interval becomes spurious; you can’t have

negative hours. So we tend to just focus on the midpoint values in this situation.
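The approach taken here is simply to read the midpoints and disregard the band. As an aside, one common alternative, which was not used in this paper, is to truncate the lower limit at zero when reporting the interval. A minimal Python sketch follows, with a hypothetical `clamp_band` helper and made-up band values:

```python
def clamp_band(band, floor=0.0):
    """Raise any lower limit below `floor` up to `floor`; midpoints and upper limits are untouched."""
    return [(max(lower, floor), mid, upper) for lower, mid, upper in band]

# Hypothetical (lower, midpoint, upper) forecasts of average hours per case.
forecast_band = [(-4.2, 12.0, 28.2), (-1.5, 15.5, 32.5), (3.0, 21.0, 39.0)]

clamped = clamp_band(forecast_band)
midpoints = [mid for _, mid, _ in clamped]  # the values we actually focus on
```

Truncation only tidies the display; it does not make the interval statistically valid, which is why the midpoint remains the quantity of interest.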

Figure 18. Average hours per case Forecast

We can see that the midpoint goes very subtly upwards for the first few forecasted points

(post-March 2020), then sharply up for summer. As it turns out, this is a fairly good

approximation of the reality, since the Avg. Hours per case during the middle of 2020 is

about 1.5-2.0 times that of the previous year. What is especially pronounced is that the

Average Hours of March 2019 were only 6.25, whereas for March 2020, it was 35.44. This

was predicated on an Agency policy-induced change; refer to the link and passage below:

https://www.mondaq.com/canada/audit/1030308/cra-moves-forward-with-international-audits-despite-continued-backlog-?email_access=on

In March 2020, the CRA announced that it was suspending the vast majority of audit activity for a

minimum of four weeks, other than audits involving the very largest taxpayers. This suspension meant

that the CRA ceased requests for information relating to existing audits, finalizing existing audits, and

issuing reassessments. Further, deadlines for information or document requests were suspended and no

action was required from taxpayers under audit during this time. This suspension remained in effect until

June 2020, though audits of small and medium businesses did not resume until late fall.

This is also arguably responsible for the “pulse” effect we see in actual Avg. TEBA for July

2020, as per the monthly incremental analysis that comes next.

INCREMENTAL ALIGNMENT

APRIL 2020, KNOWN VALUES

Now when we add the month of April 2020 to our data (making it 28 mths total), we would

expect the AVG. TEBA actuals for subsequent months to become closer to / within forecast

range. As an example in the graph cross-section that follows, the forecast for September,

October, and December 2020 becomes more within range of later-known actuals, once we

add April 2020 data. However, the July 2020 actual (~$122,000) is still above the forecast

band for this incremental dataset’s forecast. This was likely due to the resumption of

standard large business audit as of June 2020 (see the article passage quoted earlier).

Figure 19. Revised AVG. TEBA forecast, incremental inclusion of APRIL 2020

Again, we typically use the measure of MSE [Mean Squared Error] in gauging efficacy or

proximity of a forecast to actual [values]. See the Appendix tables at the end of this paper

for a breakdown of this analysis, where I illustrate monthly incremental effect on accuracy

of the last six months of the calendar year (i.e. from July to Dec. 2020).

MAY 2020, KNOWN VALUES

Clearly, the addition of April wasn’t enough to right the trajectory of the expanding “COVID
window”. So in continuing our analysis of monthly incremental effect, I added May 2020’s
known data and I changed the forecast significance level from 0.5 to 0.25. But it makes no
difference: July actual is still out of forecast range. We must simply accept that July 2020
Avg. TEBA is an irregular value (~$122K), since July 2018 had Avg. TEBA =~$45K, and July
2019’s Avg. TEBA was ~$57K. It is clear that this is a COVID-adjustment spike.

Figure 20. Revised AVG. TEBA forecast, incremental inclusion of MAY 2020

We can therefore define July 2020 as a pulse, or a one-time brief event, that caused a

spike in the accumulated time series value for that month. This emphasis on larger

business for audit while suspending SMB audits at the time is further substantiated by the

fact that in July 2020, there was an average of 50.75 hrs per case completed, which is

extremely high. For April, which had a very high Average TEBA of $185.5K, the figure was

52.16 average hours per case.

JUNE 2020, KNOWN VALUES

Predictably, the addition of June 2020 didn’t improve the forecast band enough to include the

actual Avg. TEBA for July. So this strengthens the theory that July’s value was a one-time

event, or pulse, in the time series. It also strengthens the theory that Avg. TEBA was more

resilient to initial COVID-19 transition measures (being a ratio value, in essence). To wit:

observe below that the April-May-June line for the original forecast (left) and actual data

points (right) is just above the $50K line, and follows the same trajectory.

Figure 21. Comparing Q1 of FY2020-21 forecast vs. actual data points

In taking MSE and RMSE (R is “root”) measurements for both the as-of-March and as-of-June
forecasts, we only note a slight improvement (reduction) in that value. This also
goes to show the resilience of this variable, and the “pulse” nature of July’s spike.

MEASURE / as of MONTH     MARCH 2020            JUNE 2020
AVG. TEBA (MSE)           $ 954,467,257.64      $ 888,454,004.34
RMSE                      $ 30,894.45           $ 29,806.95

Table 1. Point-in-time [R]MSE for AVG. TEBA forecast-to-actual: July to Dec. 2020

Refer to the Appendix at the end of this paper for a more detailed month-by-month

breakdown of these calculations.
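For reference, the two measures can be sketched in Python as below; the `mse` and `rmse` helpers are illustrative and not taken from SAS. The last two lines take the square root of the MSE figures reported in Table 1, as a consistency check against the RMSE row.

```python
import math

def mse(forecast, actual):
    """Mean squared error between paired forecast and actual values."""
    return sum((f - a) ** 2 for f, a in zip(forecast, actual)) / len(actual)

def rmse(forecast, actual):
    """Root mean squared error, expressed in the same units as the data."""
    return math.sqrt(mse(forecast, actual))

# Square roots of the Table 1 MSE values, for comparison with the RMSE row.
rmse_march = math.sqrt(954_467_257.64)
rmse_june = math.sqrt(888_454_004.34)
```

Since RMSE is back in dollar units, it is the easier figure to interpret: the July to Dec. 2020 forecasts miss the actual Avg. TEBA by roughly $30K in a root-mean-square sense, with or without the extra three months of data.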

FALLACY: COMPARING SUM OF TEBA SHIFT TO AVG. TEBA CHANGES

TSAF works best when you accumulate data records by average, not by sum total. If we

tried this exercise using SUM TEBA per month, it would not turn out very well, because sum

totals are immediately impacted by any severe transition, i.e. auditor work re-arrangements

and temporary audit case policy due to COVID-19 fallout as of March 2020.

Evaluating the March 2019-2020 comparison in the following table, the TEBA_SUM and

Case Count have dropped significantly in March 2020, yet the C/AR ratio has increased.

Table 2. Year-over-Year March comparison, key macro-variables in TSA

However, as the staffing situation has attempted to stabilize in the intervening months

(April to June 2020), the C/AR ratio has dropped dramatically. (Not shown in above table.)

The same is true for the TEBA/AR pattern.

SUM OF TEBA: DRASTIC CHANGE

We now compare the SUM TEBA forecast as of March 2020 (left image) and that of June

2020 known data points (right image).

Figure 22. Comparison of SUM of TEBA forecast as of March vs. as of June (2020)

For the first image, none of the actuals of the last six months of 2020 fall in the forecast

band. Whereas, for the second image, two of the actuals of the last six months (Oct., Nov.)

fall in the forecast band.

Also observe how some of the accumulated data points in the forecast are more “depressed”

in the latter graph; while there is a discernible peak, it doesn’t quite have the same

buoyancy or upwards momentum as the former graph. (We must keep in mind, though,

that this is still using the MSE method, i.e. taking a line of best fit, where the red dots are

the actual values.)

So, there is little point in using the MSE to gauge efficacy of the monthly adjustment, simply

because the values would be so huge (as opposed to those in the Avg. TEBA MSE).

ADVERSE IMPACTS AND DELAYED EFFECTS

LATENT EFFECTS OF SHOCKS

We would also expect that lower Avg. TEBA wouldn’t manifest until much later in the fiscal

year 2020-21, due to most of 2020 consisting of past year audits. The graph below covers

known Avg. TEBA trend data points right up to December 2020, the lowest point.

Figure 23. Calendar-year-end (2020) Avg. TEBA; lowest point

This extremely low Average TEBA of ~$32,000 per case could be a harbinger of further
average TEBA decline, but we’d have to observe the last quarter of the fiscal year – January
to March 2021, once available – and validate that theory. (Then we might apply an
intervention to the time series line.)

Incidentally, when it comes to SUM of TEBA with actuals up to Dec. 2020, the forecast trend

line for 2021 is far more credible, showing all datapoints as being well under $1 billion, and

mostly under $500 million.

INTERVENTIONS

As alluded to before, a TSAF exercise may use interventions, if the extreme or irregular

event is known in advance (or shortly thereafter). This is an adjustment to the “regular”
time series, using a “dummy” variable for the period of observation. In this case study,
we’d recommend an intervention for the SUM of TEBA as of March 2020, and possibly for
AVG TEBA as of Dec. 2020. Plus, we might use a “pulse effect” for July 2020. However,
programming an intervention requires SAS® Studio, which is out of scope for this paper.

Figure 24. Basic denotation of input variables (interventions) by type

Lowest actual in 3
years; Dec. 2020
Avg. TEBA of $32,404

A step would work best as an intervention
(for March 2020 and Dec. 2020), since the
trend line shift is sudden and sustained; it
does not happen gradually then return to
baseline.
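Although programming the intervention in SAS is out of scope, the dummy variables themselves are easy to picture. The Python sketch below builds a pulse indicator for July 2020 and a step indicator starting in March 2020 over a monthly index; the helper names are hypothetical, and this only illustrates the shape of the inputs, not how SAS would fit them.

```python
def month_range(start_year, start_month, n):
    """Return n consecutive (year, month) tuples."""
    out = []
    year, month = start_year, start_month
    for _ in range(n):
        out.append((year, month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return out

def pulse(months, at):
    """1 in the single period `at`, 0 elsewhere (one-time event)."""
    return [1 if ym == at else 0 for ym in months]

def step(months, start):
    """0 before `start`, 1 from `start` onwards (sudden, sustained shift)."""
    return [1 if ym >= start else 0 for ym in months]

months = month_range(2018, 1, 36)        # Jan. 2018 through Dec. 2020
july_pulse = pulse(months, (2020, 7))    # pulse effect for July 2020
march_step = step(months, (2020, 3))     # step from the March 2020 shutdown
```

Each list would enter the forecasting model as an extra input column aligned with the monthly series, so that the irregular period is explained by the dummy rather than distorting the trend and seasonal estimates.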

TS CORRELATION NODE

AUTOCORRELATION

When we deal with a significant seasonal and/or trend component, we usually find a greater

degree of autocorrelation factor (abbreviated “ACF”). As the name suggests, this is the

tendency of a variable to self-influence. It could also be regarded as momentum, or

“muscle memory”.
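As a sketch of what the ACF measures, the standard sample autocorrelation at lag k compares the series with a copy of itself shifted by k periods. The Python function and the short series below are illustrative assumptions, not output from the TS Correlation node.

```python
def acf(x, max_lag):
    """Sample autocorrelation of x for lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    return [
        sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]

# Hypothetical steadily trending series: neighbouring values resemble each other.
values = [1.0, 2.0, 3.0, 4.0, 5.0]
correlations = acf(values, max_lag=2)
```

Lag 0 is always 1.0 by construction; values at later lags that stay well above zero indicate the kind of carry-over, or momentum, described above.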

In a similar vein, when frontline auditing teams are performing well, some of that

momentum carries over from one period to the next, as they build “muscle memory” and

are better-equipped to deal with more trying scenarios that have [abstract] aspects in

common with recent cases worked on. This presents opportunities for “boilerplate” copying

and pasting of common findings from one case to another, adjusting for specifics, and

accelerating average time to complete as well as garnering more average TEBA per case.

Clearly, during the current COVID-19 climate at this writing, and the embargo of SMB case

audit during the spring 2020 period, we can expect some of that momentum to be adversely

impacted – since auditors were working on more complex large business cases overall. But

first, let us examine a baseline from the years 2018-2019, below:

Figure 25. ACF Plot, three key tax-related macro-variables (2018-2019)

From the three variables
