A Practical Approach to Analyzing Healthcare Data, Fourth Edition
Chapter 4, Analyzing Categorical Variables
Susan White, PhD, RHIA, CHDA
ahima.org
© 2019 AHIMA
Learning Objectives
Compare and contrast rates and proportions commonly used in healthcare
Relate the rates and proportions to the appropriate statistical methods
Illustrate commonly used descriptive and inferential statistics
Categorical Variables
Data elements that represent categories
Nominal (no natural order)
Ordinal (ordered)
Healthcare Examples
Gender
Discharge status
Dead/Alive
Rates and Proportions
Commonly used in healthcare
Mortality rates
Infection rates
Complication rates
Readmission rates
Must understand the numerator and denominator of each rate
Numerator – count of subjects that meet the criteria to be measured
Denominator – count of subjects that could meet the criteria to be measured
Example – 30-day readmission rate for COPD
Numerator – patients discharged with a principal diagnosis of COPD and readmitted within 30 days
Denominator – patients discharged with a principal diagnosis of COPD
Be aware of any exclusion criteria
Adults only?
Gender specific rates
Immune-compromised patients
Census Statistics
Inpatient census
Number of patients in the facility at a set point in time
Typically measured at midnight
Daily inpatient census
Number of patients in the facility at a set point in time plus any patients that were both admitted and discharged during that day
Resources are expended to treat patients that are admitted and discharged on the same day
May be a more relevant statistic than inpatient census for monitoring resource consumption
Inpatient service day
Inpatient service day for a particular day is equal to the daily inpatient census for that day
Average daily inpatient census
The number of inpatient service days averaged over a set time period.
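Formula: Average daily inpatient census = total inpatient service days for the period / number of days in the period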
Example 1
The hospital inpatient census at midnight on January 15th is 102. On January 16th, fifteen patients are admitted, three patients are discharged, and one patient is admitted and dies the same day.
What is the inpatient census for January 16th?
102 + 15 – 3 = 114
How many inpatient service days were provided on January 16th?
102 + 15 – 3 + 1 = 115
(Note that the one patient who was admitted and died on the same day is included in the inpatient service days, but is not present for the January 16th inpatient census count.)
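The arithmetic in Example 1 can be sketched in Python. The function name and its parameters are illustrative assumptions, not part of the text. Same-day stays are passed separately because they count toward inpatient service days but not toward the census.

```python
def census_and_service_days(prior_census, admissions, discharges, same_day_stays):
    """Return (inpatient census, inpatient service days) for one day.

    admissions and discharges exclude patients who were admitted and
    discharged (or died) on the same day; those go in same_day_stays.
    """
    census = prior_census + admissions - discharges
    service_days = census + same_day_stays
    return census, service_days


# Values from Example 1 (January 16th)
census, service_days = census_and_service_days(102, 15, 3, 1)
print(census, service_days)  # 114 115
```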
Example 2
If the number of inpatient service days for the first calendar quarter of 2015 was 9,015, what was the average daily inpatient census for the quarter (round to the nearest 0.1)?
Step 1: Find number of days in period. First calendar quarter is January (31 days), February (28 days), March (31 days). Number of days = 31 + 28 + 31 = 90
Step 2: Divide the [inpatient service days] by the [number of days] in the period. 9,015/90 = 100.17, which rounds to 100.2 patients per day
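A minimal sketch of Example 2 follows. The helper names are assumptions made for illustration. The day count is taken from Python's standard `calendar` module, so leap years are handled without hard-coding February.

```python
import calendar


def days_in_quarter(year, quarter):
    """Number of calendar days in the given quarter (1-4) of a year."""
    first_month = 3 * (quarter - 1) + 1
    return sum(
        calendar.monthrange(year, month)[1]
        for month in range(first_month, first_month + 3)
    )


def average_daily_census(service_days, days_in_period):
    """Inpatient service days averaged over the days in the period."""
    return service_days / days_in_period


days = days_in_quarter(2015, 1)
adc = average_daily_census(9015, days)
print(days, round(adc, 1))  # 90 100.2
```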
Utilization Rates
Cesarean section rate:
Number of c-sections performed divided by number of deliveries
Note the denominator is the number of deliveries (not mothers) and includes both c-section and vaginal births
Inpatient occupancy rate
Inpatient service days divided by the number of bed days in the period
The number of bed days is the number of beds available for each day in the period measured
If the number of beds changes during the period, then that change must be reflected in the number of bed days.
Mortality Rates
Gross mortality rate – number of patients that died divided by the number of patients discharged during the period
Net mortality rate – number of patients that died at least 48 hours after admission divided by the number of patients discharged during the period
The net rate excludes patients that died within 48 hours of admission from the numerator.
Autopsy Rates
Gross autopsy rate – number of autopsies performed divided by the number of patients that died while in the hospital during a period
Net autopsy rate – number of autopsies performed divided by the number of bodies available for autopsy
The net rate excludes bodies that might be taken to the coroner for investigations from the denominator.
Example 3
If the number of inpatient service days for the first calendar quarter of 2015 was 9,015 and the facility had 120 beds available until closing 20 beds on March 1st, what was the inpatient bed occupancy rate for the quarter (round to the nearest 0.1)?
Step 1: Find the number of bed days.
Month | Beds Available | Days | Bed Days |
January | 120 | 31 | 3,720 |
February | 120 | 28 | 3,360 |
March | 100 | 31 | 3,100 |
Total | | | 10,180 |
Step 2: Divide inpatient service days by number of bed days:
9,015/10,180 = 0.886 or 88.6%
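The bed-day calculation in Example 3 can be sketched as follows. The list layout and variable names are illustrative assumptions. Each entry holds the beds available and the number of days for one month, so a mid-period change in bed count is reflected in the total.

```python
# (month, beds available, days) for the first quarter of 2015, from Example 3
periods = [
    ("January", 120, 31),
    ("February", 120, 28),
    ("March", 100, 31),  # 20 beds closed on March 1st
]

# Bed days are beds available multiplied by days, summed over the period
bed_days = sum(beds * days for _, beds, days in periods)

inpatient_service_days = 9015
occupancy_rate = inpatient_service_days / bed_days

print(bed_days, f"{occupancy_rate:.1%}")  # 10180 88.6%
```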
Example 4
AMC Hospital discharged 256 patients during November. Ten of those patients died, and 2 more died on the same day as they were admitted (12 deaths in total). What are the gross and net mortality rates for AMC Hospital (round to the nearest 0.1 of a percent)?
Gross mortality rate
= 12/256 = 0.047 = 4.7%
Net mortality rate
= (12 – 2)/256 = 0.039 = 3.9%
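A short sketch of the two mortality rates, using the definitions given on the Mortality Rates slide (only the numerator is adjusted for the net rate). The function and parameter names are assumptions for illustration.

```python
def gross_mortality_rate(deaths, discharges):
    """All deaths divided by all discharges in the period."""
    return deaths / discharges


def net_mortality_rate(deaths, deaths_within_48h, discharges):
    """Deaths at least 48 hours after admission divided by all discharges.

    Follows the slide's definition: deaths within 48 hours of admission
    are removed from the numerator only.
    """
    return (deaths - deaths_within_48h) / discharges


# Values from Example 4: 12 deaths in total, 2 on the day of admission
gross = gross_mortality_rate(12, 256)
net = net_mortality_rate(12, 2, 256)
print(round(gross * 100, 1), round(net * 100, 1))  # 4.7 3.9
```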
Example 5
AMC Hospital discharged 256 patients during November. Ten of those patients died, and 2 more died on the same day as they were admitted (12 deaths in total). Four bodies were autopsied. One body was transferred to the coroner’s office for a criminal investigation. What are the gross and net autopsy rates for AMC Hospital (round to the nearest 0.1 of a percent)?
Gross autopsy rate
= 4/12 = 0.333 = 33.3%
Net autopsy rate
= 4/(12-1) = 4/11 = 0.364 = 36.4%
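The autopsy rates in Example 5 follow the same pattern. In this sketch (names are illustrative assumptions) the net rate removes bodies that were unavailable for autopsy from the denominator.

```python
def gross_autopsy_rate(autopsies, deaths):
    """Autopsies performed divided by all inpatient deaths."""
    return autopsies / deaths


def net_autopsy_rate(autopsies, deaths, unavailable_bodies):
    """Autopsies performed divided by bodies available for autopsy."""
    return autopsies / (deaths - unavailable_bodies)


# Values from Example 5: 4 autopsies, 12 deaths, 1 body sent to the coroner
gross = gross_autopsy_rate(4, 12)
net = net_autopsy_rate(4, 12, 1)
print(round(gross * 100, 1), round(net * 100, 1))  # 33.3 36.4
```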
Census/Utilization Rates
Population Health and Epidemiology Rates
Epidemiology – study of patterns in disease occurrence and spread
Incidence rate –
Number of new cases of a disease divided by the population at risk for acquiring the disease
Prevalence rate –
Number of cases of the disease (both new and existing) divided by the population at risk for acquiring the disease
Point prevalence – prevalence of a disease at a particular point in time
Period prevalence – prevalence of a disease during a time period (month, year, etc.)
Example 6
The Department of Health in Center City is interested in determining the effectiveness of the flu vaccine. The department determined that there were 100 new flu cases during the month of January. The population of Center City was 15,000 during that month. What is the incidence rate of flu for Center City in January?
Incidence rate = 100/15,000 = 0.0067, or 6.7 per 1,000
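As a sketch, the incidence rate in Example 6 can be computed with a small helper. The function name and the `per` scaling parameter are assumptions for illustration.

```python
def incidence_rate(new_cases, population_at_risk, per=1000):
    """New cases divided by the population at risk, scaled to `per` people."""
    return new_cases / population_at_risk * per


# Values from Example 6: 100 new flu cases in a population of 15,000
rate = incidence_rate(100, 15000)
print(round(rate, 1))  # 6.7
```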
Example 7
The officials in Center City wanted to further study the impact of the flu on the population. There were 54 residents with the flu on January 31st and 10 residents with the flu on January 1st. Use this data and the fact that there were 100 new cases during the month of January to determine the point prevalence for January 31st and the period prevalence for the month of January.
(Note that period prevalence includes anyone with the disease during the period. Since 10 residents were sick with the flu on January 1st, they are included in the numerator of the period prevalence for January.)
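One way to work Example 7 is sketched below. It assumes the Center City population of 15,000 from Example 6 still applies as the population at risk; the function name is also an assumption. Point prevalence uses the cases present on January 31st, and period prevalence adds the cases existing on January 1st to the new cases during the month.

```python
def prevalence_rate(cases, population_at_risk, per=1000):
    """Cases (new and existing) divided by the population at risk."""
    return cases / population_at_risk * per


population = 15000  # assumed: carried over from Example 6

# Point prevalence: residents with the flu on January 31st
point = prevalence_rate(54, population)

# Period prevalence: cases existing on January 1st plus new cases in January
period = prevalence_rate(10 + 100, population)

print(round(point, 1), round(period, 1))  # 3.6 7.3
```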
Descriptive Statistics: Proportions
Each subject either has or does not have the attribute to be counted (dead/alive, success/failure, yes/no)
Recode each observation as a binary variable (two values):
If attribute is present = 1
If attribute is not present = 0
The mean of the 0s and 1s is the proportion of subjects with the attribute
Simple example:
What proportion of patients are female?
Patient genders: M, M, F, F, M
Recode F = 1; M = 0
Recoded gender data: 0, 0, 1, 1, 0
Mean = 2/5 = 0.4 or 40% of patients are female
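The recoding step above can be sketched in a few lines of Python, using the five patient genders from the simple example. The variable names are illustrative.

```python
genders = ["M", "M", "F", "F", "M"]

# Recode: F = 1 (attribute present), M = 0 (attribute not present)
is_female = [1 if g == "F" else 0 for g in genders]

# The mean of the 0s and 1s is the proportion with the attribute
proportion_female = sum(is_female) / len(is_female)

print(is_female, proportion_female)  # [0, 0, 1, 1, 0] 0.4
```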
Descriptive Statistics
Frequency distribution
Appropriate for both nominal and ordinal categorical data
Typically the counts and percentages for each category are presented
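A frequency distribution can be built with the standard library's `collections.Counter`. The discharge statuses below are hypothetical data invented for this sketch; they do not come from the text.

```python
from collections import Counter

# Hypothetical discharge statuses for eight patients (illustration only)
statuses = ["Home", "Home", "SNF", "Home", "Expired", "SNF", "Home", "Home"]

counts = Counter(statuses)
total = len(statuses)

# Percentage of all patients falling in each category
percentages = {category: count / total * 100 for category, count in counts.items()}

# Present the count and percentage for each category, most frequent first
for category, count in counts.most_common():
    print(category.ljust(8), str(count).rjust(3), f"{percentages[category]:.1f}%")
```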
Charts or Graphs
Since this subset of CPT codes is ordinal, the bar chart is a better representation.
Pie charts are a good choice for nominal data.
Contingency Tables
Used to display and analyze the relationship between two categorical variables
Notice in the contingency table of gender by discharge status (Table 4.3):
20/32 = 62.5% of female patients were discharged home
10/24 = 41.7% of male patients were discharged home
Is this just a random occurrence or is this evidence that there is a significant relationship between gender and being discharged to home?
A hypothesis test may be used to answer that question
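The row percentages quoted above can be reproduced from the cell counts they imply (20 of 32 female patients and 10 of 24 male patients discharged home). The nested-dictionary layout and the "Other" label for all non-home discharges are assumptions made for this sketch.

```python
# Counts implied by the percentages above; "Other" is an assumed label
table = {
    "Female": {"Home": 20, "Other": 12},
    "Male": {"Home": 10, "Other": 14},
}

# Proportion of each gender discharged home (row percentage)
pct_home = {
    gender: row["Home"] / sum(row.values())
    for gender, row in table.items()
}

for gender, pct in pct_home.items():
    print(gender, f"{pct:.1%}")
# Female 62.5%
# Male 41.7%
```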
Ranks and Percentiles
Ranks and percentiles may be used to describe ordinal data
Ranks – the position of a value after the sample is ordered using order of magnitude – usually ascending (increasing) order
Percentile
AKA percentile rank
Points that divide the sample into 100 equal parts
Important percentile ranks:
25th percentile
AKA first quartile
25% of the values in the sample are less than the 25th percentile
50th percentile
AKA median or second quartile
50% of the values in the sample are less than the 50th percentile (50% are also greater)
75th percentile
AKA 3rd quartile
75% of the values in the sample are less than the 75th percentile
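Ranks and quartiles can be sketched with Python's `statistics` module. The scores below are hypothetical values chosen for illustration, with no ties. Software packages use slightly different conventions for percentiles; this sketch uses the module's "inclusive" method.

```python
import statistics

# Hypothetical ordinal scores for nine patients (illustration only, no ties)
scores = [7, 2, 9, 4, 1, 6, 3, 8, 5]

# Rank = position of each value after sorting in ascending order
ordered = sorted(scores)
ranks = {value: position for position, value in enumerate(ordered, start=1)}

# Quartiles: 25th, 50th (median), and 75th percentiles
q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")

print(ranks[7], q1, median, q3)  # 7 3.0 5.0 7.0
```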
Inferential Statistics: Hypothesis Testing Basics
Hypothesis test – statistical technique used to determine if the evidence (values) present in a random sample is strong enough to make a conclusion about the population
Null hypothesis (Ho) – status quo, requires no action
Example: Ho for Table 4.3 is that there is no relationship between gender and discharge to home
Alternative hypothesis (H1 or Ha) – complement of the null hypothesis, often referred to as the research hypothesis
Example: H1 for Table 4.3 is that there is a relationship between gender and discharge to home
Data is gathered from a random sample of a population to determine if the null hypothesis can be rejected
Hypothesis Testing Basics
Test statistic –
A statistic that is calculated to determine whether the data values support rejecting the null hypothesis
Must be compared to a known probability distribution to determine the probability of making an error in deciding whether or not to reject the null hypothesis
Type I error – incorrectly rejecting the null hypothesis when it is true
The alpha level or acceptable level of this error is set by the analyst prior to the start of the analysis of the data
The p-value is the smallest alpha level for which the null hypothesis would be rejected
If the p-value is smaller than the pre-set alpha level, then there is sufficient evidence to reject the null hypothesis
Type II error – incorrectly NOT rejecting the null hypothesis when it is false
This error may be controlled by the type of hypothesis test and the sample size used in the study
Hypothesis Testing Steps
Determine the null and alternative hypotheses
Set the acceptable type I error or alpha level
Select the appropriate test statistic
Compare the test statistic to a critical value based on the alpha level and the distribution of the test statistic
Reject the null hypothesis if the test statistic is more extreme than the critical value. If not, do not reject the null hypothesis.
Inferential Statistics: Proportions
Used to determine if a population proportion is higher or lower than a standard
May be interested in a one-sided or two-sided hypothesis
Two-sided alternative: important to know if population of interest is higher or lower than standard
One-sided alternative: only concerned about higher or lower, not both
Two-sided – Ha: p ≠ p0, where p0 is the value of the standard
One-sided – Ha: p > p0 or Ha: p < p0
One-sample Z-test for proportions
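Test statistic: $Z = \dfrac{\hat{p} - p_0}{\sqrt{p_0 (1 - p_0) / n}}$, where $\hat{p}$ is the sample proportion, $p_0$ is the proportion stated in Ho, and $n$ is the sample size
Reject the null hypothesis when Z is extreme (for a two-sided test, in either tail of the standard normal distribution)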
Example: One-sample Z-test for proportions
Follow 5 basic steps in hypothesis testing:
1. Determine the null and alternative hypotheses:
Null hypothesis (Ho): p = 85 percent or 0.85
Alternative hypothesis (Ha): p ≠ 0.85
2. Set the acceptable type I error or alpha level
The company leaders are willing to accept a five percent error rate. Alpha = 0.05
3. Select the appropriate test statistic
Z is the appropriate test statistic
Example: One-sample Z-test for proportions (continued)
4. Compare the test statistic to a critical value based on alpha and the distribution of the test statistic
5. Reject the null hypothesis if the calculated test statistic is more extreme than the critical value. If not, then do not reject the null hypothesis
Since Ha is a two sided alternative (≠) we select the critical value associated with the alpha level divided by two. We want to protect against both higher and lower alternatives. We reject Ho if Z > 1.960 or Z < −1.960. Z = −1.36 is not less than −1.960, therefore do not reject the null hypothesis
Conclusion: The observed 70% rate from the sample of 10 employees is not sufficient evidence to reject the null hypothesis
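The five steps can be sketched in Python with the values stated in the example (standard of 0.85, observed rate of 70%, sample of 10, alpha of 0.05). The sketch assumes the usual one-sample Z statistic, with the standard error based on the hypothesized proportion; the function name is illustrative. Computed without intermediate rounding, Z comes out near −1.33 rather than the −1.36 shown above, a difference that appears consistent with rounding the standard error to 0.11 before dividing. The conclusion is the same either way.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z(p_hat, p0, n):
    """Z statistic for a sample proportion against a hypothesized value p0."""
    standard_error = sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / standard_error


alpha = 0.05
z = one_sample_z(p_hat=0.70, p0=0.85, n=10)

# Two-sided test: split alpha across both tails of the standard normal
critical = NormalDist().inv_cdf(1 - alpha / 2)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject_null = abs(z) > critical

print(round(z, 2), round(critical, 2), reject_null)  # -1.33 1.96 False
```

The two-sided p-value is larger than alpha, which agrees with the critical-value comparison: there is not sufficient evidence to reject the null hypothesis.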
Confidence Interval for Proportions
Confidence interval: a range of values based on a sample that contains a population value with a set level of confidence.
Common in political surveys
President’s approval rating is 60% +/- 5%
AKA margin of error (+/-5%)
The width of a confidence interval is a function of the proportion value and the sample size
Widest confidence interval (large margin of error) when p = 50%
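Larger sample size results in a narrower confidence interval for a proportion
Formula: $\hat{p} \pm z \sqrt{\hat{p}(1 - \hat{p}) / n}$, where $\hat{p}$ is the sample proportion, $n$ is the sample size, and $z$ = 1.96 for a 95% confidence interval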
Example: Confidence Interval for Proportions
Conclusion: Based on the sample, we are 95% confident that the range from 42% to 98% covers the true vaccination rate.
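A sketch of the confidence interval calculation follows. It assumes the same sample as the Z-test example (a 70% observed rate among 10 employees), which reproduces the 42% to 98% range in the conclusion. The function name and the `confidence` parameter are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(p_hat, n, confidence=0.95):
    """Normal-approximation confidence interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin_of_error = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin_of_error, p_hat + margin_of_error


# Assumed inputs: 70% observed rate in a sample of 10
low, high = proportion_confidence_interval(0.70, 10)
print(f"{low:.0%} to {high:.0%}")  # 42% to 98%
```

Rerunning the function with a larger `n` shows the interval narrowing, as described on the previous slide.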
Two-sample Z-test for Proportions
Used to determine if the proportion for a particular attribute is higher or lower when comparing two populations
Is the mortality rate at Hospital A higher or lower than that at Hospital B?
May be a one-sided or two-sided test, depending on whether the goal is to determine which population is higher or lower
Two-sample Z-test for Proportions
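Test statistic: $Z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$ (difference in sample proportions divided by standard error)
Pooled proportion: $\hat{p} = \dfrac{x_1 + x_2}{n_1 + n_2}$ (overall proportion when two samples are pooled together), where $x_i$ is the number of subjects with the attribute in sample $i$ and $n_i$ is the size of sample $i$
If Z > standard normal critical value at α/2, or Z < −(standard normal critical value at α/2), then reject Ho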
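A minimal sketch of the two-sample test follows, assuming the usual pooled form of the statistic. Because no counts are given on this slide, the illustration reuses the gender and discharge-home counts from the Contingency Tables slide (20 of 32 female patients and 10 of 24 male patients discharged home). The function name is an assumption.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(x1, n1, x2, n2):
    """Z statistic comparing two sample proportions with a pooled estimate."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # overall proportion, samples combined
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error


alpha = 0.05
# Illustration only: discharged-home counts from the contingency table slide
z = two_sample_z(20, 32, 10, 24)
critical = NormalDist().inv_cdf(1 - alpha / 2)
reject_null = abs(z) > critical

print(round(z, 2), reject_null)  # 1.55 False
```

In this illustration Z does not exceed the two-sided critical value of 1.96, so the difference between 62.5% and 41.7% would not be judged significant at alpha = 0.05 with samples this small.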
Example: Two-sample Z-test for Proportions
Is the mortality rate for MS-DRG 292 (p1) different from that in MS-DRG 293 (p2)?
1. Determine the hypotheses:
Ho: p1 = p2 ; Ha: p1 ≠ p2
2. Set the alpha level = 0.05
3. Select the appropriate test statistic: Z-test