The crude and adjusted rates in epidemiology: Standardization and Adjustment

Dr. Khaled Kasim, Ph. D. Epidemiology

Assistant Professor, Al-Azhar Faculty of Medicine, Cairo, Egypt

Abstract

Statistical adjustment in epidemiology is used to eliminate or reduce the confounding

effects of extraneous factors, such as age, when comparing disease or

death rates in different populations. Direct age adjustment methods apply age-specific

rates from study populations to an age distribution from a reference population.

However, when age-specific rates are unavailable in the study populations, we use the

indirect method of adjustment. The indirect method uses the age-specific rates from

an external reference population to derive the expected number of cases or deaths in

the study populations. The expected count is used to calculate a standardized

mortality ratio (SMR), which is then used to adjust the rate in the study population.

This review also presents a strategy for confounding and interaction assessment,

which is essential to obtain the true measure of association between a disease and an

exposure. As a rule, interaction must be ruled out before methods of confounder

adjustment can be used. If adjustment alters interpretation of the crude risk, we must

use the estimated adjusted risk. On the other hand, if adjustment does not alter the

interpretation, we can use the crude risk.

Introduction

In epidemiology, most rates, such as incidence, prevalence, and mortality rates, are

strongly age-dependent and greatly influenced by differences in age structure. The

comparison of such crude rates over time and between different populations may be

very misleading if the underlying age composition differs in the populations being

compared. To compensate for this difference in age, we have two general options.

First, we can restrict comparison to similar age subgroups (i.e., age-specific

comparison). However, this can be confusing, especially when many age strata exist

and comparisons are made between many different populations. Second, combining

age-specific rates to derive a single age-independent index (a single adjusted rate) may

be more appropriate and compensate for age differences in populations. The age

adjustment (standardization) can be achieved in several ways. In practice, the two

most common approaches in use are the direct and indirect weighting of stratum-specific

rates (i.e., direct and indirect standardization).
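
To make the problem with crude rates concrete, the following minimal sketch uses invented numbers (the rates, stratum sizes, and function name are assumptions, not data from this review). Two hypothetical populations share exactly the same age-specific death rates, yet their crude rates differ because one population is older than the other.

```python
def crude_rate(age_specific_rates, stratum_sizes):
    """Crude rate = total events / total population,
    i.e. a size-weighted mean of the stratum-specific rates."""
    events = sum(r * n for r, n in zip(age_specific_rates, stratum_sizes))
    return events / sum(stratum_sizes)

# Hypothetical age-specific death rates (young, middle, old),
# assumed identical in both populations.
rates = [2 / 1000, 10 / 1000, 40 / 1000]

# Hypothetical age structures: same total size, different distribution.
younger_population = [1000, 600, 400]
older_population = [400, 600, 1000]

crude_younger = crude_rate(rates, younger_population)
crude_older = crude_rate(rates, older_population)

print(f"Crude rate, younger population: {crude_younger * 1000:.1f} per 1000")
print(f"Crude rate, older population:   {crude_older * 1000:.1f} per 1000")
```

The difference between the two printed rates comes entirely from the age structure, which is the situation that standardization is meant to correct.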

This review presents a brief description of these two methods of standardization,

with simplified hypothetical examples, and notes their strengths and limitations. At

the end of this paper, we also give some hints about the adjustment of measures of

association, with a simple clarifying example.

Description of the Standardization methods:

1. Direct method:

In this method, we must:

i. Know the age-specific death rates in the two populations under comparison. If they are not known, we must use the indirect method.

ii. Borrow a reference population from elsewhere. This reference population may be a hypothetical population, either one of the two populations under comparison, or the total of the compared populations.

Example:

Suppose we have two populations, A and B, whose crude death rates are similar: 18/1000 for population A and 17/1000 for population B (see Table 1).

Table 1: Vital data of the two hypothetical populations A & B.

Columns: Age group categories; Population A (No. of population, No. of deaths, Age-specific rate); Population B (No. of population, No. of deaths, Age-specific rate). Final row: Total.

From this table, the crude death rates are 18/1000 and 17/1000 for populations A and B, respectively. However, on looking carefully at the table, we observe that the distribution (number) of population across the age group categories is not the same in the two compared populations, and this might confound the calculated crude death rates. To overcome this problem, we must adjust (standardize) the death rate by age, which represents the confounding factor in this example.

Because the age-specific death rates are known for the two populations in this example, we use the direct method of standardization. First, we take population A as the reference population, and then apply the number of population in each of its age group categories to the age-specific death rate of the corresponding age group category in population B.

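The adjusted death rate in population B is calculated using the following formula:

$$\text{Adjusted death rate in B} = \frac{\sum \left(\text{age-specific death rate in B} \times \text{No. of population in that age category in A}\right)}{\text{Total number of population in A}}$$

where Σ denotes summation over the age categories. For this example:

$$\frac{10/1000 \times 400 + 10/1000 \times 600 + 45/1000 \times 1000}{2000} = \frac{55}{2000} = 27.5/1000$$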

From this calculation, we conclude that the age-adjusted death rate in population B (27.5/1000) is higher than its crude rate (17/1000). It is also higher than the rate in population A (18/1000), whose crude rate needs no adjustment because A itself serves as the reference population.
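
The direct calculation can be sketched in a few lines of Python. The age-specific rates of population B and the stratum sizes of population A are the figures used in the worked calculation; the function and variable names are illustrative assumptions.

```python
def direct_adjusted_rate(study_rates, reference_counts):
    """Directly standardized rate: the study population's age-specific
    rates weighted by the reference population's stratum sizes."""
    if len(study_rates) != len(reference_counts):
        raise ValueError("one rate is needed per reference age stratum")
    expected_deaths = sum(r * n for r, n in zip(study_rates, reference_counts))
    return expected_deaths / sum(reference_counts)

# Age-specific death rates in population B (from the worked calculation).
rates_b = [10 / 1000, 10 / 1000, 45 / 1000]
# Population A, the reference, by age stratum (from the same calculation).
pop_a = [400, 600, 1000]

adjusted_b = direct_adjusted_rate(rates_b, pop_a)
print(f"Adjusted death rate in B: {adjusted_b * 1000:.1f} per 1000")
```

Swapping the roles of the two populations (rates of A, stratum sizes of B) gives the adjusted rate of A with B as the reference.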

Note that we can also use population B as the reference population and apply the number of population in each of its age group categories to the age-specific death rates in population A (try to do it).

2. Indirect method

The indirect method of standardization is used when we have no data about the age-specific rates in one or both of the populations being compared; the number of population in each age group, however, must be known. In this situation, we borrow the age-specific death rates from a reference (standard) population and apply them to the number of population in each corresponding age group of the compared populations to obtain the expected number of deaths. Finally, we divide the observed number of deaths by this expected number and multiply by 100 to obtain the standardized mortality ratio (SMR), expressed as a percentage. The adjusted rate using this indirect method is obtained by multiplying the crude rate in the study population by the SMR, taken as a ratio rather than as a percentage. The formulas summarising this method are:

$$aR_{\text{indirect}} = cR \times SMR$$

$$SMR = O / E$$

$$E = \sum_i R_i \, n_i$$

Where

aR: adjusted rate

cR: crude rate

SMR: standardized mortality ratio.

O: observed number of events in the study population.

E: expected number of events in the study population.

∑: summation.

$R_i$: the rate in the ith stratum of the standard population.

$n_i$: the number of population in the ith stratum of the study population.

Example: We use the previous table (Table 1), but without the number of deaths in each age group category; accordingly, we have no data about the age-specific death rates (see Table 2).

Table 2: Vital data of the two hypothetical populations A & B, but without age-specific rates.

Columns: Age group categories; Population A (No. of population, No. of deaths, Age-specific rate); Population B (No. of population, No. of deaths, Age-specific rate). Final row: Total.

In this example, we must borrow a third standard (reference) population with known age-specific death rates to calculate the SMR (see Table 3).

Table 3: Age-specific death rates of a hypothetical reference population.

Columns: Age group; Age-specific death rate.

Using the data from this hypothetical table, we can calculate the SMR in populations A and B.

The expected number of deaths in population A = 3/1000 × 400 + 18/1000 × 600 + 50/1000 × 1000 = 1.2 + 10.8 + 50 = 62.

The SMR in population A = 36/62 × 100 = 58%.

The expected number of deaths in population B = 3/1000 × 600 + 18/1000 × 1000 + 50/1000 × 400 = 1.8 + 18 + 20 = 39.8.

The SMR in population B = 34/39.8 × 100 = 85%.

Since the SMR in population B (85%) is higher than that in population A (58%), we can conclude that the risk of death is higher in population B. Similarly, when we multiply the crude rate of each population by its SMR, the adjusted rate of population B, about 15/1000 (17/1000 × 0.85), is higher than that of population A, about 10/1000 (18/1000 × 0.58).
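
The indirect calculation for both populations can be reproduced with the short sketch below. The standard rates, stratum sizes, and observed deaths are the figures used in the calculations above; the function names and the dictionary layout are illustrative assumptions.

```python
def expected_events(standard_rates, study_counts):
    """E = sum over strata of (standard rate R_i x study stratum size n_i)."""
    return sum(r * n for r, n in zip(standard_rates, study_counts))

def smr(observed, expected):
    """Standardized mortality ratio O/E, returned as a ratio (not a percentage)."""
    return observed / expected

# Age-specific death rates of the reference population.
standard_rates = [3 / 1000, 18 / 1000, 50 / 1000]

# Stratum sizes and total observed deaths of the two study populations.
populations = {
    "A": {"counts": [400, 600, 1000], "observed": 36},
    "B": {"counts": [600, 1000, 400], "observed": 34},
}

results = {}
for name, pop in populations.items():
    expected = expected_events(standard_rates, pop["counts"])
    ratio = smr(pop["observed"], expected)
    crude = pop["observed"] / sum(pop["counts"])
    # Indirectly adjusted rate: aR = cR x SMR.
    results[name] = {"expected": expected, "smr": ratio, "adjusted": crude * ratio}
    print(f"Population {name}: E = {expected:.1f}, SMR = {ratio * 100:.0f}%, "
          f"adjusted rate = {crude * ratio * 1000:.1f} per 1000")
```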

Note that we can also use the age-specific death rates of one of the two compared populations (if they are known) and apply them to the other to calculate the expected number of deaths and, accordingly, the SMR. We can then read the SMR as follows:

- If the SMR is more than 100, this means that more events (deaths) are occurring in

the population than expected.

- If the SMR is less than 100, this means that fewer events (deaths) are occurring in

the population than expected.

When we apply the age-specific death rates of population A to population B, we find the SMR for population B to be about 170%. This means that more deaths are occurring in population B than would be expected; that is, the death rate in population B is about 1.7 times that in population A. The adjusted rate in population B, using the above-mentioned formula, is estimated to be about 29/1000 (17/1000 × 1.7).

Although these methods of standardization have been in common practical use since the middle of the 19th century and help to give summary statements of unbiased comparisons, the literature notes a number of disadvantages, including the dependence on the choice of standard (reference) population. Direct standardization may give unstable estimates, particularly when the component rates are based on small numbers of deaths. The indirect (SMR) method, however, gives greater numerical stability in such conditions. Furthermore, these summary measures rest on the assumption of constant rate ratios across strata. This assumption is not always satisfied, particularly in the presence of missing data.

In conclusion, despite these mentioned drawbacks, the adjustment techniques are in

common use to eliminate the confounding effects of extraneous factors of interest

(such as age, sex, race, etc.) when comparing epidemiologic or demographic rates over

time or in different populations. At first, the studied data should be stratified by the

extraneous factor to derive strata-specific rates. After stratification, adjustments are

done between the compared populations with respect to the extraneous factor of

interest, so that comparison can be made without confounding by this factor.

However, when we have more than one confounding factor, control by stratification

becomes tedious. Also, the number of persons in each stratum becomes small

and the standardized rates become subject to random variation. Multiplicative

techniques have been developed to control simultaneously for several confounding

factors and these techniques provide a useful adjunct to the simple stratification

procedures of basic epidemiology.

Adjustment of measures of association:

The measures of disease association in epidemiology include the prevalence ratio,

which is the ratio comparison of two prevalences, cumulative incidence ratio,

incidence rate ratio (relative risk), and disease odds ratio. Disease odds ratio provides

an alternative to the prevalence ratio and cumulative incidence ratio as a ratio of

association when the data represent proportions. The crude measures of association

between exposure (E) and disease (D) may be confounded by another extraneous

factor(s) called confounding factor(s) (F).
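
As a small illustration of two of these ratio measures, the sketch below computes a risk (cumulative incidence) ratio and a disease odds ratio from a single 2×2 table. The counts and function names are hypothetical and are not taken from this review.

```python
def risk_ratio(a, b, c, d):
    """Cumulative incidence ratio from a 2x2 table.
    a: exposed cases, b: exposed non-cases,
    c: unexposed cases, d: unexposed non-cases."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """Disease odds ratio from the same 2x2 table (cross-product ratio)."""
    return (a * d) / (b * c)

# Hypothetical cohort: 100 exposed and 100 unexposed subjects.
a, b, c, d = 40, 60, 10, 90

rr = risk_ratio(a, b, c, d)
odds = odds_ratio(a, b, c, d)
print(f"Risk ratio: {rr:.1f}")
print(f"Odds ratio: {odds:.1f}")
```

With a common outcome, as in these invented counts, the odds ratio lies further from 1 than the risk ratio; the two measures come close to each other only when the disease is rare.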

Confounding in epidemiology means a distortion of an association between E and D

brought about by an extraneous factor F (or extraneous factors F1, F2, F3, etc).

Selection bias, information bias, and confounding bias are the three main types of systematic error (bias) that may damage the results of any epidemiologic research. For confounding bias to occur, the confounding factor should fulfil the following preconditions:

i. F and E are associated.

ii. F is an independent risk factor for D in both the exposed and unexposed populations.

iii. F is not an intermediate step in the causal chain (mechanism) between E and D.

To clarify confounding bias, the following example is of value. Suppose we found a positive association between lung cancer (D) and alcohol consumption (E) in some cancer epidemiology studies. This association might be confounded by cigarette smoking (F). F and E are associated (alcohol consumers are more likely to smoke than non-consumers are), and F and D are associated (smoking is an independent risk factor for lung cancer). In addition, smoking is not an intermediate step in the causal chain between alcohol consumption and lung cancer. Therefore, if we have no data about the smoking status of the studied subjects, the observed association may not represent the true association between lung cancer and alcohol consumption because of confounding bias.

To judge whether factor F confounds the estimated measure of association, we use stratification; that is, we stratify the studied subjects into groups by the confounding factor(s). Three scenarios may occur when we use stratification:

i. The estimated measure of association is the same among the groups stratified by F, and the same as the crude measure. In this scenario, neither confounding nor interaction is suspected, and we can directly present the estimated crude measure of association (see Table of Scenario A).

ii. The estimated measure of association differs among the groups stratified by F, and differs from the crude measure. In this scenario, an interaction between F and E is suspected (effect modification), and an interaction test (a chi-square test for interaction, also called a test of heterogeneity) should be used to confirm this effect (see Table of Scenario B).

iii. The estimated measure of association is the same among the groups stratified by F, but not the same as the crude measure. In this scenario, F represents a confounding factor, and it must be controlled (adjusted) for during statistical analysis to obtain the adjusted measure of association (see Table of Scenario C).

It is pertinent here to say that before we use the methods of confounder adjustment,

the interaction between the two studied factors (E & F) must be ruled out. As a rule, if

the adjusted risk alters the interpretation of the crude measure of association, the

adjustment is mandatory. But if, on the other hand, the adjustment does not alter

interpretation, the crude measure of association can be used.
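
The three scenarios can be summarised as a simple decision rule. The sketch below is only a rough screening heuristic: the 10% relative tolerance used to decide whether two relative risks are "the same" is an arbitrary assumption made for illustration, and in practice a formal test of heterogeneity is needed before concluding that interaction is present or absent.

```python
def roughly_equal(x, y, tolerance):
    """True if two relative risks differ by no more than `tolerance`
    as a fraction of the larger one (an illustrative criterion only)."""
    return abs(x - y) / max(x, y) <= tolerance

def classify_scenario(rr_f_minus, rr_f_plus, rr_crude, tolerance=0.10):
    """Map stratum-specific and crude relative risks onto scenarios A, B, C."""
    if not roughly_equal(rr_f_minus, rr_f_plus, tolerance):
        return "B: interaction (effect modification) suspected"
    stratum_rr = (rr_f_minus + rr_f_plus) / 2
    if roughly_equal(stratum_rr, rr_crude, tolerance):
        return "A: no confounding or interaction suspected"
    return "C: confounding suspected, adjustment needed"

# Relative risks reported under the three scenario tables.
print(classify_scenario(4.0, 4.0, 4.0))
print(classify_scenario(9.4, 1.9, 4.0))
print(classify_scenario(1.0, 1.0, 4.0))
```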

Table of Scenario A:

Columns: Non-smokers (F-): E+, E-; Smokers (F+): E+, E-; Total subjects: E+, E-. Rows: D+, D-, Total, Relative risk (RR*).

Relative risk (RR*): RR (F-) = 4.0; RR (F+) = 4.0; Crude RR = 4.0

RR* is the incidence rate among the exposed (E+) divided by the incidence rate among the non-exposed (E-).

Table of Scenario B:

Columns: Non-smokers (F-): E+, E-; Smokers (F+): E+, E-; Total subjects: E+, E-. Rows: D+, D-, Total, Relative risk (RR*).

Relative risk (RR*): RR (F-) = 9.4; RR (F+) = 1.9; Crude RR = 4.0

RR* is the incidence rate among the exposed (E+) divided by the incidence rate among the non-exposed (E-).

Table of Scenario C:

Columns: Non-smokers (F-): E+, E-; Smokers (F+): E+, E-; Total subjects: E+, E-. Rows: D+, D-, Total, Relative risk (RR*).

Relative risk (RR*): RR (F-) = 1.0; RR (F+) = 1.0; Crude RR = 4.0

RR* is the incidence rate among the exposed (E+) divided by the incidence rate among the non-exposed (E-).

In Scenario C, the crude (unadjusted) measure of association, the relative risk (RR) of lung cancer associated with alcohol consumption, is 4.0. However, when we stratify the studied subjects by smoking status (i.e., F+ and F-), we find the risk to be the same among smokers (RR = 1.0) and non-smokers (RR = 1.0) but different from the crude risk. This means that smoking status is a confounding factor for the association between lung cancer and alcohol consumption, and the estimated crude risk (4.0) is not the true association. Therefore, we must control (adjust) for this confounding effect and present the resulting adjusted risk as the true measure of association. We can obtain the adjusted risk either by regression analysis models (multivariate regression analysis) or by a weighted average of the stratum-specific measures based on the intra-stratum variance estimates (Mantel-Haenszel method). Explanation of these methods of adjustment is beyond the scope of this review.
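
Although a full treatment is outside the scope of the review, a minimal sketch of the Mantel-Haenszel pooled risk ratio for cumulative-incidence data may help fix ideas. The counts below are hypothetical: they were chosen only so that the crude RR is 4.0 and both stratum-specific RRs are 1.0, as in Scenario C, and they are not the data behind the scenario table. The function names are likewise assumptions.

```python
def risk_ratio(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk in the exposed divided by risk in the unexposed."""
    return (exposed_cases / exposed_total) / (unexposed_cases / unexposed_total)

def mantel_haenszel_rr(strata):
    """Mantel-Haenszel pooled risk ratio.
    Each stratum is (exposed cases a, exposed total n1,
                     unexposed cases c, unexposed total n0).
    RR_MH = sum(a * n0 / N) / sum(c * n1 / N), with N = n1 + n0."""
    numerator = 0.0
    denominator = 0.0
    for a, n1, c, n0 in strata:
        total = n1 + n0
        numerator += a * n0 / total
        denominator += c * n1 / total
    return numerator / denominator

# Hypothetical strata: smoking is strongly associated with both exposure and disease.
smokers = (63, 900, 7, 100)       # risk 0.07 in exposed and in unexposed
non_smokers = (1, 100, 9, 900)    # risk 0.01 in exposed and in unexposed
strata = [smokers, non_smokers]

# Collapse the strata to obtain the crude table: (64, 1000, 16, 1000).
crude_table = tuple(sum(column) for column in zip(*strata))

crude_rr = risk_ratio(*crude_table)
adjusted_rr = mantel_haenszel_rr(strata)
print(f"Crude RR: {crude_rr:.1f}")
print(f"Mantel-Haenszel adjusted RR: {adjusted_rr:.1f}")
```

In this invented data set the pooled estimate returns to the common stratum-specific value, which is the behaviour expected when the crude association is produced entirely by the confounder.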

In Scenario B, the crude RR of lung cancer associated with alcohol consumption is 4.0. By stratification, however, we find the risk among smokers to be 1.9, which is different from that among non-smokers (RR = 9.4) as well as from the crude risk. The observed heterogeneity of the estimated measures of association (crude and stratum-specific) suggests that smoking is not a confounding factor in this scenario, but that it modifies the effect of the exposure (alcohol consumption) on the disease (lung cancer). This biological phenomenon is called effect modification and is related to a statistical phenomenon called interaction. Interaction refers to a difference in the effect of one factor according to the level of another factor, and it often has direct biological and public health relevance.

On an epidemiologic basis, effect modification is suspected when the observed joint

effect of the two studied factors is more or less than the predicted joint effect in the

additive model (RR11 ≠ RR10 + RR01 - 1), indicating a departure from this model

and consequently the presence of biological interaction. Confirmation of interaction

between study factors by a chi-square test of interaction and estimation of the risk that

represents this interaction is beyond the scope of this review. Generally, the use of

regression analysis is of great value in this respect.
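
The additive-model check above amounts to one line of arithmetic, sketched below. The three relative risks are hypothetical values chosen for illustration, and the function names are assumptions; RR10 and RR01 denote the relative risks for each factor alone, and RR11 the relative risk for both factors together, all relative to subjects with neither factor.

```python
def expected_joint_rr_additive(rr10, rr01):
    """Joint relative risk predicted by the additive model: RR10 + RR01 - 1."""
    return rr10 + rr01 - 1

def departure_from_additivity(rr11, rr10, rr01):
    """Observed joint RR minus the additive prediction.
    Zero means no departure; positive means more than additive."""
    return rr11 - expected_joint_rr_additive(rr10, rr01)

# Hypothetical relative risks (not taken from the scenario tables).
rr10, rr01, rr11 = 2.0, 3.0, 9.0

expected_rr11 = expected_joint_rr_additive(rr10, rr01)
departure = departure_from_additivity(rr11, rr10, rr01)
print(f"Expected joint RR under additivity: {expected_rr11:.1f}")
print(f"Departure from additivity: {departure:.1f}")
```

A non-zero departure computed from point estimates only suggests interaction; whether it exceeds random variation still has to be judged with a formal test or a regression model.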

BIBLIOGRAPHY

Ahlbom A, Norell S: Measures of comparison of disease occurrence. In Introduction to modern epidemiology. 2nd edition. Epidemiology Resources Inc, 1990; p30-35.

Armitage P, Berry G: Statistical Methods in Medical Research, 3rd Ed. Blackwell, Oxford, 1994.

Breslow NE, Day NE: Statistical methods in cancer research. Volume II. The design and analysis of cohort studies. IARC, Lyon, 1980.

Greenland S, Robins JM: Confounding and misclassification. American Journal of Epidemiology 1985; 122:495-506.

Mantel N, Haenszel W: Statistical aspects of the analysis of data from retrospective studies of disease. Journal of the National Cancer Institute 1959; 22:719-748.

Miettinen OS: Confounding and effect modification. American Journal of Epidemiology 1974; 100:350-353.

Gerstman BB: Stratification and adjustment. In Epidemiology kept simple: An introduction to modern epidemiology. John Wiley & Sons, Inc., New York, 1998; p108-120.

Rothman K, Greenland S: Precision and validity in epidemiologic studies. In Epidemiology kept simple: An introduction to classic and modern epidemiology. John Wiley and Sons, Inc., 1998; p115-135.
