SEER 9 Delay Model
For each cancer site, many combinations of covariates were considered in prediction models of the delay probabilities. Potential covariates included delay time, year of diagnosis, age at diagnosis, sex, race, and reporting year effect (Zou et al., 2009). Models were evaluated by fitting the SEER 9 models to the 1983 through 2007 annual submissions, with a maximum delay of 26 years, and then predicting the counts for the 2008 submission. For each cancer site, the model that minimized the sum of squared prediction errors was chosen as the default final model. To favor a more parsimonious model, we then added a selection step in which competing models were identified using the following criteria:
- the competing model had fewer parameters than the default model, and
- the percent change between the prediction errors of the competing and the default models per extra parameter (i.e., percent change in prediction errors divided by the difference in the numbers of parameters between the two models) was less than 1 percent.
If more than one competing model met the criteria, the model with the smallest percentage change per extra parameter was generally selected. However, if other competing models had fewer parameters and their percentage changes per extra parameter exceeded the smallest one by no more than 0.02, the competing model with the fewest parameters (rather than the model with the smallest percentage change per extra parameter) was selected. The chosen model was then refitted using all data (1983-2008 submissions, 1981-2006 diagnosis years) to estimate the delay distributions and calculate delay-adjusted estimates of the cancer counts.
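The selection step described above can be sketched in code. This is only an illustrative reading of the rules, not the procedure as implemented for SEER: the `Candidate` class, the function names, and the example numbers are hypothetical, and the sketch assumes that the percent change is measured relative to the default model's sum of squared prediction errors and that the 0.02 tolerance is in percentage points.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical container for one fitted delay model."""
    name: str
    n_params: int
    sse: float  # sum of squared prediction errors for the held-out submission


def pct_change_per_param(default, other):
    """Percent change in prediction error per parameter dropped (assumed form)."""
    pct_change = 100.0 * (other.sse - default.sse) / default.sse
    return pct_change / (default.n_params - other.n_params)


def select_model(candidates, max_pct=1.0, tolerance=0.02):
    # Default model: smallest sum of squared prediction errors.
    default = min(candidates, key=lambda c: c.sse)

    # Competing models: fewer parameters than the default, and less than
    # max_pct percent change in prediction error per extra parameter.
    eligible = []
    for c in candidates:
        if c.n_params < default.n_params:
            score = pct_change_per_param(default, c)
            if score < max_pct:
                eligible.append((score, c))
    if not eligible:
        return default

    # Usual choice: smallest percent change per extra parameter.
    best_score, best = min(eligible, key=lambda sc: sc[0])

    # Exception: prefer an even simpler model whose score is within
    # `tolerance` of the smallest score.
    simpler = [c for score, c in eligible
               if c.n_params < best.n_params and score - best_score <= tolerance]
    if simpler:
        return min(simpler, key=lambda c: c.n_params)
    return best


models = [
    Candidate("A", n_params=10, sse=100.0),   # default (smallest error)
    Candidate("B", n_params=8, sse=101.0),    # 0.50 percent per parameter
    Candidate("C", n_params=6, sse=102.04),   # 0.51 percent per parameter
    Candidate("D", n_params=5, sse=110.0),    # 2.00 percent per parameter
]
chosen = select_model(models)
```

In this made-up example, model B has the smallest percentage change per extra parameter, but model C has fewer parameters and is within the 0.02 tolerance, so C is returned. Model D fails the 1 percent criterion and is never considered.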
Age-adjusted cancer incidence rates (using the 2000 US standard million population) were then calculated with and without adjustment for reporting delay. Joinpoint linear regression was used to obtain the annual percentage changes in the 1975-2006 incidence rates for the data series with and without delay adjustment. Because the delay distribution was assumed to be complete after 26 years, incidence rates for diagnosis years prior to 1982 were not adjusted for reporting delay. In the joinpoint regression analyses, up to four change points (i.e., five trend-line segments) were allowed, and these were modeled to fall either at whole years or midway between diagnosis years. Change points were constrained to be at least 2 years away from both the beginning and the end of the data series and at least 2 years apart. Models were fitted by weighted least squares in the joinpoint regression software, with weights based on the variances of the age-adjusted incidence rates.
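The two calculations in this paragraph can be illustrated with a minimal sketch. It makes several assumptions that are not stated in the text: that delay adjustment is applied by multiplying each reported count by a delay-adjustment factor, that rates are expressed per 100,000, and that the annual percentage change comes from the usual log-linear form, 100 × (exp(slope) − 1). For brevity it fits a single unweighted segment rather than a weighted joinpoint model, and all function names and numbers are hypothetical.

```python
import math


def age_adjusted_rate(counts, populations, std_weights, delay_factors=None):
    """Directly standardized rate per 100,000.

    counts, populations: cases and person-years by age group.
    std_weights: standard population counts by age group.
    delay_factors: assumed multiplicative delay-adjustment factors
                   (1.0 means no adjustment).
    """
    if delay_factors is None:
        delay_factors = [1.0] * len(counts)
    total_weight = sum(std_weights)
    rate = 0.0
    for cases, pop, weight, factor in zip(counts, populations,
                                          std_weights, delay_factors):
        rate += (weight / total_weight) * (cases * factor / pop)
    return rate * 100_000


def annual_percent_change(years, rates):
    """APC from an ordinary least-squares fit of ln(rate) on year."""
    log_rates = [math.log(r) for r in rates]
    mean_x = sum(years) / len(years)
    mean_y = sum(log_rates) / len(log_rates)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, log_rates))
    sxx = sum((x - mean_x) ** 2 for x in years)
    slope = sxy / sxx
    return 100.0 * (math.exp(slope) - 1.0)


# Two age groups with made-up data.
unadjusted = age_adjusted_rate([10, 20], [100_000, 100_000],
                               [600_000, 400_000])
adjusted = age_adjusted_rate([10, 20], [100_000, 100_000],
                             [600_000, 400_000], delay_factors=[1.1, 1.0])

# A series growing by exactly 2 percent per year.
years = [2000, 2001, 2002, 2003, 2004]
rates = [100.0 * 1.02 ** i for i in range(5)]
apc = annual_percent_change(years, rates)
```

Because a delay factor greater than 1 scales up the most recently reported counts, the adjusted rate in this example is higher than the unadjusted one.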
Results show that adjusting for delay tends to raise cancer incidence rates in the most recent reporting years. Although this adjustment increases the rate of change over the most recent diagnosis years, it is likely to lead to the detection of a new joinpoint only rarely. See Clegg et al. (2002) for details on the impact of reporting-delay adjustment on SEER cancer incidence rates.