Warning: very long. tl;dr: Working with the anomaly data gives a better bias-variance tradeoff than working with the data for each month separately.

Chris Reynolds suggested that we move discussion of PIOMAS prediction from the Latest PIOMAS Update thread (http://forum.arctic-sea-ice.net/index.php?topic=119) to here, so here I am. To keep everything in one place, I'm going to recapitulate some of the discussion and reproduce the (corrected) relevant graphs.

I started by posting a SARIMA forecast of PIOMAS ice volume based on the PIOMAS monthly data through April. The forecast predicts a September monthly value of 4700 km^3, with a 95% confidence interval of 3000-6300 km^3. I followed this with a spline regression of the monthly sea ice volume anomaly, which predicts a September value of 4600 km^3. I did not compute a confidence interval for the spline, but noted that splines have high variance at the ends of the fitted interval, so the confidence interval for forecast values should be assumed to be very wide.

Chris asked whether it makes sense to work directly with the anomaly, arguing that it makes more sense to treat each month separately.

My response was that computing the anomaly is an attempt to treat the time series as the sum of a trend component and a periodic (seasonal) component, plus noise. The anomaly computation attempts to subtract out the periodic component, leaving just the trend and the noise. If the seasonal component is approximately periodic, this should give a better estimate of the trend than estimating the trend for each month separately, because the trend is then estimated from twelve observations per year instead of one. To back this up, I produced a graph of just the September ice values, along with an ARIMA forecast and a spline regression. The ARIMA forecast predicts a September ice volume of 4300 km^3, with a 95% confidence interval of 1800-6700 km^3. The spline regression predicts 3600 km^3.
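For anyone who wants to play with the idea, here's a minimal Python sketch of the anomaly computation. It assumes a plain list of monthly values that starts in January and covers whole years, and it uses the simple per-calendar-month mean as the fixed seasonal component (the climatology), which is not necessarily the base period PIOMAS itself uses. The helper names are made up for illustration.

```python
from statistics import fmean


def monthly_climatology(values):
    """Mean of each calendar month; assumes values[0] is January and whole years of data."""
    if len(values) % 12 != 0:
        raise ValueError("expected a whole number of years of monthly data")
    return [fmean(values[m::12]) for m in range(12)]


def monthly_anomalies(values):
    """Subtract the fixed seasonal cycle (the climatology) from each monthly value."""
    clim = monthly_climatology(values)
    return [v - clim[i % 12] for i, v in enumerate(values)]


if __name__ == "__main__":
    # Synthetic series (not PIOMAS data): a repeating seasonal cycle plus a steady
    # decline of 0.05 units per month, over three years.
    seasonal = [25, 28, 29, 29, 27, 23, 17, 12, 10, 12, 16, 21]
    values = [seasonal[i % 12] - 0.05 * i for i in range(36)]
    anomalies = monthly_anomalies(values)
    # The seasonal cycle is removed; only the year-to-year trend remains
    # (plus noise, in real data). Print the January anomaly of each year.
    print([round(a, 2) for a in anomalies[::12]])
```

A purely periodic series gives anomalies of zero, and with the steady decline added the anomalies step down by about 0.6 per year, which is exactly the trend with the seasonal cycle stripped out.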

The 95% confidence interval for the SARIMA forecast using all of the monthly data (3000-6300 km^3) is narrower than the one for the ARIMA forecast using just the September data (1800-6700 km^3). My claim is that this shows that the SARIMA forecast is better than the ARIMA forecast.
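To put numbers on that comparison, here's a small sketch using only the figures quoted above. It computes the width of each 95% interval and the standard error it implies, assuming each interval is a symmetric normal interval (point ± 1.96 standard errors). That assumption is only approximate here, since the quoted bounds are rounded and not exactly centred on the point forecasts.

```python
Z95 = 1.96  # two-sided 95% quantile of the normal distribution

# Forecasts quoted above, in km^3: (point forecast, lower bound, upper bound)
forecasts = {
    "SARIMA, all months": (4700, 3000, 6300),
    "ARIMA, September only": (4300, 1800, 6700),
}


def implied_standard_error(lower, upper, z=Z95):
    """Standard error implied by a symmetric interval: upper - lower = 2 * z * se."""
    return (upper - lower) / (2 * z)


for name, (point, lower, upper) in forecasts.items():
    se = implied_standard_error(lower, upper)
    print(f"{name}: point {point}, width {upper - lower} km^3, implied s.e. about {se:.0f} km^3")
```

The widths are 3300 and 4900 km^3, so the September-only interval is about 1.48 times as wide. Because variance scales with the square of the standard error, that corresponds to roughly 2.2 times the forecast variance.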

Chris responded by challenging that claim. He pointed out that the September volume is falling faster than the April volume, and therefore the seasonal variability is not periodic. Predictions based on that assumption may be misleading, and therefore the computed confidence intervals for the SARIMA forecast may in fact be too small. (This is a summary of Chris's arguments, so hopefully I'm not misrepresenting him. If I'm missing anything important, I assume he will correct me.)

In response, I will point out that although I'm a new poster here, I've been lurking for several years. At about this time last year, the general consensus was that 2013 was likely to have a lower ice minimum than 2012. Around July 1, Chris Reynolds updated his prediction for the 2013 minimum. Contrary to his previous prediction of a record low, his new prediction was at the high end of the predictions on this site, as shown in this post: http://forum.arctic-sea-ice.net/index.php/topic,418.msg8940.html#msg8940. Chris's reasoning was that ice levels around the end of June were high enough, and the amount of ice melt from the end of June to the September low was regular enough, that there was now essentially no chance of a record low in September. And in fact, this new prediction was highly accurate.
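The late-June argument boils down to a few lines of arithmetic. This is only my sketch of the reasoning as I understand it, not Chris's actual method, and every number in it is a made-up placeholder rather than real 2013 data.

```python
def projected_minima(current_value, historical_losses):
    """Subtract each past year's loss (late June to the minimum) from today's value."""
    return [current_value - loss for loss in historical_losses]


# Hypothetical placeholder numbers, NOT real sea ice data (arbitrary units).
current_value = 9.0                            # value near the end of June
historical_losses = [4.6, 5.1, 4.8, 5.4, 4.9]  # past late-June-to-minimum losses
record_low = 3.4                               # the existing record minimum

projections = projected_minima(current_value, historical_losses)
print(f"projected minimum: {min(projections):.1f} to {max(projections):.1f}")
# If even the largest historical melt can't reach the record, a record looks unlikely.
print("record low within the historical range of melt?", min(projections) < record_low)
```

The point is that the late-June value acts as the current "trend plus noise" state, and the regularity of the seasonal melt (the spread of `historical_losses`) bounds how far the minimum can fall from it.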

Using the anomaly data (or the SARIMA forecast) relies on essentially the same reasoning that Chris used last July. The ice data can be divided into a trend component, a periodic component, and a noise component. If we can estimate the trend and the periodic component, we can forecast future values as the sum of the two, subject to the variability due to the noise. Chris's July prediction was an estimate of the periodic component, combined with the argument that neither the trend nor the noise could be large enough to set a new record.
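Here's a minimal sketch of that decomposition used as a forecaster: estimate a fixed periodic component as the per-month average over complete years, fit a straight line to the anomalies (including any months of a partial final year), and add the two back together at the target month. This is a toy version of my own, with a straight-line trend standing in for the spline and no noise model at all, so it is not the SARIMA model; the function names are made up for illustration.

```python
from statistics import fmean


def monthly_climatology(values):
    """Per-calendar-month mean over complete years; values[0] is assumed to be January."""
    full = len(values) - len(values) % 12
    return [fmean(values[m:full:12]) for m in range(12)]


def linear_fit(xs, ys):
    """Ordinary least-squares line y = a + b*x; returns (a, b)."""
    xs, ys = list(xs), list(ys)
    xbar, ybar = fmean(xs), fmean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return ybar - b * xbar, b


def trend_plus_season_forecast(values, target_index):
    """Forecast = fixed periodic component + linear trend fitted to the anomalies.

    values may end partway through a year (e.g. in April); those months still
    contribute to the trend fit, but the climatology uses complete years only.
    """
    clim = monthly_climatology(values)
    anomalies = [v - clim[i % 12] for i, v in enumerate(values)]
    a, b = linear_fit(range(len(values)), anomalies)
    return clim[target_index % 12] + a + b * target_index


if __name__ == "__main__":
    # Synthetic series (not PIOMAS data): fixed seasonal cycle plus a steady decline,
    # covering ten full years plus January-April of the eleventh.
    seasonal = [25, 28, 29, 29, 27, 23, 17, 12, 10, 12, 16, 21]
    values = [20 - 0.02 * i + seasonal[i % 12] for i in range(124)]
    september = 12 * 10 + 8  # index of the next September (index 0 = January)
    print(f"forecast {trend_plus_season_forecast(values, september):.2f}, "
          f"truth {20 - 0.02 * september + seasonal[8]:.2f}")
```

On noiseless synthetic data like this the forecast lands close to the true September value, because the seasonal cycle really is periodic. The next question is what happens when it isn't.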

Chris's argument now is that the seasonal component is not periodic. Since the September ice volume has been falling faster than the April ice volume, the amplitude of the seasonal cycle is increasing, rather than staying constant as the model assumes. He is right about this. The anomaly graph clearly shows a residual seasonal component in the anomaly since 2010, so removing a fixed periodic component does not correctly account for the recent seasonal variability.

The spline forecast graph makes it clear that, as a result, predictions based on a fixed periodic seasonal component underestimate the April maximum and overestimate the September minimum. (It's less obvious, but the SARIMA forecast has a similar issue.) The real question is whether the error introduced by the imperfect seasonal model is outweighed by the additional data that the seasonal model gives us access to.
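To check the direction of that bias, here's a noiseless toy example (made-up numbers, not PIOMAS) in which September declines faster than April, so the seasonal cycle deepens every year. A forecast built from a fixed seasonal component plus a straight-line anomaly trend should then come out too low for April and too high for September. It only illustrates the mechanism; the sizes of the errors depend entirely on the arbitrary `EXTRA` values.

```python
from statistics import fmean

# Synthetic data (not PIOMAS): every month declines by 0.1 units per year, plus a
# month-dependent extra decline that is largest in September and zero in April,
# so the seasonal cycle deepens each year.
BASE = [25, 28, 29, 29, 27, 23, 17, 12, 10, 12, 16, 21]
EXTRA = [0.05, 0.05, 0.05, 0.0, 0.05, 0.1, 0.2, 0.3, 0.35, 0.3, 0.2, 0.1]
APRIL, SEPTEMBER = 3, 8  # month indices, 0 = January


def synthetic(year, month):
    return BASE[month] - (0.1 + EXTRA[month]) * year


def fixed_season_forecast(values, target_index):
    """Fixed climatology + straight-line anomaly trend; assumes values covers whole years."""
    clim = [fmean(values[m::12]) for m in range(12)]
    anomalies = [v - clim[i % 12] for i, v in enumerate(values)]
    n = len(values)
    xbar, ybar = (n - 1) / 2, fmean(anomalies)
    sxx = sum((x - xbar) ** 2 for x in range(n))
    b = sum((x - xbar) * (y - ybar) for x, y in enumerate(anomalies)) / sxx
    return clim[target_index % 12] + ybar + b * (target_index - xbar)


def forecast_error(month, years=10):
    """Forecast minus truth for the given month of the year after the history ends."""
    history = [synthetic(i // 12, i % 12) for i in range(12 * years)]
    return fixed_season_forecast(history, 12 * years + month) - synthetic(years, month)


if __name__ == "__main__":
    for name, month in [("April", APRIL), ("September", SEPTEMBER)]:
        print(f"{name}: forecast minus truth = {forecast_error(month):+.2f}")
```

The fitted anomaly trend is an average over all months, so it is too steep for April (forecast too low) and too shallow for September (forecast too high).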

If the goal is to predict a full year ahead, this question deserves careful consideration. I suspect that the SARIMA forecast for April 2015 is better than an ARIMA forecast for April 2015 based on just the April data, but to verify this I would want to go back and check the historical prediction accuracy of both forecasts.
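One way to do that check is a rolling-origin backtest: for each past year, fit using only the data available before it, forecast that year, and average the absolute errors. In a real check the two forecasters would be the SARIMA fit on all months and the ARIMA fit on April alone; here two deliberately simple forecasters stand in for them so the sketch stays self-contained, and the data and names are hypothetical.

```python
from statistics import fmean


def linear_trend_forecaster(history):
    """One-step-ahead forecast from an OLS straight line fitted to the whole history."""
    n = len(history)
    xbar, ybar = (n - 1) / 2, fmean(history)
    sxx = sum((x - xbar) ** 2 for x in range(n))
    slope = sum((x - xbar) * (y - ybar) for x, y in enumerate(history)) / sxx
    return ybar + slope * (n - xbar)


def persistence_forecaster(history):
    """Forecast next year's value as the most recent value."""
    return history[-1]


def rolling_origin_mae(series, forecaster, min_train=5):
    """Mean absolute error of one-step-ahead forecasts, each made using only earlier data."""
    errors = [abs(forecaster(series[:t]) - series[t]) for t in range(min_train, len(series))]
    return fmean(errors)


if __name__ == "__main__":
    # Made-up yearly values for a single month (e.g. April), NOT real PIOMAS data.
    april = [30.1, 29.8, 29.9, 29.2, 29.4, 28.9, 28.6, 28.8, 28.1, 27.9, 27.7, 27.2]
    for name, f in [("linear trend", linear_trend_forecaster),
                    ("persistence", persistence_forecaster)]:
        print(f"{name}: MAE {rolling_origin_mae(april, f):.3f}")
```

The important design choice is that each forecast only sees data from before the year it predicts, so the error estimate reflects what the method could actually have done at the time.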

On the other hand, my primary goal currently is to predict the September 2014 minimum. If I look only at the ARIMA model (or the spline model) based on the September data, I already have my prediction, and it can't get any better as we get closer to September, because no new September data will arrive until then. By contrast, the SARIMA model and the spline anomaly model can automatically incorporate all of the monthly data up until September, and in fact have already incorporated all of the data through April. As we get closer to September, their predictions will only get more accurate.
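A simple way to see why the monthly models improve as September approaches: suppose, purely as an assumption (I haven't checked it against PIOMAS, and it is much simpler than the SARIMA fit), that the anomaly behaves like an AR(1) process with monthly persistence `phi`. Then the forecast of the anomaly h months ahead is `phi**h` times the latest anomaly, and its error variance is `sigma2 * (1 + phi**2 + ... + phi**(2*(h-1)))`, which shrinks as h gets smaller. The parameter values below are made up.

```python
def ar1_point_forecast(latest_anomaly, phi, h):
    """h-step-ahead forecast of a zero-mean AR(1) anomaly: phi**h times the latest value."""
    return phi ** h * latest_anomaly


def ar1_forecast_error_variance(phi, sigma2, h):
    """Error variance of the h-step AR(1) forecast: sigma2 * sum over j < h of phi**(2j)."""
    return sigma2 * sum(phi ** (2 * j) for j in range(h))


if __name__ == "__main__":
    # Hypothetical parameters: monthly persistence 0.8, unit innovation variance.
    PHI, SIGMA2 = 0.8, 1.0
    for last_month, h in [("April", 5), ("May", 4), ("June", 3), ("July", 2), ("August", 1)]:
        var = ar1_forecast_error_variance(PHI, SIGMA2, h)
        print(f"data through {last_month}: {h} months ahead, error variance {var:.2f}")
```

The September forecast itself would be the September seasonal component (plus trend) plus this anomaly forecast; a September-only model has no way to use the May-August anomalies at all.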

It's true that the spline anomaly model is biased for September, but even though I haven't computed it, the variance of the spline anomaly model is much smaller than the variance of the September-only spline model. This variance will get smaller as we get closer to September, although the bias remains. This tradeoff between bias and variance is fundamental in statistics: you can minimize one or the other, but in general you can't minimize both at once. My claim is that by modeling the seasonal anomaly rather than just the data for a single month, you can come closer to making the optimal tradeoff between bias and variance.
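The tradeoff can also be checked by simulation instead of argued. Below is a toy Monte Carlo with made-up parameters (nothing here is fitted to PIOMAS) in which September declines faster than April, as Chris describes. It compares a straight line fitted to past Septembers alone against a fixed-climatology anomaly model fitted to every month including January-April of the current year, and reports the bias and variance of each model's September forecast. Straight lines stand in for the splines purely to keep the sketch short.

```python
import random
from statistics import fmean, pvariance

# Synthetic setup (made-up parameters, not fitted to PIOMAS): a common decline of
# 0.3 units/year, a month-dependent extra decline (largest in September, zero in
# April) so the seasonal cycle deepens, and independent Gaussian noise.
BASE = [25, 28, 29, 29, 27, 23, 17, 12, 10, 12, 16, 21]
EXTRA = [0.025, 0.025, 0.025, 0.0, 0.025, 0.05, 0.1, 0.15, 0.175, 0.15, 0.1, 0.05]
TREND, NOISE_SD, YEARS, SEP = 0.3, 1.0, 10, 8  # SEP = September's index (0 = January)


def expected_value(year, month):
    return BASE[month] - (TREND + EXTRA[month]) * year


def line_through(xs, ys, x_new):
    """Ordinary least-squares straight line through (xs, ys), evaluated at x_new."""
    xs, ys = list(xs), list(ys)
    xbar, ybar = fmean(xs), fmean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    return ybar + b * (x_new - xbar)


def september_only(series):
    """Trend fitted to past Septembers alone: unbiased in this setup, but only YEARS points."""
    return line_through(range(YEARS), series[SEP:12 * YEARS:12], YEARS)


def anomaly_model(series):
    """Fixed climatology + anomaly trend fitted to every month, including this year's."""
    clim = [fmean(series[m:12 * YEARS:12]) for m in range(12)]
    anomalies = [v - clim[i % 12] for i, v in enumerate(series)]
    return clim[SEP] + line_through(range(len(series)), anomalies, 12 * YEARS + SEP)


def simulate(trials=2000, months_this_year=4, seed=1):
    """Return {model: (bias, variance)} of the September forecast over many noisy histories."""
    rng = random.Random(seed)
    target = expected_value(YEARS, SEP)
    n = 12 * YEARS + months_this_year  # complete years plus January..April of this year
    errors = {"september_only": [], "anomaly_model": []}
    for _ in range(trials):
        series = [expected_value(i // 12, i % 12) + rng.gauss(0, NOISE_SD) for i in range(n)]
        errors["september_only"].append(september_only(series) - target)
        errors["anomaly_model"].append(anomaly_model(series) - target)
    return {name: (fmean(e), pvariance(e)) for name, e in errors.items()}


if __name__ == "__main__":
    for name, (bias, var) in simulate().items():
        print(f"{name:15s} bias {bias:+.3f}  variance {var:.3f}  "
              f"bias^2+variance {bias ** 2 + var:.3f}")
```

In this setup the September-only forecast is unbiased but has the larger variance, while the anomaly model has a much smaller variance and a clear positive bias (it overestimates the September minimum). Which one has the smaller bias² + variance depends on how fast the seasonal cycle is changing relative to the noise, so changing `EXTRA` and `NOISE_SD` is the quickest way to see where the balance tips.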