Topic: Time series models of NSIDC Extent and PIOMAS Volume

OSweetMrMath · « **on:** August 15, 2015, 09:54:03 AM »

Abstract

Independent time series models are proposed for NSIDC Arctic Sea Ice Extent and PIOMAS Arctic Sea Ice Volume. The validity and predictive performance of the models are analyzed. Future sea ice levels are predicted on the basis of these models.

Introduction

Satellite measurements of Arctic sea ice extent and volume, beginning in 1979, have provided several measures of the condition of the sea ice. These measurements are of interest both as a historical description of the changes in the sea ice and as a tool for predicting future behavior of the sea ice.

Models of varying complexity and prediction accuracy are currently in use. We propose a time series model in the ARMA (auto regressive moving average) family of models for each of Arctic sea ice extent, as reported by the NSIDC, and Arctic sea ice volume, as reported by PIOMAS. The models use the monthly time series data for the extent and volume as their only inputs, and the two models are independent. That is, the extent data is used only in the extent model, and the volume data is used only in the volume model.

The remainder of this post has several sections. In the next section, the models are described. This is followed by a section validating the performance of the models. Finally, several short and longer term predictions about the behavior of the sea ice are presented on the basis of these models.

Model description

The ARMA class of models is a well-studied class of time series models with well-understood applications to time series prediction. At the most general level, a time series can be represented as a function of time, so X(t) is the value of the time series X at the specified time t. The time series is represented as a deterministic function of the past plus some random noise component (called "innovations"). If the random noise series is represented as the time series Z(t), then this leads to a model with representation

X(now) = f( X(history), Z(history) ) + Z(now)

where f is some function of the history of the time series X and the noises Z.

The ARMA class of models requires that f be a linear function of the history, so the model simplifies to

X(t) = a1 * X(t - 1) + a2 * X(t - 2) + a3 * X(t - 3) + ... + Z(t) + b1 * Z(t - 1) + b2 * Z(t - 2) + ...

The coefficients a1, a2, ... and b1, b2, ... are assumed to be constants. In this notation, if t is any particular time, t - 1 is one unit of time before t (in the past), t - 2 is two units of time before t, and so on. There are effectively two contributions to X(t). First, there is the contribution from previous values of the time series, a1 * X(t - 1) + a2 * X(t - 2) + a3 * X(t - 3) + ... . This is known as the auto regressive (AR) part of the time series, because the values of the time series are predicted by ("regress on") the time series itself. Second, there is the contribution from the noise, Z(t) + b1 * Z(t - 1) + b2 * Z(t - 2) + ... . This is known as the moving average (MA) part of the time series, because the noise contribution at any time is a weighted sum of the current and recent noises.
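To make the recursion concrete, here is a minimal Python sketch. It is not the author's code; the fitting in this post was done in R. It builds X(t) from a given noise sequence Z(t) and coefficient lists a and b, and it treats all values before the start of the series as zero. The function name `arma_from_noise` is hypothetical.

```python
def arma_from_noise(a, b, z):
    """Build X(t) = a1*X(t-1) + a2*X(t-2) + ... + Z(t) + b1*Z(t-1) + b2*Z(t-2) + ...

    a: AR coefficients [a1, a2, ...]; b: MA coefficients [b1, b2, ...];
    z: the noise series Z(0), Z(1), ...  Values before time 0 are taken as 0.
    """
    x = []
    for t in range(len(z)):
        # AR part: weighted sum of earlier values of the series itself
        ar = sum(a[i] * x[t - 1 - i] for i in range(len(a)) if t - 1 - i >= 0)
        # MA part: current noise plus weighted sum of earlier noises
        ma = z[t] + sum(b[j] * z[t - 1 - j] for j in range(len(b)) if t - 1 - j >= 0)
        x.append(ar + ma)
    return x

# A single unit shock: under pure AR(1) it decays geometrically,
# under pure MA(1) it disappears after one step.
print(arma_from_noise([0.5], [], [1.0, 0.0, 0.0]))  # -> [1.0, 0.5, 0.25]
print(arma_from_noise([], [0.5], [1.0, 0.0, 0.0]))  # -> [1.0, 0.5, 0.0]
```

The two calls show the difference between the parts. An AR term makes a shock linger through the series' own past values, while an MA term lets it act only for as many steps as there are b coefficients.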

The goal in ARMA time series modeling is to find a small set of constants a1, b1, a2, b2, and so on, and a description of the noise Z(t), which together characterize the behavior of the time series. Increasing the number of constants will generally improve the fit of the model to the existing data, but possibly at the expense of reducing the accuracy of predictions of future data. A good model is one with a minimal number of constants and for which the noises Z(t) can be described as independent and identically distributed. (In practice, the requirement is usually that the noises estimated by the model are approximately uncorrelated with constant variance, which is easier to achieve.)

NSIDC Extent Model

The monthly extent time series from the NSIDC has two traits which mean that it cannot be directly modeled as an ARMA time series. First, it has a trend. Second, it has a seasonal (or periodic) component. Fortunately, both of these can be handled by defining a difference series. Rather than modeling the extent time series itself ( X(t) ), we can model the difference series Y(t), defined by

Y(t) = X(t) - X(t - 12)

Because we are starting from a monthly time series, t - 12 is one year in the past from time t, and so Y(t) is the time series of year over year changes in Arctic sea ice extent. Y(t) can be modeled as an ARMA time series, and we should expect the model to include both month to month changes and year to year changes.
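The year over year differencing, and the reverse step that recovers the extent from the differences, can be sketched in a few lines of Python. This is an illustration only. The function names are hypothetical, and the toy series below uses made-up numbers, not NSIDC data.

```python
def seasonal_difference(x, lag=12):
    """Y(t) = X(t) - X(t - lag); the result starts at the (lag+1)-th observation."""
    return [x[t] - x[t - lag] for t in range(lag, len(x))]

def undifference(first_year, y, lag=12):
    """Rebuild X from its first `lag` values and the difference series Y,
    using X(t) = X(t - lag) + Y(t)."""
    x = list(first_year)
    for yt in y:
        x.append(x[-lag] + yt)  # x[-lag] is X(t - lag) for the value being appended
    return x

# Made-up toy series: the same seasonal cycle each year, shifted down by 0.05 per year.
cycle = [14.0, 15.0, 15.0, 14.0, 13.0, 11.0, 8.0, 6.0, 5.0, 7.0, 10.0, 12.0]
x = [v - 0.05 * (t // 12) for t, v in enumerate(cycle * 3)]
y = seasonal_difference(x)
print(y)  # every entry is approximately -0.05: the trend and seasonal cycle are removed
```

Both the seasonal cycle and the linear trend vanish from Y, which is what makes Y a candidate for an ARMA model.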

The monthly time series data through April 2014 was downloaded from the NSIDC website and imported into the R programming environment. The original time series was used to compute the difference series, and then the difference series was fit to an ARMA model using the built-in time series libraries in R. The resulting model for Y, the difference series, is

Y(t) = -0.0540 + 0.6589 * (Y(t - 1) + 0.0540) - 0.7851 * Z(t - 12) + Z(t)

(all units for extent in millions of sq km). This has a fairly direct interpretation. Y(t) is the year over year change in extent in month t. The average change is -0.0540 (on average, the extent is 54,000 sq km lower in any month than it was one year before). If last month's year over year change differed from this average of -54,000 sq km, 66% of that difference carries over into this month's year over year change. (If last month had a lower extent than usual, this month will also tend to have a lower extent than usual.) Z(t - 12) is the noise from one year ago. Because this month's extent is computed as a difference from the extent one year ago, a negative noise one year ago lowers the baseline, and this month's extent is predicted to be lower as a result. The term - 0.7851 * Z(t - 12) largely counteracts that, making the year over year change less negative in response to a negative noise one year ago, and vice versa. Finally, the noise for this month is added.

This time series model is convenient from a standpoint of analysis and prediction. To predict average future values of the time series, assume that the noise takes its average value in the future. The model is specifically constructed so the average value of the noise is zero. Given the difference series Y up to time t and the estimated innovation series Z up to time t (these estimated values are called residuals), future values of Y can be predicted by assuming all future values of Z are equal to 0 and plugging the known and predicted values of Y and Z into the model equation above. To predict the extent itself, plug the known values for the extent and the predicted values for the differences into the equation X(t) = X(t - 12) + Y(t) to predict the extent at time t.
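The forecasting recipe just described can be sketched as follows. The coefficients are the April 2014 extent model values quoted above. The function name and the assumed data layout (extents and residuals as aligned Python lists, oldest first) are hypothetical. Running it on real data would need the NSIDC monthly series and the fitted residuals, which are not reproduced here.

```python
def forecast_nsidc(x, z, steps, mu=-0.0540, phi=0.6589, theta=-0.7851):
    """Mean forecast from the extent model
        Y(t) = mu + phi * (Y(t-1) - mu) + theta * Z(t-12) + Z(t),   X(t) = X(t-12) + Y(t)

    x: observed monthly extents X (at least 13 values, oldest first)
    z: residuals aligned with x (z[i] is the residual for month x[i]); only the
       last 12 entries are actually used
    Future innovations are set to their mean value, 0.
    """
    x = list(x)
    z = list(z) + [0.0] * steps
    n = len(x)
    for t in range(n, n + steps):
        y_prev = x[t - 1] - x[t - 13]                      # Y(t-1), observed or predicted
        y = mu + phi * (y_prev - mu) + theta * z[t - 12]   # predicted Y(t) with Z(t) = 0
        x.append(x[t - 12] + y)                            # X(t) = X(t-12) + Y(t)
    return x[n:]
```

Far enough ahead, every predicted year over year change settles at the mean change mu. This happens because the AR term decays by a factor of phi each month, and the Z(t - 12) terms become zero once the forecast is more than a year past the data.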

This model can easily incorporate new data into the predictions. At the current time t, the model predicts future values of Y(t + 1), Y(t + 2), etc., and X(t + 1), X(t + 2), etc. When the data for X(t + 1) becomes available, Y(t + 1) and Z(t + 1) can be directly computed, and these corrected values can be used to update the predictions for Y(t + 2), Y(t + 3), X(t + 2), X(t + 3), and so on. Note that new data does not change the model: the model equation for Y(t) above stays the same. What changes is that a value of Z that was previously assumed to be 0 is replaced by its observed value. This changes the predictions produced from the model and data, without changing the model itself.
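As a sketch of this update step, the residual for a newly reported month is the observed year over year change minus the change the model predicted with Z set to 0. The coefficients are again the April 2014 extent values, and the function name is hypothetical.

```python
def innovation(x, z, x_new, mu=-0.0540, phi=0.6589, theta=-0.7851):
    """Residual Z(t) for a newly reported month X(t), t = len(x), under the
    extent model; the model coefficients themselves are left unchanged."""
    t = len(x)
    y_new = x_new - x[t - 12]                               # observed Y(t)
    y_prev = x[t - 1] - x[t - 13]                           # Y(t-1)
    y_pred = mu + phi * (y_prev - mu) + theta * z[t - 12]   # prediction with Z(t) = 0
    return y_new - y_pred
```

After computing the residual, append `x_new` to the extent history and the residual to the residual history, then rerun the mean forecast. The coefficients never change; only the data fed into them does.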

With somewhat more effort, the model can be used to generate probability distributions for the predictions. Analysis of the residuals for Z(t) indicates that they are heavy tailed, which means that assuming a normal distribution for the noise is likely to result in prediction intervals that are narrower than they should be. Instead, a bootstrap procedure can be used: the observed residuals serve as an empirical distribution of the noise, and future noise values are drawn from them at random to generate sample future paths. Given a large number of sample paths, their distribution approximates the distribution of the predictions.
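A minimal NumPy sketch of such a residual bootstrap for the extent model follows, under stated assumptions. Future noise values are drawn independently and with replacement from the observed residuals, as described above. The number of paths, the random seed, and the use of pointwise percentiles to form the 95% interval are illustrative choices, not details given in the post. The function names are hypothetical.

```python
import numpy as np

def bootstrap_paths(x, z, residuals, steps, n_paths=10_000, seed=0,
                    mu=-0.0540, phi=0.6589, theta=-0.7851):
    """Simulate future extent paths by resampling observed residuals (with
    replacement) as the future noise Z(t) in the extent model."""
    rng = np.random.default_rng(seed)
    residuals = np.asarray(residuals, dtype=float)
    n = len(x)
    paths = np.empty((n_paths, steps))
    for p in range(n_paths):
        xs = list(x)
        zs = list(z) + list(rng.choice(residuals, size=steps, replace=True))
        for t in range(n, n + steps):
            y_prev = xs[t - 1] - xs[t - 13]
            y = mu + phi * (y_prev - mu) + theta * zs[t - 12] + zs[t]
            xs.append(xs[t - 12] + y)
        paths[p] = xs[n:]
    return paths

def percentile_interval(paths, level=0.95):
    """Pointwise prediction interval (lower, upper) from the simulated paths."""
    tail = 100 * (1 - level) / 2
    return np.percentile(paths, tail, axis=0), np.percentile(paths, 100 - tail, axis=0)
```

Because the resampled values come from the actual residuals, heavy tails in the residuals carry over directly into the width of the interval. No normality assumption is needed.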

Chart 1 shows the predicted NSIDC extent for May 2014 - April 2015 in solid red, with the limits of the 95% prediction interval for the extent in red dashed lines. Both the predictions and the prediction intervals are based on the NSIDC data from April 2014. The black line shows the true extent for April 2014 - April 2015.

[Chart 1]

Chart 1 shows the predicted values for May 2014 - April 2015 based on data up to April 2014. As stated above, the predictions can be updated with new data as it becomes available, which will tend to improve the accuracy of predictions for a fixed point in time as the available data gets closer to that point. Chart 2 demonstrates this by plotting the prediction errors. Each colored line represents the prediction error for predicting the extent for May 2014 - April 2015 based on the model and subsequent data. The color of the line indicates the month of the last available data for that prediction, in rainbow order, so red indicates the prediction from April 2014 and purple indicates the prediction from March 2015. The graph shows prediction errors, so a value of zero indicates that the prediction exactly matched the observed data.

[Chart 2]

PIOMAS Volume Model

The same basic discussion as for the extent model applies to the volume model. The model for the difference series is somewhat more complicated, so the equation for the model based on the data for April 2014 becomes

Y(t) = -0.3216 + 1.196 * (Y(t - 1) + 0.3216) - 0.2641 * (Y(t - 2) + 0.3216) + 0.3396 * Z(t - 1) - 0.6805 * Z(t - 12) - 0.3396 * 0.6805 * Z(t - 13) + Z(t)

(volume is in units of thousands of cubic km). Although the equation is more complicated, it has the same basic meaning. The average year over year change in volume is about -320 cubic km. If recent months have a larger year over year loss than average, the next month is also predicted to have a larger year over year loss. The relationship is more complicated in this case, so both the change from two months previous, Y(t - 2), and the noise from last month, Z(t - 1), directly enter the formula. When considering the influence from the previous year, both the noise from one year ago, Z(t - 12), and the noise from one year plus one month ago, Z(t - 13), now appear.
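One reading of the Z(t - 13) coefficient, which is my interpretation rather than something stated in the post, is that the noise terms are the expansion of a product of a month-to-month factor (1 + 0.3396 B) and a year-to-year factor (1 - 0.6805 B^12). Here B is the backshift operator, B Z(t) = Z(t - 1). The NumPy sketch below multiplies the two polynomials to show that this product reproduces the three noise coefficients in the equation. It also checks the roots of the AR side, 1 - 1.196 B + 0.2641 B^2.

```python
import numpy as np

P = np.polynomial.polynomial  # coefficient arrays, lowest power of B first

ma_nonseasonal = np.array([1.0, 0.3396])          # 1 + 0.3396 B
ma_seasonal = np.zeros(13)
ma_seasonal[0], ma_seasonal[12] = 1.0, -0.6805    # 1 - 0.6805 B^12
ma = P.polymul(ma_nonseasonal, ma_seasonal)
print(ma[1], ma[12], ma[13])  # coefficients of Z(t-1), Z(t-12) and Z(t-13)

# AR side: 1 - 1.196 B + 0.2641 B^2. Both roots lie outside the unit circle
# (stationary), but one is close to 1, so deviations from the mean decay slowly.
roots = P.polyroots([1.0, -1.196, 0.2641])
print(roots)
```

The Z(t - 13) coefficient comes out as -0.3396 * 0.6805, matching the written equation. The root near 1 is consistent with the strong month-to-month persistence in the volume model.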

As in the case for extent, generating predictions of future volumes is straightforward, and updating these predictions with new data can be automated. Again, estimating the prediction distribution requires a bootstrap estimate, because the noise is heavy tailed.

Chart 3 shows the predicted volume for May 2014 - April 2015, based on the model and data from April 2014 as a solid red line. The 95% prediction interval is shown as dashed red lines. The observed volume is shown as the solid black line.

[Chart 3]

Chart 4 shows the effect of incorporating new data into the predictions. In particular, the prediction for July 2014 was too low by 1.000 thousand cu km, so all predictions made before July had errors of 1.5 thousand cu km or more for all subsequent months. On the other hand, predictions made for August and September after the correction for July had errors of less than 0.005 thousand cu km, small enough that the errors are not visible on this chart.

[Chart 4]

OSweetMrMath · « **Reply #1 on:** August 15, 2015, 09:55:08 AM »

Model Validation

A model is useful only insofar as it is valid. Having proposed models for Arctic sea ice extent and volume, we must address the question of whether the models accurately describe the existing data and predict future data. ARMA time series models place strong demands on the noise model. In order for the model to be valid, the noise, as estimated by the model residuals, must be uncorrelated and have constant variance. A stronger version of this is to require that the noise be independent and identically distributed.

Testing for correlation is typically an automatic part of the model building process: several candidate models are tested until one is found which does not show statistically significant correlations in the residuals. In this case, however, the model for the NSIDC extent does show a statistically significant correlation in the residuals at an interval of 11 months.
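A basic version of this residual check is the sample autocorrelation at a given lag, compared with the rough band of plus or minus 1.96/sqrt(n) expected for uncorrelated noise. The sketch below is illustrative; the function names are hypothetical, and the post does not say which test was used. In R, `acf()` and `Box.test()` from the stats package provide the standard versions. When many lags are inspected, about 1 in 20 will cross this band by chance, which is why portmanteau tests such as Ljung-Box are usually preferred for an overall check.

```python
import numpy as np

def sample_acf(x, lag):
    """Sample autocorrelation at a given lag (lag >= 1)."""
    d = np.asarray(x, dtype=float)
    d = d - d.mean()
    return float(np.dot(d[lag:], d[:-lag]) / np.dot(d, d))

def check_lag(residuals, lag):
    """Return the autocorrelation and whether it falls outside the rough
    +/- 1.96/sqrt(n) band expected for uncorrelated noise."""
    r = sample_acf(residuals, lag)
    return r, abs(r) > 1.96 / np.sqrt(len(residuals))

# Hypothetical usage, where extent_residuals would be the model residuals:
# r11, flagged = check_lag(extent_residuals, 11)
```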

There is a possible physical interpretation for this. The monthly data is averaged over each month. This means that changes in the ice can often be seen in the daily data, or in other related data sets, before they appear in the monthly data. Because the model works with year over year changes, such a change is then also reflected in the following year. A possible example is July 2014: by the end of the month it was clear that daily extent loss was slow, and so the measured extent was high. This did not appear in the July monthly extent data, but did appear in August, when the August monthly extent was higher than predicted. At the same time, this change appeared in the July 2014 volume data, which was also higher than predicted.

Looking forward one year, the July 2015 monthly extent was higher than predicted. It is possible to argue that the high July 2015 extent reflected the low July 2014 extent loss, which was also shown in the high August 2014 extent. Therefore the July 2015 year-over-year change is correlated with the August 2014 year-over-year change, at an interval of 11 months, and the model should be modified to incorporate this.

Until now, we have considered this possible correlation to be too small, and the overhead in model complexity too large, to justify incorporating this correlation into the model. We will continue to observe this correlation and will modify the model if it becomes necessary to accurately track this correlation.

In contrast, the PIOMAS volume model does not show meaningful, statistically significant correlations in the residuals.

The second property that the model requires is that the residuals have constant variance. We prefer the stronger assumption that the residuals are identically distributed, which is a necessary assumption to justify the bootstrap prediction distributions. There are more direct ways of testing for identical distribution, but because we are using these models specifically to predict future ice levels, we propose directly testing the validity of these predictions.

Under the model assumptions, the predictions generated by the model are the true mean of possible future paths taken by the data. The noise is assumed to be independent and identically distributed, with the distribution given by the empirical distribution of the residuals. Under these assumptions, we can create a bootstrap sample of possible futures for the model. For each possible future (or sample path), we can measure the distance from the sample path to the mean path, and then compute a distribution of the distances of the sample paths from the mean path.

We can then measure the distance of the observed actual future path to the predicted mean path. If the model assumptions reasonably describe the true ice behavior, then the observed path should not have a statistically unusual distance from the mean path.

Distance is measured by the sum, over all time points, of the squared differences between a path's value at each time point and the mean prediction at that time point.
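The distance and the comparison against the bootstrap paths can be written directly. The sketch below computes the share of sample paths that lie farther from the mean prediction than the observed path, which is the statistic behind the 35.09% and 1.29% figures in this post. It assumes the sample paths are stacked as rows of an array. The function names are hypothetical.

```python
import numpy as np

def path_distance(path, mean_path):
    """Sum of squared differences between a path and the mean prediction."""
    return float(np.sum((np.asarray(path) - np.asarray(mean_path)) ** 2))

def fraction_farther(sample_paths, mean_path, observed_path):
    """Share of bootstrap sample paths (rows) farther from the mean prediction
    than the observed path."""
    d = np.sum((np.asarray(sample_paths) - np.asarray(mean_path)) ** 2, axis=1)
    return float(np.mean(d > path_distance(observed_path, mean_path)))
```

A small fraction means the observed path is unusually far from the mean path compared with what the model's own noise would produce.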

For the NSIDC extent, a mean prediction and a collection of sample paths were generated for the time period May 2014 - April 2015, and the bootstrap distribution of the distance of the sample paths to the mean prediction was computed. The distance of the observed data for this time period to the mean prediction was computed and compared to the bootstrap distribution. 35.09% of sample paths had a greater distance than the observed distance, so the model appears to reasonably describe the extent over the past year. (This is effectively a two-sided p-value of .70.)

For the PIOMAS volume, a similar computation results in the conclusion that 1.29% of sample paths have a greater distance over the prediction period from the mean than the observed data. (This is a two-sided p-value of .026.) However, this should not necessarily come as a surprise. The prediction error in July 2014 was 1.00055 thousand cubic km, which is the largest positive error in the entire data set. The question then becomes whether this prediction error is a failure of the model or a signal of truly atypical behavior of the actual data.

It is tempting to write this off as bad luck: the observed path is extreme among the possible sample paths, but this is a consequence of the fact that the July value was also extreme. However, it should also be treated as possible evidence of a problem with the model. In particular, the assumption that the noise has constant variance (or standard deviation) should be reconsidered.

Examination of the residuals on a decade by decade basis indicates that the standard deviation of the residuals (which should equal the standard deviation of the noise) has been growing over time. For 1980 through 1989, the standard deviation was 0.210 thousand cu km. For 1990 through 1999, this increased to 0.256. For 2000 through 2009, it was 0.258. Without a formal test, it appears that the standard deviation may have increased from the first decade of data to the later data, but remained roughly constant over the 20 year period from 1990 to 2009.
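The decade-by-decade comparison amounts to grouping the residuals by year and taking the standard deviation within each group. The sketch assumes the residuals come with the year each one belongs to. It uses the sample standard deviation (ddof=1), although the post does not say which estimator was used. Like the comparison in the post, this is descriptive only, not a formal test. The function name and inputs are hypothetical.

```python
import numpy as np

def sd_by_period(years, residuals, edges):
    """Sample standard deviation of residuals within [edges[i], edges[i+1])."""
    years = np.asarray(years)
    residuals = np.asarray(residuals, dtype=float)
    out = {}
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_period = (years >= lo) & (years < hi)
        out[(lo, hi - 1)] = float(np.std(residuals[in_period], ddof=1))
    return out

# Hypothetical usage with the volume model residuals and their years:
# sd_by_period(years, residuals, [1980, 1990, 2000, 2010, 2015])
```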

However, the standard deviation for 2010 through April 2014 (the most recent data used to build the model) is 0.391. This is clearly not in agreement with the earlier time periods. The conclusion is that the ARMA family of models may not provide a good fit in modeling the PIOMAS data. Another possibility, which may behave better, would be to consider models in the ARCH family, which can directly model changes in the variance of the noise.

One final consideration in evaluating the validity of the models is to observe changes in the model based on refitting the model to new data. Rather than considering the new data from May 2014 to April 2015 as new data applied to the existing model, we can rebuild the model from the ground up using all of the data from 1979 through April 2015. If the model represents the data well, we should expect only small changes in the model parameters and in the forward predictions of the model.

The NSIDC model based on the data through April 2014, as stated above, is

Y(t) = -0.0540 + 0.6589 * ( Y(t - 1) + 0.0540 ) - 0.7851 * Z(t - 12) + Z(t)

The revised model based on the data through April 2015 is

Y(t) = -0.0526 + 0.6625 * ( Y(t - 1) + 0.0526 ) - 0.7912 * Z(t - 12) + Z(t)

The parameter values changed by less than 3%. The data for May 2014 to April 2015 was used to generate predictions for May 2015 to April 2016 from the April 2014 model. This was compared to the predictions for May 2015 to April 2016 from the April 2015 model, and the predictions agree to within 0.05 million sq km.

The results for the NSIDC models are all the more notable given that there was a substantial revision of the NSIDC data in March 2015. The changes were typically on the order of 0.01 million sq km and always less than 0.10 million sq km, but the fact that these changes did not disrupt the model supports its robustness.

The PIOMAS model from April 2014 is

Y(t) = -0.3216 + 1.196 * (Y(t - 1) + 0.3216) - 0.2641 * (Y(t - 2) + 0.3216) + 0.3396 * Z(t - 1) - 0.6805 * Z(t - 12) - 0.3396 * 0.6805 * Z(t - 13) + Z(t)

The PIOMAS model from April 2015 is

Y(t) = -0.2889 + 1.206 * (Y(t - 1) + 0.2889) - 0.2684 * (Y(t - 2) + 0.2889) + 0.3312 * Z(t - 1) - 0.6867 * Z(t - 12) - 0.3312 * 0.6867 * Z(t - 13) + Z(t)

The changes are small, with the exception of the mean value, which has changed from a loss of 320 cu km year over year to a loss of 290 cu km year over year. The predicted values for both models for the time period of May 2015 to April 2016 were compared and the largest difference in predicted values is 120 cu km. This occurs at the April maximum. On a percentage basis the largest differences are around 1%, near the September minimum.
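The size of these parameter changes can be checked directly from the coefficients written in the model equations above. The dictionary labels below are hypothetical names chosen for this sketch. Signs follow the equations, for example ar2 is -0.2641 because of the "- 0.2641 * (Y(t - 2) + ...)" term.

```python
def relative_changes(old, new):
    """|new - old| / |old| for each named coefficient."""
    return {k: abs(new[k] - old[k]) / abs(old[k]) for k in old}

# Coefficients as written in the model equations (hypothetical labels).
nsidc_2014 = {"mean": -0.0540, "ar1": 0.6589, "sma1": -0.7851}
nsidc_2015 = {"mean": -0.0526, "ar1": 0.6625, "sma1": -0.7912}
piomas_2014 = {"mean": -0.3216, "ar1": 1.196, "ar2": -0.2641, "ma1": 0.3396, "sma1": -0.6805}
piomas_2015 = {"mean": -0.2889, "ar1": 1.206, "ar2": -0.2684, "ma1": 0.3312, "sma1": -0.6867}

for name, old, new in [("NSIDC", nsidc_2014, nsidc_2015), ("PIOMAS", piomas_2014, piomas_2015)]:
    changes = relative_changes(old, new)
    print(name, {k: f"{v:.1%}" for k, v in changes.items()})
# NSIDC: every coefficient moves by less than 3%.
# PIOMAS: the mean moves by about 10%; the other coefficients by less than 3%.
```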

Predictions

The models for NSIDC extent and PIOMAS volume were updated to reflect the data as of April 2015, and then were used to generate predictions for extent and volume for the following 12 months. The predictions are listed below.

Month	May	June	July	Aug	Sept	Oct	Nov	Dec	Jan	Feb	March	April
Extent (million sq km)	12.63	10.95	8.13	5.68	4.77	7.58	9.94	12.17	13.58	14.38	14.63	14.05
Volume (thousand cu km)	22.99	18.38	11.76	7.48	6.22	7.34	10.48	14.02	17.48	20.32	22.34	23.37

The reported data for NSIDC extent for May, June, and July is 12.65, 10.97, and 8.77, respectively. The predictions for May and June were so close to the observed data that correcting for the observations has essentially no effect on the remaining predictions. July, on the other hand, had a substantial missed prediction. This was used to correct the predictions for August and September, resulting in updated values of 6.10 and 5.05. The 95% prediction interval for the September extent is 4.39 to 5.67 million sq km.

Note that extent loss this year has been somewhat atypical. Extent loss in June was low, followed by high extent loss in July. Because of the way that monthly extent is computed by the NSIDC, this may have resulted in an artificially high July extent, contributing to the model's prediction error. In that case, we should expect the August and September values to be below the current predictions. Even so, the observed values are expected to fall within the 95% prediction intervals.

The reported data for PIOMAS volume in May, June, and July is 23.02, 18.54, and 11.63. The fact that July was lower than predicted has the effect of pushing down the updated predictions, so the updated values for August and September are 7.18 and 5.89. The 95% prediction interval for the September volume is 4.83 to 6.83 thousand cubic kilometers.

These models are intended to be short range models. The differencing gives the models the character of a random walk, and the width of the prediction intervals grows rapidly over time periods longer than one year. The extent model has been shown to be reasonably accurate over short time periods, but no claims have been made about longer periods. The volume model has not been demonstrated to control the prediction error, even over periods of less than a year.
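The random-walk character of the extent model can be seen in its psi-weights, the coefficients in X(t) = sum of psi_j * Z(t - j). The forecast error variance at horizon h is sigma^2 times the sum of the first h squared psi-weights. The sketch below computes these weights for the April 2014 extent coefficients. It makes two assumptions. First, the noise standard deviation sigma is set to 1, because its value is not reported in the post. Second, this is the standard linear-model variance calculation, not the bootstrap procedure used for the intervals in this post. The function name is hypothetical.

```python
import numpy as np

def psi_weights(phi, theta, n, period=12):
    """First n psi-weights of X(t) = sum_j psi_j Z(t-j) for the extent model
    (1 - phi B)(1 - B^period) X(t) = (1 + theta B^period) Z(t), mean term dropped."""
    # Expand the AR side: 1 - phi B - B^period + phi B^(period+1) = 1 - sum_i c_i B^i
    c = np.zeros(period + 2)
    c[1], c[period], c[period + 1] = phi, 1.0, -phi
    psi = np.zeros(n)
    for j in range(n):
        value = 1.0 if j == 0 else 0.0          # MA coefficient of B^0
        if j == period:
            value += theta                       # MA coefficient of B^period
        for i in range(1, min(j, period + 1) + 1):
            value += c[i] * psi[j - i]
        psi[j] = value
    return psi

psi = psi_weights(0.6589, -0.7851, 10 * 12)
# Forecast-error variance at horizon h is sigma^2 * sum_{j<h} psi_j^2 (sigma = 1 here).
error_var = np.cumsum(psi ** 2)
for years in (1, 2, 5, 10):
    print(years, round(float(error_var[12 * years - 1]), 3))
```

The psi-weights settle into a repeating yearly pattern instead of decaying to zero. As a result, each additional year adds a roughly constant amount to the error variance, so the variance never levels off. For the extent model, the yearly increment is modest, which fits the observation that its errors accumulate more slowly than those of the volume model.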

Throwing caution to the wind, we will use the models to predict the September extent and volume through 2019. The predictions include the mean prediction, the 95% prediction interval, the probability of setting a new record minimum, and the probability of falling below 1 million sq km (extent) or 1 thousand cu km (volume).

Extent

Year	Mean (million sq km)	95% PI	Prob. of new record	Prob. below 1 million sq km
2015	5.05	4.38 - 5.68	Less than 0.01%	Less than 0.01%
2016	4.79	4.05 - 5.50	0.2%	Less than 0.01%
2017	4.74	4.00 - 5.46	0.3%	Less than 0.01%
2018	4.69	3.92 - 5.43	0.5%	Less than 0.01%
2019	4.64	3.84 - 5.40	0.7%	Less than 0.01%

Volume

Year	Mean (thousand cu km)	95% PI	Prob. of new record	Prob. below 1 thousand cu km
2015	5.87	4.61 - 7.01	0.2%	Less than 0.01%
2016	5.24	2.80 - 7.49	10.4%	0.04%
2017	4.86	2.09 - 7.47	21.5%	0.3%
2018	4.52	1.58 - 7.41	30.5%	1.0%
2019	4.20	1.05 - 7.30	39%	2.3%
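Given bootstrap sample paths, probabilities like those in the last two columns can be estimated as the fraction of simulated September values below a level. The post does not list the record values it used, so `record` is left as a parameter; the function name is hypothetical. With N sample paths, the smallest nonzero estimate is 1/N. An entry such as "Less than 0.01%" therefore presumably means that no simulated path crossed the threshold among at least 10,000 paths.

```python
import numpy as np

def exceedance_probs(september_samples, record, threshold):
    """Fractions of simulated September values below the current record minimum
    and below a fixed threshold (e.g. 1 million sq km or 1 thousand cu km)."""
    s = np.asarray(september_samples, dtype=float)
    return {"new_record": float(np.mean(s < record)),
            "below_threshold": float(np.mean(s < threshold))}
```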

We should emphasize that these predictions should be viewed with a healthy amount of skepticism. Predictions at the longest time frames are far beyond the horizons over which the models have been tested, and the volume model has not been demonstrated to be reliable even over shorter time intervals. From a numerical standpoint, the mean values are reasonable, but the prediction intervals for the extent (and associated probabilities of setting a new minimum and falling below 1 million sq km) are suspiciously small.

On the other hand, quick observation of the historical extent data for September shows that a linear trend with an annual slope of -53,000 sq km and a prediction interval of plus or minus 1 million sq km captures nearly all of the observed data. (Notable exceptions include 2007 and 2012, while 2013 and 2014 are just below the trend.) On that basis, the above predictions may not be unreasonable.

Future Work

As discussed above, the current model for the NSIDC extent shows a correlation in the residuals at an interval of 11 months. The model can potentially be modified to account for this.

The model for the PIOMAS volume fails to have residuals with constant variance. Exploring other time series models (for example, in the ARCH family) which can represent changing variance will likely improve the prediction accuracy of the model, even if just by generating more realistic prediction intervals.

The models are deliberately designed to be simple, with minimal data input. Incorporating more data into the models could improve performance. Possibilities include daily data, regional breakdowns, and data from other sources. As a possible starting point, perhaps the data could be considered as a joint time series of both extent and volume.

OSweetMrMath · « **Reply #2 on:** August 15, 2015, 10:03:10 AM »

For the last year or more I have been posting regular comments in various threads about predictions of sea ice extent and volume based on some time series models. I have given brief descriptions of the models in various places, but I felt it was time for a more complete writeup.

I wrote this in a deliberate fake academic style, which I hope will not make it more unreadable than the length of the post already does. The post turned out to be long enough that I had to break it into two pieces for the system to allow the post. Apparently I have a lot to say.

In any event, I hope this is of interest to the other posters here, and I am happy to answer questions and to accept feedback and criticisms. My goal was to create a fairly simple baseline for predictions of extent and volume. After my own predictions in 2013 were wildly wrong, I created these models with the goal of reducing the chance of my being that wrong again.

Neven · « **Reply #3 on:** August 15, 2015, 10:43:56 AM »

Thanks, OSMM.

ChrisReynolds · « **Reply #4 on:** August 15, 2015, 10:58:22 AM »

Thanks from me too. It's such a comprehensive write-up that I haven't got any questions. However, it is worth noting that using this within the established forecast periods (e.g. Blanchard-Wrigglesworth) seems reasonable despite it not being a physical model per se.

It's a useful addition.

Richard Rathbone · « **Reply #5 on:** August 15, 2015, 01:30:04 PM »

NSIDC report trends on the individual months separately. Are these trends in fact sufficiently close to make it valid to use the same trend for April and September? PIOMAS anomaly shows a distinct seasonal variation in recent years, which suggests to me that just removing an annual trend isn't good enough and that a substantial fraction of the "noise" in your time series model is actually due to including systematic differences between spring and autumn trends as noise.

My initial thought was to ask the model when virtually ice free conditions become plausible, but I suspect it will horribly overpredict the time it would take as a result of not capturing a significant difference between trends in different seasons.

Richard Rathbone · « **Reply #6 on:** August 15, 2015, 03:02:03 PM »

I think the heavy dependence on Z(t-12) is an indicator that there is something significant missing from your model. There is something seasonal which is being treated as noise and shouldn't be, and consequently, once you cease to have data for Z(t-12) (i.e. if you project more than a year in advance), inferences drawn from the model will be worthless. If I ask it now what the probability of a virtually ice-free July 2016 is, it can give a reasonable answer because Z(July 2015) is known, but if I ask it now what the probability of a virtually ice-free September 2016 is, the lack of data for Z(September 2015) will make it thoroughly untrustworthy.

One factor that comes to mind that you will have trouble capturing in a simple time series model is melting momentum. You don't have a way of distinguishing momentum in melting from momentum in freezing. Extra sun in June means not only extra melting in June from the direct effect of sun on ice, but extra melting in July from sea water that was heated in June. Your PIOMAS model has a t-2 term, which allows this to be captured, but only by including an analogous but unphysical effect of freezing momentum during the winter months.

crandles · « **Reply #7 on:** August 15, 2015, 03:21:13 PM »

Quote from: OSweetMrMath on August 15, 2015, 09:54:03 AM

The models use the monthly time series data for the extent and volume as their only inputs, and the two models are independent. That is, the extent data is used only in the extent model, and the volume data is used only in the volume model.

This seems quite a restrictive design decision.

The impression I got from my attempts at prediction was that area was a better predictor of extent than extent itself, and maybe volume was also better for predicting extent, depending on how it was used, IIRC.

Does this design decision mean that you will end up with a model that is almost bound to say that extent will tend to behave as it has in the past? So if there is some critical volume (or other variable) below which extent starts behaving differently, then your model is almost bound to fail to predict this?

If the behaviour is going to stay similar, then a persistence of anomaly model will perform quite well. Isn't it more important to look for hints of other variables that may change the behaviour? Maybe that depends on what you see as the purpose of your model?

I don't want to be too critical - having a model is better than not having such a model and I could easily be not getting the purpose or wanting to be too ambitious. Just wanted to enquire about such design decisions and what you see as the purpose of that to better understand the aims and goals rather than being negative for the sake of it.

OSweetMrMath · « **Reply #8 on:** August 16, 2015, 07:28:50 AM »

Richard,

I think there are two basic ideas underlying your comments. They are worth thinking about, and I will consider them separately.

The first is that I am fitting a single time series to every month. The trends for each month separately are different. On top of that, the effect of one month on the following months may be different depending on the time of year. Given that, can a single time series model reasonably represent the entire year?

I wrote a really long answer. I'm throwing it out and trying a shorter one. In the case of the NSIDC extent, there's almost no evidence that these concerns are a problem. The model generally looks good from a diagnostic standpoint, the predictions are plausible, and based on 15 months of data, the predictions show reasonable agreement with reality.

Examining the residuals more closely, I have found that there is substantial variability in the variance of the residuals from month to month. The standard deviation of the residuals in February through April is about half the size of the standard deviation in September through December. This changing variability could be used to produce more accurate prediction intervals, which would depend on the month of the year. However, the mean is nearly zero year round. It is possible that the month to month changes in variance are due to the effects you mentioned, but they do not appear to substantially affect the model.
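A first step toward month-dependent intervals is to compute the residual standard deviation separately for each calendar month. The sketch assumes the residuals are in time order and that the calendar month of the first residual is known; the function name is hypothetical. One possible follow-up, which is my assumption and not what the models above do, would be to bootstrap future noise for each month only from residuals of the same calendar month.

```python
import numpy as np

def sd_by_calendar_month(residuals, first_month=1):
    """Sample standard deviation of the residuals for each calendar month.
    first_month is the calendar month (1-12) of residuals[0]."""
    r = np.asarray(residuals, dtype=float)
    months = (np.arange(len(r)) + first_month - 1) % 12 + 1
    return {m: float(np.std(r[months == m], ddof=1)) for m in range(1, 13)}
```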

Volume is a different story. The variance of the residuals is increasing in time, which is clearly a problem. I have been looking into models with changing variance, but I haven't found anything applicable to this case, and my attempts to modify the model or transform the data to improve this have not been successful so far. As with the extent, the variance shows a monthly variability, in addition to the trend increase.

Again, I don't know whether this is due to the effects you mentioned.

One possible approach would be to model each month's data as a separate time series, and then build in interaction effects between the separate series. Modeling a single month alone clearly does worse, because the model must predict a full year in advance, as opposed to the current models, which can predict one month in advance.

There is a fair amount of literature about multivariate series, but less in the way of off-the-shelf tools for working with these series. But this may be an interesting direction to pursue in further work on these models.

OSweetMrMath · « **Reply #9 on:** August 16, 2015, 07:52:48 AM »

Richard,

The second question involves the role of Z(t-12) in predictions and the validity of long term predictions.

The problem actually occurs before that, as soon as the models are based on the year over year differences. I mentioned in the original post that this differencing gives longer term predictions the character of a random walk. Effectively, prediction errors sum year over year and have the potential to rapidly accumulate. This is clearly visible in my volume predictions through 2019, where the prediction intervals are growing at nearly the same rate that the mean is changing. The same thing occurs with the extent series, but the errors accumulate more slowly.
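A rough way to see the random-walk behaviour: a level k years ahead is forecast by adding k predicted year-over-year differences to today's value. If (a simplifying assumption that ignores the AR and MA terms) the errors in those differences are independent with standard deviation sigma, their variances add up to k sigma^2. The interval half-width then grows like sqrt(k), while the mean moves linearly with the average difference c. The values of c and sigma below are hypothetical, not fitted.

```python
import math


def k_year_forecast(level_now, c, sigma, k, z=1.96):
    """Mean and approximate 95% half-width for the level k years ahead when
    each year-over-year difference is c plus independent N(0, sigma^2) noise."""
    mean = level_now + k * c
    half_width = z * sigma * math.sqrt(k)  # variance k * sigma^2 accumulates
    return mean, half_width


# Hypothetical September volume (thousand cu km), drift and noise.
for k in (1, 2, 4, 8):
    mean, hw = k_year_forecast(5.0, -0.3, 0.6, k)
    print(f"{k} yr ahead: {mean:.2f} +/- {hw:.2f}")
```

Under these assumptions, quadrupling the horizon quadruples the shift in the mean but only doubles the half-width. Over a few years, the two can therefore grow at comparable rates.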

The conclusion here is that accurate long term predictions are just not possible. There is at least a potential for debate about whether the models reflect reality, or whether this is really just an assumption of the model.

The models can be used for long term prediction, but the results aren't very meaningful. The volume model states that the first year with a greater than 50% probability of being ice free in September is 2035. In that year, the 95% prediction interval is approximately -6 thousand cu km to +6 thousand cu km. Aside from the fact that I have no idea what a volume of negative six thousand cu km means, the interval is so wide that the prediction is meaningless anyway.

The extent, on the other hand, shows a greater than 50% probability of being ice free for the first time in 2110. The 95% prediction interval is somewhat narrower, something like -3 million sq km to +3 million sq km.

Both predictions are clearly wrong. The claim from the model that there is a 2.3% chance of a volume below 1000 cu km in 2019 seems believable, although the implication of the increasing variance is that this probability is too low. If you want a better guess about whether the volume will reach 1000 cu km before 2029, ask me again in 2019.
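The probabilities above come from treating each forecast as Gaussian. Under that assumption, "more than 50% chance of ice free" just means the predicted mean has crossed zero. A 2.3% chance corresponds to a threshold sitting about two standard deviations below the predicted mean. A minimal sketch: the helper name `prob_below` is hypothetical, and so are the mean 3.0 and standard deviation 1.0 used for the second case, which are not the model's 2019 values.

```python
import math


def prob_below(threshold, mean, sd):
    """P(X < threshold) for X ~ N(mean, sd^2)."""
    return 0.5 * (1.0 + math.erf((threshold - mean) / (sd * math.sqrt(2.0))))


# A 95% interval of about -6 to +6 implies sd of roughly 6 / 1.96.
sd_2035 = 6.0 / 1.96
print(prob_below(0.0, 0.0, sd_2035))  # mean at zero: exactly one half
# Hypothetical forecast with mean 3.0 and sd 1.0 (thousand cu km):
# a 1.0 threshold sits two sd below the mean.
print(prob_below(1.0, 3.0, 1.0))
```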

OSweetMrMath · « **Reply #10 on:** August 16, 2015, 08:46:44 AM »

Crandles,

My initial motivation for these models was that in 2012 or so, I was more or less terrified by predictions of sea ice levels based on fitting a curve through one data set up to that point and extrapolating forward. (I don't know when I started watching this, but definitely by 2012 and possibly 1 or 2 years before that.) I spent all of the summer of 2013 expecting the extent to suddenly collapse, passing the 2012 extent, and catching up with the prediction.

Needless to say, that did not happen.

In the spring of 2014, I decided that I could take the monthly volume data, and rather than fitting an arbitrary curve for predictions, I could create a time series model and use that for predictions. This had the clear benefit of more connection to reality than curve fitting, and I could set up the model quickly and easily. (It took me substantially less time to create the model in the first place than to complete this description of the model this week.)

A few weeks later, I decided to create a similar model for extent. Through 2014, my predictions for the minimum were consistently substantially higher than the predictions for the minimum on this site, and my predictions for the extent were at nearly the median on the Sea Ice Prediction Network site. I was wrong, of course. The actual minimum was substantially higher than my predictions.

The good news was that in both cases, my month to month predictions were almost always "correct" to within a reasonable error, with the exception of a single missed prediction. Furthermore, after correcting for that missed prediction, my subsequent predictions became reliable again.

(My predicted value from last July of the volume for this July was 11.2 thousand cubic km. The actual value was 11.6. My predicted value from last July of the extent this June was 11.0 million sq km. The actual value was 11.0. My July prediction was less accurate. My volume prediction has gone a full year at this accuracy, but my extent prediction only lasted 11 months.

I should not claim too much. My conclusion is not that my models are this good. It's that they were this lucky.)

This year, I'm doing it again, but I wanted to explain my methodology. My starting goal was to build a simple model, which I could do quickly and easily, but which would give decent prediction performance. This model could almost certainly be made better if I used more outside data in the model.

For starters, building a multivariate model using both the extent and the volume might be useful. Right now, the models are not consistent, in the sense that one model could predict an ice free state while the other model predicts a positive amount of ice. I might be able to fix that. There is some evidence that changes in volume anticipate changes in extent, and I could definitely model that.

The nature of the model, as a purely statistical model without any physical component, basically guarantees that the model will predict that behavior will continue as it has in the past. I'm not sure that incorporating other data would change that, unless the data were built into a physical framework. Modeling extent and volume together as a multivariate time series would still predict that the behavior of volume and extent together would continue as it had in the past.

Right now, I'm not sure that it's possible to accurately predict behavior changes. If ChrisReynolds' Slow Transition is correct, that probably qualifies as a change in behavior. On the other side, I think some people are still arguing that the ice this year is so broken up that an unprecedented collapse in extent could still occur this year. Assuming that either prediction of a behavior change is correct, how would we know before the behavior change actually occurs?

Basically, I'm open to arguments that things will be different, but I tend to be skeptical of arguments that things are different starting right now (or that things are already different, but it's not showing up in any metrics yet). One way that I view my models is as a baseline. If things are indeed different, it will be clear from the models, because the models will fail.

crandles · « **Reply #11 on:** August 16, 2015, 01:44:30 PM »

Quote from: OSweetMrMath on August 16, 2015, 08:46:44 AM

For starters, building a multivariate model using both the extent and the volume might be useful. Right now, the models are not consistent, in the sense that one model could predict an ice free state while the other model predicts a positive amount of ice. I might be able to fix that. There is some evidence that changes in volume anticipate changes in extent, and I could definitely model that.

The nature of the model, as a purely statistical model without any physical component, basically guarantees that the model will predict that behavior will continue as it has in the past. I'm not sure that incorporating other data would change that, unless the data were built into a physical framework. Modeling extent and volume together as a multivariate time series would still predict that the behavior of volume and extent together would continue as it had in the past.

Right now, I'm not sure that it's possible to accurately predict behavior changes. If ChrisReynolds' Slow Transition is correct, that probably qualifies as a change in behavior. On the other side, I think some people are still arguing that the ice this year is so broken up that an unprecedented collapse in extent could still occur this year. Assuming that either prediction of a behavior change is correct, how would we know before the behavior change actually occurs?

Basically, I'm open to arguments that things will be different, but I tend to be skeptical of arguments that things are different starting right now (or that things are already different, but it's not showing up in any metrics yet). One way that I view my models is as a baseline. If things are indeed different, it will be clear from the models, because the models will fail.

A few things seem fairly fundamental to me:

1. The lower the initial volume, the earlier in the melt season albedo is lowered, and the greater the magnitude of the albedo reduction. This causes increased volume melt, and area and extent tend to move with that.

2. During winter, ice can grow to over 2m thick in the centre of the pack off Ellesmere Island, and it tends to be thinner as you move away from there. At least the vast majority of this thin ice can form in almost any winter, regardless of the quantity of ice at the beginning of the freeze season.

3. Thicker ice takes longer to build. When quantities of it melt, it isn't coming back for at least a few years, and possibly not at all. We have been through a rapid decline in this thicker ice during the 1990 to 2007 period.

Observing the decline in total ice, which is dominated by the decline in the thicker ice, and projecting a continuation of this trend when there is very little thicker ice left, and what is left is declining at a much slower rate, seems wrong.

(MYI extent actually appears to have increased since 2007, but I think it more sensible to treat 2007 as a shock low level of MYI extent, with a slow decline from a higher natural level. Also note that this says nothing about thickness, and the trend in ice in excess of 2m thick may well show a decline since 2007.)

Time series models are nice in that they learn from what happens without having to understand it. However, if the physical causes are understood and there are reasons why time series analysis could be misled, then a physically based model seems preferable, don't you think?

>"using both the extent and the volume might be useful"
Using both volume and area (or better measures) seems essential to me if you want to try to predict more than 24 months ahead.

For a few months ahead, besides current levels of sea ice, a fast or slow start to the melt season seems important, and other data like the snow cover on land that Rob Dekker uses could well be useful information. This seems aimed at trying to predict the short term anomalies rather than the longer term pattern.

>"Right now, the models are not consistent"
Fitting a Gompertz shape leads to less difference between extent and volume than fitting an exponential shape. It is also more consistent with the models and with the physical reasons above.

>"guarantees that the model will predict that behavior will continue as it has in the past. I'm not sure that incorporating other data would change that, unless the data were built into a physical framework."
I suggest different data for different purposes, and how that other data is used may well matter. So a physical framework seems essential for substantial improvement: yes, I not only agree with that but am tending towards arguing that this is what should be concentrated on instead of time series analysis. Maybe that is too harsh; each to his own. I certainly don't know where it might lead until someone tries.

>"Modeling extent and volume together as a multivariate time series would still predict that the behavior"
Not sure this is the way to go. Using a lower level of volume to predict a larger seasonal swing, for physical reasons, seems a different sort of model from a 'multivariate time series'?

>"probably qualifies as a change in behavior"
I am thinking an evolution in behaviour is more likely than an abrupt shift, at least for the next several years. While that may sound like a time series model might do OK, it needs to work on relevant data rather than misleading data. Using the physical reasoning to tease out which data are more relevant and which are less, and working on the relevant data, seems more sensible if there is reason to think the time series analysis is being misled. I.e. thick MYI may be less relevant to the future than it was in the past, so trying to establish what is happening with the thinner FYI, and perhaps using that to predict the future, seems more sensible to me.

These are just my thoughts and reactions. You are, of course, free to disagree and pursue your own course. Even if I am inclined to disagree with the direction, a model is better than no model. Also, I could be completely wrong, and your approach may well be much better than what I am suggesting.

Richard Rathbone · « **Reply #12 on:** August 16, 2015, 02:52:53 PM »

Your time series model assumes

i) there is a trend
ii) there is seasonal variation around that trend
iii) other variation is random
iv) there is some persistence in that randomness

Setting it up as a time series allows you to quantify the nature of that persistence and separate it from uncorrelated noise, and hence to use iii) to put more reliable confidence limits around the central projection that is just made from i) and ii). These in turn could be used to generate a pdf for the first ice free year.

Or it would, if what is done to account for i), ii) and iv) leaves a genuinely random iii).

I'd think the obvious improvement to try would be to make the detrending function used in i) a function of the month.

...

Why do you use Z(t-12) rather than Y(t-12)? Noise should not suddenly reappear a year later. If something significant is consistently reappearing a year later, that's signal.

...

I'm not interested in the year in which this model gives a 50% probability of an ice free month; once there has been an ice free month, the model is no longer valid. I'm interested in the year in which the probability that there has not yet been an ice free month drops to 50%. I could run it in Monte Carlo mode and see where it stopped each time, but if the noise were better represented I could just derive it from the form of that noise.
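The Monte Carlo version of that question can be sketched with a crude stand-in for the September series: a random walk with drift, with hypothetical drift and step size, not the fitted ARMA model. The sketch stops each run at its first ice free year and compares the year by which half the runs have stopped with the year in which the mean path reaches zero (where the chance of ice free in that particular year is 50%). All function names and parameter values here are illustrative assumptions.

```python
import random


def first_ice_free_years(v0, c, sigma, start_year, horizon, n_runs, seed=0):
    """Simulate September volume as a random walk with drift c and
    N(0, sigma^2) steps; return, per run, the first year at or below zero
    (None if that never happens within the horizon)."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_runs):
        v = v0
        hit = None
        for year in range(start_year + 1, start_year + horizon + 1):
            v += c + rng.gauss(0.0, sigma)
            if v <= 0.0:
                hit = year
                break  # stop at the first ice free year
        results.append(hit)
    return results


def median_first_year(results):
    """Year by which at least half of the runs have had an ice free year."""
    years = sorted(y if y is not None else float("inf") for y in results)
    return years[(len(years) - 1) // 2]


def mean_crossing_year(v0, c, start_year):
    """First year in which the mean path v0 + k c is at or below zero
    (requires c < 0)."""
    k = 0
    while v0 + k * c > 0.0:
        k += 1
    return start_year + k


# Hypothetical inputs (thousand cu km): not the fitted model's values.
runs = first_ice_free_years(5.0, -0.25, 1.0, start_year=2015, horizon=60, n_runs=2000)
print("mean path reaches zero in", mean_crossing_year(5.0, -0.25, 2015))
print("half of the runs are ice free by", median_first_year(runs))
```

Because a path can dip below zero and then come back, the "first ice free year" reaches 50% no later than the year in which the mean crosses zero, and typically earlier.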

Richard Rathbone · « **Reply #13 on:** August 16, 2015, 03:09:16 PM »

I see why Z(t-12) has to be there now. It's because Y is a difference function on X rather than a detrended X.
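This can be checked with a simulation. Take X as a fixed seasonal cycle plus a linear trend plus white noise e(t). Year-over-year differencing removes the cycle and the trend, but it turns the noise into e(t) - e(t-12), whose lag-12 autocorrelation is -1/2. That is the kind of structure a Z(t-12) moving-average term captures. The cycle, trend and white noise below are hypothetical assumptions, and this shows only one mechanism that produces such a term, not the structure of the actual PIOMAS residuals.

```python
import math
import random


def lag12_difference(x):
    """Y(t) = X(t) - X(t-12) for a monthly series."""
    return [x[i] - x[i - 12] for i in range(12, len(x))]


def autocorr(y, lag):
    """Sample autocorrelation of y at the given lag."""
    n = len(y)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / n
    cov = sum((y[i] - mean) * (y[i - lag] - mean) for i in range(lag, n)) / n
    return cov / var


rng = random.Random(1)
n_years = 2000
# Hypothetical fixed seasonal cycle a(m) and linear trend, plus white noise.
cycle = [10.0 * math.cos(math.pi / 6 * m) for m in range(12)]
x = [cycle[t % 12] - 0.02 * (t // 12) + rng.gauss(0.0, 1.0)
     for t in range(12 * n_years)]

y = lag12_difference(x)
print(round(autocorr(y, 12), 2))  # close to -0.5
print(round(autocorr(y, 1), 2))   # close to 0
```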

Richard Rathbone · « **Reply #14 on:** August 16, 2015, 03:27:51 PM »

Y(t) = X(t) - X(t-12) I think correctly deals with the average trend, but I don't think it properly captures the seasonal variation. I think you are implicitly assuming that the first season in the dataset is the best representation of the seasonal variation, rather than calculating it from the full set of data.

OSweetMrMath · « **Reply #15 on:** August 17, 2015, 12:40:52 AM »

Richard,

Let me try to talk through a derivation. The derivation will make lots of assumptions, but my claim is that the basic ideas still approximately hold even if the assumptions are not correct.

As a starting point, I think it will be clearer if I change the indexing, so X(t, m) refers to the PIOMAS volume in year t and month m. X(2015, 1) is the volume in January of this year, X(2014, 12) is the volume in December of last year, and so on. The month m wraps around, so X(t, m+12) = X(t+1, m).

Now, assume that if we look at the data for one month across every year, this data can be modeled as a linear function plus noise, so for a fixed month m, the volume as a function of the year t is

X(t, m) = a(m) + b(m) t + noise

a(m) and b(m) are constants (the intercept and the slope of the linear fit), but both depend on the particular month, so I have written them as functions of m. Doing linear regression fits on the actual PIOMAS data, we have

for January
X(t, 1) = 588.3680 - 0.2836 t

for February
X(t, 2) = 574.6602 - 0.2753 t

and so on, so a(1) = 588.3680, a(2) = 574.6602, b(1) = - 0.2836, b(2) = - 0.2753, and so forth. (The constants look large, but if you plug in values starting at 1979 for t, the resulting ice volumes make sense.)
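The per-month fits are ordinary least-squares regressions on the years for one calendar month at a time. The PIOMAS series isn't reproduced here, so this self-contained sketch uses synthetic data built from made-up "true" coefficients (slopes of the form constant plus sinusoid, as in the next assumption). It checks that a separate intercept a(m) and slope b(m) are recovered for each month. As in the fits above, t is the calendar year, which is why the intercepts come out large.

```python
import math
import random


def ols(ts, xs):
    """Least-squares fit of x = a + b t; returns (a, b)."""
    n = len(ts)
    t_bar = sum(ts) / n
    x_bar = sum(xs) / n
    sxx = sum((t - t_bar) ** 2 for t in ts)
    sxy = sum((t - t_bar) * (x - x_bar) for t, x in zip(ts, xs))
    b = sxy / sxx
    return x_bar - b * t_bar, b


def fit_each_month(data):
    """data maps (year, month) -> value; returns {month: (a(m), b(m))}."""
    fits = {}
    for m in range(1, 13):
        years = sorted(t for (t, mm) in data if mm == m)
        fits[m] = ols(years, [data[(t, m)] for t in years])
    return fits


def b_true(m):
    # Hypothetical slopes: a constant plus a sinusoid in the month.
    return -0.28 + 0.05 * math.cos(math.pi / 6 * m)


def a_true(m):
    # Hypothetical intercepts; any values work.
    return 580.0 + 8.0 * math.cos(math.pi / 6 * (m - 4))


rng = random.Random(0)
data = {(t, m): a_true(m) + b_true(m) * t + rng.gauss(0.0, 0.5)
        for t in range(1979, 2015) for m in range(1, 13)}
fits = fit_each_month(data)
for m in (1, 2):
    a, b = fits[m]
    print(f"month {m}: a = {a:.2f}, b = {b:.4f} (true b = {b_true(m):.4f})")
```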

Next, assume the values for a(m) can be anything at all. (It will turn out not to matter.) Assume that b(m) is a sinusoid plus a constant, so b(m) = c + d cos( pi/6 m + e ) where c, d, and e are constants.

Finally, assume the noise is Gaussian and has a well behaved correlation structure.

Substituting for b(m), we have
X(t, m) = a(m) + (c + d cos( pi/6 m + e )) t + noise(t, m)

First, ignore the noise. Second, compute the year over year differences Y(t, m) by
Y(t, m) = X(t, m) - X(t-1, m).

Dropping the noise term from X(t, m) and substituting in gives
Y(t, m) = c + d cos( pi/6 m + e)

c is a constant, so we can subtract it out without changing this argument, so
Y(t, m) = d cos( pi/6 m + e)

Next, some trigonometry shows that
Y(t, m) = sqrt(3) Y(t, m-1) - Y(t, m-2)
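The trigonometry is the sum-to-product identity cos(x) + cos(x - pi/3) = 2 cos(pi/6) cos(x - pi/6), applied with x = pi/6 m + e, together with 2 cos(pi/6) = sqrt(3). A quick numerical check, with arbitrary values for d and e and a hypothetical helper name:

```python
import math


def max_recurrence_error(d=1.7, e=0.4, months=range(-24, 25)):
    """Largest deviation from Y(m) = sqrt(3) Y(m-1) - Y(m-2) when
    Y(m) = d cos(pi/6 m + e); d and e are arbitrary."""
    def y(m):
        return d * math.cos(math.pi / 6 * m + e)
    return max(abs(y(m) - (math.sqrt(3) * y(m - 1) - y(m - 2))) for m in months)


print(max_recurrence_error())              # rounding error only
print(max_recurrence_error(d=-3.0, e=2.0))  # rounding error only
```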

This shows that the deterministic part of the original process X(t, m) can be represented as a (deterministic) AR(2) process on the difference process Y(t, m). The constant periodic component in the original process is contained in a(m), which is eliminated by the differencing. The linear trend appears as a constant in the difference process, which can be subtracted out. Any change in the periodic component is represented by the AR(2) process.

The original process has a noise term. Under the assumption that the noise is Gaussian, we can apply the same differencing and AR(2) operations to the noise, and the transformed noise is still Gaussian. If the transformed noise is stationary (and the correlations are well behaved), it can be represented by an ARMA process driven by independent Gaussian noise. The AR component of the noise process adds to the original AR deterministic process, which means that Y(t, m) can be modeled as an ARMA process driven by independent Gaussian noise (plus a constant).

Since X(t, m) can be written as a difference equation using the Y(t, m), plus an initial condition, the ARMA process for Y(t, m) allows us to represent (and predict) X(t, m). This representation includes all of the original features of X, including the trend, seasonal variation, changes in the seasonal variation, and correlated noises (or random variations plus persistence in the randomness).
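Going back from Y to X is just the recursion X(t, m) = X(t-1, m) + Y(t, m), started from one year of observed values (the initial condition). Appending predicted differences to the observed ones extends X forward. A minimal sketch on a hypothetical three-year series, with hypothetical helper names:

```python
def year_over_year(x):
    """Y(t) = X(t) - X(t-12) for a monthly series x."""
    return [x[i] - x[i - 12] for i in range(12, len(x))]


def integrate(first_year, y):
    """Rebuild X from its first 12 values (the initial condition) and the
    year-over-year differences, using X(t) = X(t-12) + Y(t)."""
    x = list(first_year)
    for i, dy in enumerate(y):
        x.append(x[i] + dy)  # x[i] is the same month one year earlier
    return x


x = [float(v) for v in range(1, 37)]            # three hypothetical years
y = year_over_year(x)
print(integrate(x[:12], y) == x)                # differencing is undone exactly
forecast_y = [-1.0] * 12                        # hypothetical predicted differences
print(integrate(x[:12], y + forecast_y)[-12:])  # levels one year past the data
```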

Caveats: We assumed the trends are linear. If this is weakly not correct, the difference between the trend and the linear approximation is small and can be represented as a slightly nonstationary component in the noise term. If this is strongly not correct, a second differencing will eliminate the quadratic term.

We assumed the change in the seasonal variation remaining after differencing can be represented by a sinusoid. If the shape is approximately a sinusoid, the difference can again appear as a slightly nonstationary component of the noise. For more complex shapes, the shape can be viewed as a sum of sinusoids (a Fourier series) and we can add higher order AR terms to represent the other sinusoids.

We assumed the noise is Gaussian and the transformed noise is stationary. The Gaussian assumption is really just a convenience. Prediction intervals based on the variance of the noise depend on this assumption, but we can also estimate the intervals based on a bootstrap Monte Carlo method which does not require this.
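A bootstrap interval replaces the Gaussian assumption with the empirical distribution of past residuals: resample them with replacement, add them to the point forecast, and read off percentiles. The sketch below is a simplification. It sums one independently resampled residual per year ahead, which ignores the AR and MA structure of the real model. The residual values and the helper name are hypothetical.

```python
import random


def bootstrap_interval(point_forecast, residuals, years_ahead=1,
                       n_draws=5000, level=0.95, seed=0):
    """Percentile interval for a forecast years_ahead years out, built by
    resampling past residuals with replacement and summing one per year.
    Summing independent draws is a random-walk simplification that ignores
    the AR and MA structure of the real model."""
    rng = random.Random(seed)
    draws = sorted(
        point_forecast + sum(rng.choice(residuals) for _ in range(years_ahead))
        for _ in range(n_draws)
    )
    lo = draws[int(round((1 - level) / 2 * (n_draws - 1)))]
    hi = draws[int(round((1 + level) / 2 * (n_draws - 1)))]
    return lo, hi


# Hypothetical, deliberately skewed residuals (thousand cu km).
residuals = [-0.9, -0.4, -0.2, -0.1, 0.0, 0.1, 0.1, 0.2, 0.2, 0.3]
print(bootstrap_interval(5.7, residuals, years_ahead=1))
print(bootstrap_interval(5.7, residuals, years_ahead=4))
```

With skewed residuals like these, the interval is not symmetric around the point forecast. A Gaussian interval based on the residual variance would miss that.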

The stationary assumption is more important. The standard method for fitting the model parameters based on the data minimizes the noise (the details are a little more complicated) but this minimization assumes all the noises are equivalent. The procedure could be modified to handle the case where the variance of the noise changes from month to month and the correlations change month to month, as long as the relationships are still the same year to year.

This is what appears to be happening with the extent model, where the variance of the innovations depends on the month but does not appear to be changing year over year. The volume model has changes in variance both month to month and year over year, and I don't have a good approach for handling that.

The final caveat is that it's one thing to start with a full model and break it down into its components. It is a far trickier thing to start with data and estimate the model that it came from. All I can say is that there are standard methods for finding good estimates, and as applied to the actual data in this case the estimates are mostly working out.

Because the model fitting seeks to minimize the noise, I disagree with the assertion that the model assumes that the first season gives the seasonal variation. There is an implicit seasonal variation, but it is found by "averaging" over all of the data to minimize the total noise.

Related to this, you also asked why I'm using Z(t-12) rather than Y(t-12). The short answer is that I tried both, and Z(t-12) gave a better fit. An additional Z(t-24) or Y(t-12) term did not further improve the fit. I could give a longer answer in terms of a physical interpretation, but if I had ended up with Y(t-12), I could give a physical interpretation justifying that too.

OSweetMrMath · « **Reply #16 on:** September 02, 2015, 12:25:40 AM »

As discussed above, the time series methodology results in an update to the predictions from the model as each month's data becomes available. Based on the July extent data, the predictions for August and September extent were 6.10 and 5.05 million sq km, respectively. The 95% prediction interval for September was 4.39 - 5.67 million sq km.

At the time I remarked that the daily data in July implied that the August prediction was likely to be too high, and the previous prediction for August of 5.68 million sq km was likely to be closer to the mark. The observed data for August is 5.61 million sq km.

If I can still take credit for the earlier prediction, this means that my prediction in June for the August extent was less than 100 thousand sq km away from the observed value, which is quite accurate. More honestly, my prediction in July for the August extent was off by 500 thousand sq km, meaning the model has had substantial misses two months in a row.

The updated prediction for September is 4.72 million sq km, with a 95% prediction interval of 4.16 - 5.28 million sq km. The daily data for August has shown an unusually large extent loss recently. This indicates that the observed September extent is likely to be below the predicted value, just as the observed August extent was below the predicted value.

Given how close the September values are in other years, any outcome between 2nd lowest and 6th lowest on record is still plausible. However, the daily data favors the lower outcomes.

OSweetMrMath · « **Reply #17 on:** September 06, 2015, 02:05:02 AM »

I can now update my PIOMAS predictions based on the August data. The predictions from the July data of the August and September volume were 7.18 thousand cubic kilometers and 5.89 thousand cubic kilometers. The 95% prediction interval for the September volume was 4.83 to 6.83 thousand cubic kilometers.

The observed volume in August was 7.073 thousand cubic kilometers, resulting in a prediction error of 0.11 thousand cu km. Not bad!

This was used to update my prediction for September, which is now 5.72 thousand cubic km. The 95% prediction interval is 5.18 to 6.21 thousand cubic km.

This year's September volume is likely to be the 5th lowest on record, but it could still potentially pass below 2013's value of 5.479 thousand cu km to become the 4th lowest. Ending below the 2010 level of 4.742 to reach 3rd lowest or above the 2007 level of 6.526 to become 6th lowest is very unlikely.

mmghosh · « **Reply #18 on:** September 10, 2015, 05:00:45 AM »

Is it worthwhile to look at the total volume drop in 2015, compared to past years? This is in view of the fact that 2015 started pretty high.

OSweetMrMath · « **Reply #19 on:** November 17, 2015, 04:51:39 AM »

Here is a belated update on my model predictions.

My last update included a prediction of the September NSIDC extent based on the August data, in which I predicted a value of 4.72 million sq km, with a 95% prediction interval of 4.16 - 5.28 million sq km. The observed September extent was 4.63 million sq km, which is more in line with my usual prediction accuracy than the previous two months. Based on the April data, the prediction for September had been 4.77 million sq km, so the prediction for the September extent was off by less than 0.15 million sq km 5 months in advance.

Looking forward from there, the September prediction for the October extent was 7.49 million sq km. The observed October extent was 7.72 million sq km, which is within the error range of this model. The current prediction for the November extent is 10.04 million sq km. The 2016 maximum in March is currently predicted to be 14.65 million sq km, with a 95% prediction interval of 13.9 - 15.2 million sq km.

The August prediction of the September PIOMAS volume was 5.72 thousand cu km, with a 95% prediction interval of 5.18 - 6.21 thousand cu km. The observed value was 5.84 thousand cu km. The April prediction for September had been 6.22 thousand cu km. The prediction error 5 months in advance was relatively large due to the more rapid volume loss in July and August than had been predicted.

The September prediction of the October volume was 7.02 thousand cu km. The observed October volume was 6.99 thousand cu km. The current prediction of the November volume is 10.16 thousand cu km. The 2016 volume maximum in April is currently predicted to be 23.17 thousand cu km, with a 95% prediction interval of 21.28 - 24.94 thousand cu km.

AbruptSLR · « **Reply #20 on:** November 20, 2015, 06:39:49 PM »

The linked reference indicates that quantifying the regional dependence of Arctic Sea Ice loss can improve the accuracy of our estimates of the timing of the onset of rapid seasonal ASI concentration decline:

S. Close, M.-N. Houssais and C. Herbaut (2015), "Regional dependence in the timing of onset of rapid decline in Arctic sea ice concentration", Journal of Geophysical Research – Oceans, DOI: 10.1002/2015JC011187

http://onlinelibrary.wiley.com/doi/10.1002/2015JC011187/abstract

Abstract: "Arctic sea ice concentration from satellite passive microwave measurements is analysed to assess the form and timing of the onset of decline of recent ice loss, and the regional dependence of the response. The timing of the onset is estimated using an objective method, and suggests differences of up to 20 years between the various subregions. A clear distinction can be drawn between the recent onset times of the Atlantic sector (beginning in 2003) and the much earlier onset times associated with the Pacific sector, where the earliest transition to rapid loss is found in 1992. Rates of decline prior to and following the transition points are calculated, and suggest that the post-onset rate of loss is greatest in the Barents Sea, and weakest in the Pacific sector. Covariability between the seasons is noted in the SIC response, both at interannual and longer time scales. For two case regions, potential mechanisms for the onset time transitions are briefly analysed. In the Barents Sea, the onset time coincides with a redistribution of the pathways of ice circulation in the region, whilst along the Alaskan coast, the propagation of the regional signal can be traced in the age of the sea ice. The results presented here indicate a series of spatially self-consistent regional responses, and may be useful in understanding the primary drivers of recent sea ice loss."

OSweetMrMath · « **Reply #21 on:** March 18, 2016, 02:37:58 AM »

I knew it had been a while since I posted an update, but I didn't realize just how long it's been.

In any event, I had predicted the November NSIDC extent to be 10.04 million sq km and the March extent to be 14.65 million sq km. The observed November extent was 10.06 million sq km. My prediction for January was too high by 0.1 million sq km, and my revised prediction for February was again too high by 0.1 million sq km, so my current prediction for March is 14.52 million sq km. Based on the fact that the daily extent in March is yet to reach 14.5 million sq km, it is likely that my prediction is again too high.

Looking ahead, the current prediction for September is 4.70 million sq km. The model interprets last year's extent in August and September as unexpectedly low and corrects by predicting an increase this year. However, the model has no awareness of the Arctic heat this year, and does not interpret the current low extent as predictive for the summer minimum. I suspect the model could be very wrong.

Turning to volume, the prediction was that the November PIOMAS volume would be 10.16 thousand cu km, and the April PIOMAS volume would be 23.17 thousand cu km. The observed November volume was 10.30 thousand cu km. The prediction for January was 0.35 thousand cu km too high, and the revised prediction for February was 0.25 thousand cu km too high. The current revised prediction for April is 22.50 thousand cu km.

The current prediction for September is 4.81 thousand cu km. This would essentially erase the "recovery" of the past few years, but would still be above the 2010 level. I find this more plausible than the extent prediction, but I think this could also be too high.

OSweetMrMath · « **Reply #22 on:** April 10, 2016, 01:52:02 AM »

Let's see if I can start writing up monthly updates again.

Last month, my prediction for the monthly NSIDC extent for March was 14.52 million sq km, with the caveat that it was likely to be high. The data is in, and the monthly extent was 14.43 million sq km, so my prediction was around 100 thousand sq km high. This reduces the predicted value for April from 13.98 million sq km to 13.92 million sq km. As I've discussed elsewhere, maximum extent and minimum extent are effectively uncorrelated (after accounting for the long term trend) so the prediction for the September monthly extent is unchanged at 4.70 million sq km.
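These monthly revisions follow the standard forecast-updating rule for ARMA models. When a new month is observed, the forecast h months ahead moves by the new one-step error times the model's psi (impulse-response) weight psi_{h-1}. For horizons of up to 12 months, X is the known value from a year earlier plus Y, so a revision to the forecast of Y carries straight over to X. The sketch assumes, for illustration only, a pure AR(2) on the year-over-year differences with made-up coefficients and forecasts. The actual models also include a Z(t-12) term, which changes the weights at longer lags.

```python
def psi_weights(phi1, phi2, n):
    """First n psi (impulse-response) weights of an AR(2) process:
    psi_0 = 1, psi_1 = phi1, psi_j = phi1 psi_{j-1} + phi2 psi_{j-2}."""
    psi = [1.0, phi1]
    while len(psi) < n:
        psi.append(phi1 * psi[-1] + phi2 * psi[-2])
    return psi[:n]


def update_forecasts(old_forecasts, observed, phi1, phi2):
    """old_forecasts[k] is last month's forecast k+1 months ahead, so
    old_forecasts[0] is for the month just observed. Returns the revised
    forecasts for the remaining months: the forecast made h months ahead
    moves by psi_{h-1} times the new one-step error."""
    error = observed - old_forecasts[0]
    psi = psi_weights(phi1, phi2, len(old_forecasts))
    return [f + psi[k] * error for k, f in enumerate(old_forecasts) if k > 0]


# Hypothetical coefficients and forecasts: an observation 0.1 above its
# forecast raises the next month's forecast by about 0.06 and the one
# after by about 0.056.
print(update_forecasts([10.0, 9.0, 8.0], 10.1, phi1=0.6, phi2=0.2))
```

When the psi weights decay quickly, a miss in one month barely moves forecasts several months out. This is consistent with a miss near the maximum leaving the September prediction essentially unchanged.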

My prediction for the monthly PIOMAS volume for March was 21.46 thousand cu km. The true value is 21.53 thousand cu km, slightly higher than predicted. This increases the prediction for April from 22.50 thousand cu km to 22.60 thousand cu km. This slightly increases the predicted monthly volume for September, increasing from 4.81 thousand cu km to 4.89 thousand cu km.

OSweetMrMath · « **Reply #23 on:** May 18, 2016, 02:05:42 AM »

I was holding out on an update in the hope that the April NSIDC extent would become available. Since I don't know how long I will have to wait, here's an update to the PIOMAS volume.

My predicted volume for April was 22.61 thousand cubic km. The reported volume is 22.458 thousand cubic km, narrowly setting a record for April volume. This reduces my predicted volume for September to 4.71 thousand cubic km. The current prediction is just below the 2010 value, which would make it the third smallest on record. At five months in advance, the prediction interval is quite wide, so a new record minimum is certainly a possibility.

2012 made its record-setting move in June, so I'm hesitant to predict that this year will break 2012's record before then. My current prediction for June is 16.70 thousand cubic km, compared to 16.002 thousand cubic km in 2012.

Over the last year, there has been a substantial drop in my predicted values. The prediction one year ago for April was 23.37 thousand cubic km. My prediction for September was 5.49 thousand cubic km. January and February were both well below my predicted values (by 0.35 and 0.25 thousand cubic km, respectively), which pushed my predicted values for April and September lower. With April coming in 0.15 thousand cubic km below my prediction, my September prediction has fallen again. The assumption for the model is that the predicted values are equally likely to be too high or too low. That said, given the weather and other information about the ice, it would not be surprising if the predicted values continue to be too high in the near future.

jdallen · « **Reply #24 on:** May 18, 2016, 04:22:01 AM »

Osweetmath... 4.7 million in Sept is WAY to high...

OSweetMrMath · « **Reply #25 on:** May 19, 2016, 03:37:34 AM »

Quote from: jdallen on May 18, 2016, 04:22:01 AM

Osweetmath... 4.7 million in Sept is WAY too high...

It's what the model predicts. The model assumes that the weather and ice state up to this point are exceptional, and that things will return to a more normal mode soon. I don't really believe that, but I do think that the 2012 record was a result of the collapse in June. I think there's potential for a similar collapse this year, but I'm holding off on calling it a sure thing.

OSweetMrMath · « **Reply #26 on:** July 11, 2016, 04:08:51 AM »

I missed a month, but here are my updated model results and predictions.

The NSIDC had satellite problems starting in April, so my last update was based on the March data. Since then, they have switched over to a new satellite and built an entirely new data set. I have updated my model based on the new data set. I have not done an analysis of the differences between the data sets. There may be inconsistencies in comparing my predictions from the old data set with my predictions from the new data set.

Using the new data, my prediction for the monthly NSIDC extent for April would have been 13.98 million sq km. The observed extent was 13.83 million sq km. This is not an exceptionally large error. The resulting prediction in April for the May extent would have been 12.55 million sq km. The actual May extent was substantially below that, at 12.08 million sq km, as the extent maintained its record-setting pace. The revised prediction for June would have been 10.59 million sq km, almost exactly on target: the true value was 10.60 million sq km.

Looking forward, my prediction for the September extent, based on the old March data, was 4.70 million sq km. Even the large miss in May has little impact on the September prediction, which has fallen only slightly, to 4.61 million sq km, with a 95% prediction interval of 3.91-5.29 million sq km.

I had predicted the monthly PIOMAS volume for May would be 21.19 thousand cubic km. The actual volume was 21.024 thousand cubic km, so I was slightly high. The updated predicted volume for June was 16.45 thousand cubic km, while the actual volume was 16.494 thousand cubic km. My model isn't perfect, but I'm pretty happy about its accuracy.

As we get closer to September, the predicted value is more sensitive to differences between each month's prediction and the reported volume, so the predicted value for September is now 4.54 thousand cu km. The 95% prediction interval is almost too wide to be useful, at 3.17 - 5.81 thousand cu km.

OSweetMrMath · « **Reply #27 on:** August 10, 2016, 07:18:42 PM »

Now that the July data is in, here is my updated model.

For the monthly NSIDC extent, my prediction for July was 8.02 million sq km. The reported value was 8.13 million sq km. This pushes the model prediction for September up to 4.66 million sq km, a slight increase over the previous prediction of 4.61 million sq km.

My prediction for the monthly PIOMAS volume in July was 9.91 thousand cu km. The reported value was 10.261 thousand cu km. This has a relatively large impact on my prediction for September, changing the value from 4.54 thousand cu km to 5.10 thousand cu km.

OSweetMrMath · « **Reply #28 on:** September 07, 2016, 07:28:53 PM »

Based on the August data, I can update my predictions for September.

For the monthly NSIDC extent, my prediction for August was 5.56 million sq km. The reported value was 5.60 million sq km. The model prediction for September has a slight increase from 4.66 million sq km to 4.68 million sq km, with a 95% prediction interval of 4.13 million sq km to 5.24 million sq km. Note that this model does not "see" the recent drop in extent, so I would anticipate the true value being closer to the bottom of the prediction interval than the top.
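One way to compare the extent intervals quoted in July and September is to back out the forecast standard deviation each one implies. This sketch assumes the 95% interval is a symmetric normal interval (forecast ± 1.96σ) on the reported scale, which the extent intervals roughly satisfy. The volume intervals (for example 4.54 with 3.17-5.81) are noticeably asymmetric, which suggests that the volume model may work on a transformed scale, so they are left out here.

```python
# (point forecast, lower, upper) for September extent, million sq km, from the posts above.
intervals = {
    "July 11 post": (4.61, 3.91, 5.29),
    "September 7 post": (4.68, 4.13, 5.24),
}

def implied_sd(lower, upper, z=1.96):
    """Forecast standard deviation implied by a symmetric normal 95% interval."""
    return (upper - lower) / (2 * z)

for label, (point, lower, upper) in intervals.items():
    print(f"{label}: forecast {point}, implied sd {implied_sd(lower, upper):.3f} million sq km")
```

The implied standard deviation drops from about 0.35 to about 0.28 million sq km as September gets closer, which is the narrowing expected when fewer months of unknown innovations remain.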

My prediction for the monthly PIOMAS volume in August was 6.22 thousand cu km. The reported value was 5.94 thousand cu km. The prediction is sensitive to prediction errors in the short term, so the prediction for September has been updated to 4.67 thousand cu km, with a 95% prediction interval of 4.13 to 5.16 thousand cu km. This brings the prediction back in line with my predictions from earlier this year, after last month's jump.
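The growing sensitivity of the September volume forecast can be read off the numbers reported in this thread. For a linear ARMA-type model, the forecast update after a new observation is X̂ₙ₊₁(h) = X̂ₙ(h+1) + ψₕ (Xₙ₊₁ − X̂ₙ(1)), so the ratio of the September revision to the one-step error roughly estimates the ψ weight at that lead. This sketch assumes the model is linear on the reported volume scale (which the asymmetric intervals call into question) and that each revision came from a single new month; the posted values are rounded, so the ratios are approximate.

```python
# (month, predicted, observed, old September forecast, new September forecast),
# PIOMAS volume in thousand cu km, as reported in this thread.
updates = [
    ("March", 21.46, 21.53, 4.81, 4.89),
    ("April", 22.61, 22.458, 4.89, 4.71),
    ("July", 9.91, 10.261, 4.54, 5.10),
    ("August", 6.22, 5.94, 5.10, 4.67),
]

ratios = {}
for month, predicted, observed, old_sept, new_sept in updates:
    error = observed - predicted     # one-step forecast error (innovation)
    shift = new_sept - old_sept      # revision of the September forecast
    ratios[month] = shift / error    # rough estimate of the psi weight at this lead
    print(f"{month}: error {error:+.3f}, September shift {shift:+.2f}, ratio {ratios[month]:.2f}")
```

The ratios are a little above 1 for the March and April updates and around 1.5 to 1.6 for July and August, consistent with the observation that the September prediction responds more strongly to each month's miss as September approaches.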
