Further observation: adding snow cover does little to improve predictions over the whole time course, but is of a bit more use in recent years. That's consistent with the idea that snow cover matters more when the ice pack as a whole is thinner and has less thermal inertia.
It would be interesting to look at extent as an individual variable - is it better or worse than area? Snow as a single variable is unlikely to be much use :-)
For completeness' sake, the other permutations of the two-variable models might be fun to look at. Formally, there are another 5 pairs that could be checked, namely extent+area, extent+snow, extent+time or area+time.
Similarly, there are two more three-variable models: snow+extent+time, and area+extent+time, both of which I suspect would be worse; and a four-variable model which would likely be slightly better, but which will fit an elephant if you want it to :-)
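The counting in these two paragraphs can be checked by enumerating every subset of the four predictors. This is a minimal Python sketch. The predictor names are just labels, and reading the table's k as "number of predictors plus an intercept" is my assumption, based on k=2 being used for the one-variable models. It gives six possible pairs in total (so five beyond any one pair already fitted), four three-variable models and one four-variable model.

```python
from itertools import combinations

# The four candidate predictors discussed in this thread (labels only).
PREDICTORS = ["time", "area", "snow", "extent"]


def all_models(predictors):
    """Map subset size -> list of every predictor subset of that size."""
    return {
        size: list(combinations(predictors, size))
        for size in range(1, len(predictors) + 1)
    }


models = all_models(PREDICTORS)
for size, subsets in models.items():
    # k = number of predictors + 1 intercept (assumed reading of the table's k)
    print(f"k={size + 1} ({size} variable(s)): {len(subsets)} model(s)")
    for subset in subsets:
        print("   " + "+".join(subset))
```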
I've added some of these permutations to the table:
Adjusted SD (k km²)          1979 - 2017   1992 - 2018
k=2 (one variable):
  Time                           543           542
  Area                           420           396
  Snow                           711           617
  Extent                         502           522
k=3 (two variables):
  snow+area                      415           375
  area+time                      422           379
  snow+extent                    495           466
k=4 (three variables):
  snow+area+extent               413           378
  snow+area+time                 419           367
k=5 (four variables):
  snow+area+extent+time          415           359
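For anyone wanting to reproduce numbers like these: the post doesn't spell out how "Adjusted SD" is computed. A common definition, assumed here, is the residual standard deviation of an ordinary least-squares fit using n - k degrees of freedom, where k counts the intercept plus one coefficient per predictor (which matches k=2 for one variable). A minimal numpy sketch under that assumption:

```python
import numpy as np


def adjusted_sd(X, y):
    """Residual SD of an OLS fit, using n - k degrees of freedom.

    Assumption: 'Adjusted SD' = sqrt(RSS / (n - k)), where k counts the
    intercept plus one coefficient per predictor column of X.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:            # a single predictor given as a flat array
        X = X[:, None]
    design = np.column_stack([np.ones(len(y)), X])   # intercept + predictors
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    n, k = design.shape
    return float(np.sqrt(residuals @ residuals / (n - k)))


# Hypothetical usage with per-year arrays (not data from this thread):
#   adjusted_sd(np.column_stack([june_snow, june_area]), sept_extent)
```

Under this definition an extra predictor lowers the adjusted SD only if the drop in residual sum of squares outweighs the lost degree of freedom, so small differences such as 413 vs 415 should not be over-interpreted.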
Conclusions from these added combinations:
1) 'Extent' is not really a good predictor. 'Area' is much better, as was already noted in the post by Bill Fothergill:
http://neven1.typepad.com/blog/2013/06/problematic-predictions.html
The only reason I included 'extent' in my formula was that (extent - area) seemed to be a good metric for "water that is very close to ice", which I thought might have a physical meaning as an area that would more efficiently turn solar heat into melting ice.
There is some correlation there, but these experiments show that it's basically 'noise' as well: compare the "snow+area+extent" line (413 / 378) with the simpler "snow+area" line (415 / 375).
2) Snow cover in June by itself is a terrible predictor of Sept SIE, but that is explainable: snow cover does not 'know' the June state of the ice pack.
To see that it is still useful, let's look at the physics. We are estimating how much energy there is in the Arctic system in June, and that energy will melt ice over the June -> September period. So we should try to predict the amount of ice that will melt between June and September.
So let's change the regression formula so that it predicts the "June-area minus Sept-extent" variable instead of "Sept-extent" in absolute numbers. Nothing else about the regression method changes, since "June-area" is already known in June; the Sept-extent forecast is then simply June-area minus the predicted melt.
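The change of target can be sketched as follows, assuming plain least squares with an intercept. The function and argument names are hypothetical, not taken from any existing code; the only point is that the fit is done on the melt, and the September number is recovered from this year's known June area afterwards.

```python
import numpy as np


def _design(X):
    """Prepend an intercept column to a 1-D or 2-D predictor array."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    return np.column_stack([np.ones(X.shape[0]), X])


def forecast_via_melt(X, june_area, sept_extent, X_now, june_area_now):
    """Hypothetical helper: regress the June->Sept melt, not Sept extent.

    X             : June predictors per past year (e.g. snow cover), (n,) or (n, p)
    june_area     : June sea ice area per past year, shape (n,)
    sept_extent   : observed September extent per past year, shape (n,)
    X_now         : this year's June predictor value(s), scalar or (p,)
    june_area_now : this year's June area (known in June)
    """
    # Target is the ice lost between June and September.
    melt = np.asarray(june_area, dtype=float) - np.asarray(sept_extent, dtype=float)
    coef, *_ = np.linalg.lstsq(_design(X), melt, rcond=None)
    x_now = np.atleast_1d(np.asarray(X_now, dtype=float))
    predicted_melt = coef[0] + x_now @ coef[1:]
    # Convert the melt forecast back into a September extent forecast.
    return float(june_area_now - predicted_melt)
```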
When we do that, we get these results for the individual variables:
Adjusted SD (k km²)          1979 - 2017   1992 - 2018
k=2 (one variable):
  Time                           421           382
  Area                           420           396
  Snow                           426           389
  Extent                         433           428
This shows that 'snow cover', even all by itself, has pretty good predictive value for estimating how much more ice will melt out between June and September, especially over the 1992 - 2017 period. Over that period, only the (unphysical) 'time' variable does better than 'snow cover'...
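A side note on why the 'Area' row is identical in both tables (420 / 396): when June area is itself the predictor, fitting "June-area minus Sept-extent" instead of "Sept-extent" just turns the slope b into 1 - b and flips the sign of the intercept, so the residuals only change sign and the adjusted SD is unchanged. For snow, time or extent, subtracting June area from the target brings in information those predictors don't contain, which is why those rows move. A quick numerical check on synthetic (made-up) data, again assuming adjusted SD = sqrt(RSS / (n - k)):

```python
import numpy as np


def residual_sd(x, y):
    """OLS residual SD with n - k degrees of freedom (intercept + one slope)."""
    design = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    r = y - design @ coef
    return float(np.sqrt(r @ r / (len(y) - design.shape[1])))


rng = np.random.default_rng(0)
area = rng.normal(10.0, 1.0, size=40)                # synthetic June area
extent = 0.6 * area + rng.normal(0.0, 0.4, size=40)  # synthetic Sept extent

sd_direct = residual_sd(area, extent)        # Sept extent ~ June area
sd_melt = residual_sd(area, area - extent)   # (June area - Sept extent) ~ June area
print(sd_direct, sd_melt)  # equal up to floating-point rounding
```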
3) For the fun of it, I also included a four-variable formula, "snow+area+extent+time". You can see that it makes things worse for the 1979-2017 period and better for the 1992-2017 period, though with an increased risk that we are fitting an elephant here.
Overall, I really like the simplicity (only 2 variables) and performance of the "snow+area" formula.
And it makes perfect physical sense as well: "snow+area" is how "white" the Arctic is in June.
The adjusted SD of 375 k km² is excellent, and beats most prediction methods in the SIPN. So I think I'm going to switch over to that simple "snow+area" model in my future predictions.
P.S. Almost all of these regression formulas end up projecting into the 4.8 - 5.2 range for Sept 2018 SIE.