I refer to a recent paper which seems to imply that the September minimum is largely determined by the melt pond fraction in May. My analysis, however, is that the strength of this relationship is an artefact of overfitting the model; that melt pond fraction in late June is in fact a better indicator of the final minimum; and that even this is not a significantly better predictor than a simple extrapolation of the long term trend.
First, the correlation between melt pond fraction and September minimum is quite impressive at over 0.8. Hindcasts made with this method have an average error of 0.33 million km².
However I've always been suspicious, as there is usually almost no melt pond visible from satellite in the central Arctic until early to mid June. Figure 1 of the paper reveals that the typical melt pond fraction at the end of May is around 2%. Figure 2 shows that the area of the Arctic for which melt ponds are measured excludes Hudson Bay and the Sea of Okhotsk, but includes much of the Bering Strait, Baffin Bay, the Greenland Sea, the Barents Sea and the Kara Sea. If the total fraction at the end of May is only 2%, surely nearly all the melt ponds must be in such fringe areas. How can this determine the fate of the central ice in September?
A key issue, and one that is not immediately obvious, is that the melt pond fraction is not a straightforward calculation of total melt pond area divided by total ice area. A geographic weighting is applied, with a different rating for each of thousands of grid squares. This raises the possibility that the strong correlation is due not to causation but to overfitting, because there are too many free parameters available to tweak. In particular, imagine a grid square that has melt ponds in one particular year but not in any other year. The weighting for this square can then be adjusted to change the model prediction for that year without affecting the prediction for any other year. Obviously there is a limit to this, otherwise the correlation would be perfect rather than merely good. But it may produce a stronger correlation than a purely physical causal relationship would support, as the sketch below illustrates. I suspect this may explain why the correlation is better early on in May: there are more opportunities to find grid squares that affect only one or a small number of years, whereas by July most grid squares would affect many or most years.
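To make the mechanism concrete, here is a minimal sketch with entirely synthetic data (the grid-square counts, year count, and fitting method are my assumptions, not the paper's actual scheme). The per-square "pond fractions" below are pure noise with no relationship to the target, yet a per-square weighting fitted to all years reproduces the targets almost exactly, simply because there are far more weights than years:

```python
import numpy as np

rng = np.random.default_rng(0)
n_years, n_squares = 13, 3000            # far more weights than years

pond = rng.random((n_years, n_squares))         # fake melt pond fractions
september_min = rng.normal(5.0, 0.5, n_years)   # fake minima, million km^2

# Fit one weight per grid square by least squares: thousands of free
# parameters chasing only 13 target values.
weights, *_ = np.linalg.lstsq(pond, september_min, rcond=None)

in_sample = pond @ weights
print("in-sample correlation:",
      np.corrcoef(in_sample, september_min)[0, 1])  # ~1.0 by construction

# The same weights applied to an unseen year are useless:
new_pond = rng.random(n_squares)
print("prediction for unseen year:", new_pond @ weights)
```

The in-sample correlation is essentially perfect despite the data containing no signal at all; whatever regularisation the paper applies would pull this back from perfect towards merely "good", which is exactly the pattern I am suspicious of.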
One good way to guard against such overfitting is to train the model on data from some of the years and use it to predict the result in the others. The paper does this by making a 'forecast' for each year using only data from years before that year. This is effectively what would have been forecast if the method in the paper had actually been used in that year, when the results for that year (and later years) were not yet available. A sketch of the procedure follows.
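The following is my own minimal sketch of that leave-future-out test, not the paper's code; I assume a single aggregated melt pond fraction per year and a simple linear fit, purely for illustration:

```python
import numpy as np

def forecast_errors(features, targets, min_train=5):
    """For each year t, fit on years before t only, then predict year t.

    features, targets: 1-D arrays, one value per year, in time order.
    Returns the mean absolute 'as-if-real-time' forecast error.
    """
    errors = []
    for t in range(min_train, len(targets)):
        # Least-squares linear fit (slope + intercept) on past years only.
        X = np.column_stack([features[:t], np.ones(t)])
        coef, *_ = np.linalg.lstsq(X, targets[:t], rcond=None)
        pred = coef[0] * features[t] + coef[1]
        errors.append(abs(pred - targets[t]))
    return np.mean(errors)
```

Because year t's data never enters the fit that predicts year t, any grid-square tweaking that only helps in-sample cannot inflate this score.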
The results are much less impressive and suggest that some kind of overfitting effect is at play. The standard error of the end-of-May melt pond prediction increases from 0.33 to 0.5 million km². The prediction using melt pond fraction up to June 25 goes from being worse than the end-of-May one in hindcast mode (0.36 vs 0.33) to being better in forecast mode (0.41 vs 0.5). A forecast error of 0.5 looks to my eyes no better than the average error of a prediction based purely on extrapolating the long term trend, although I'm not motivated enough to download the data and perform the calculation to confirm this.
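For anyone more motivated than me, the check is short. This sketch uses made-up September minima (the real values would need to be downloaded, which is exactly the step I've skipped); it fits a straight-line trend to the years before each target year and extrapolates one step forward, mirroring the paper's forecast setup:

```python
import numpy as np

# Hypothetical September minima in million km^2: a declining trend
# plus noise, standing in for the real downloaded series.
rng = np.random.default_rng(1)
years = np.arange(1998, 2011)
minima = 7.5 - 0.08 * (years - 1998) + rng.normal(0, 0.4, len(years))

errors = []
for t in range(5, len(years)):
    # Fit a straight-line trend to all years before year t...
    slope, intercept = np.polyfit(years[:t], minima[:t], 1)
    # ...and extrapolate it one step forward as the 'forecast'.
    errors.append(abs(slope * years[t] + intercept - minima[t]))

print("mean trend-extrapolation error:", np.mean(errors))
```

If that number on the real data comes out near 0.5 million km², the end-of-May melt pond forecast adds essentially nothing over the trend.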