Author Topic: Making sense of irregular data (Read 8934 times)

redste · « **on:** July 06, 2015, 01:56:50 PM »

I have followed the Arctic ice graphs for some years and have noticed the attempts to gain long-term insight from annual (or monthly) linear data which by its very nature is irregular, even chaotic, owing to prevailing weather rather than climate change. The irregularities disguise the real signal, so it would be better if they were filtered out or weighted down. The ice which remains in Hudson Bay, for example, is irrelevant to the annual minimum extent: it will all be gone come September. In addition to the current graphs, I suggest displaying data such as PIOMAS, ice thickness in a more limited area (say within 75° N), or applying a spatial algorithm that would give a better understanding of trends.
Various interpolation methods are described here www.knmi.nl/bibliotheek/knmipubIR/IR2009-04.pdf
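
One simple way to "weight down" the year-to-year irregularities is a centred moving average. The sketch below only illustrates the idea; the function name and the sample numbers are made up for this example and are not real extent data.

```python
def centred_moving_average(values, window):
    """Average each point with its neighbours; window must be a positive odd number."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    # Points too close to either end have no full window, so they are dropped.
    return [
        sum(values[i - half:i + half + 1]) / window
        for i in range(half, len(values) - half)
    ]


# Made-up yearly minima (arbitrary units), purely for illustration.
made_up_minima = [6.0, 5.6, 5.9, 4.3, 4.7, 4.6, 3.6, 5.1, 5.0]
print(centred_moving_average(made_up_minima, 5))
```

A wider window damps more of the weather noise but also responds more slowly to a genuine change in trend, so the choice of window is a judgment call.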

Neven · « **Reply #1 on:** July 06, 2015, 02:45:07 PM »

Welcome, redste. I've released your account, so you should be able to post immediately.

Quote

I suggest displaying data such as PIOMAS, ice thickness in a more limited area (say within 75° N), or applying a spatial algorithm that would give a better understanding of trends.

Have you tried anything like this?

Nightvid Cole · « **Reply #2 on:** July 06, 2015, 04:27:12 PM »

Quote from: redste on July 06, 2015, 01:56:50 PM

I have followed the Arctic ice graphs for some years and have noticed the attempts to gain long-term insight from annual (or monthly) linear data which by its very nature is irregular, even chaotic, owing to prevailing weather rather than climate change. The irregularities disguise the real signal, so it would be better if they were filtered out or weighted down. The ice which remains in Hudson Bay, for example, is irrelevant to the annual minimum extent: it will all be gone come September. In addition to the current graphs, I suggest displaying data such as PIOMAS, ice thickness in a more limited area (say within 75° N), or applying a spatial algorithm that would give a better understanding of trends.
Various interpolation methods are described here www.knmi.nl/bibliotheek/knmipubIR/IR2009-04.pdf

Huh? "Interpolation" is what you do to deal with gaps in a set of data, not what you do if you want to only look at a subset of the data.

ktonine · « **Reply #3 on:** July 06, 2015, 05:29:33 PM »

Quote from: Nightvid Cole on July 06, 2015, 04:27:12 PM

Huh? "Interpolation" is what you do to deal with gaps in a set of data, not what you do if you want to only look at a subset of the data.

Interpolation is the creation of new data for an existing data set. It does not need to be in a 'gap' -- it could be anywhere; i.e., the beginning or the end, temporal or spatial, past or future.

It is often the case that technical terminology carries a different meaning than that of common usage. I believe 'interpolation' in this sense has you tripped up. You may wish to follow the link redste provided and read the literature cited.

Jim Hunt · « **Reply #4 on:** July 06, 2015, 05:52:16 PM »

Quote from: redste on July 06, 2015, 01:56:50 PM

In addition to the current graphs, I suggest displaying data such as PIOMAS, ice thickness in a more limited area

Have you seen the assorted regional graphs, many of which are collected together here:

http://GreatWhiteCon.info/resources/arctic-sea-ice-graphs/

and here:

http://GreatWhiteCon.info/resources/gridded-piomas-graphs/piomas-regional-volume/

mati · « **Reply #5 on:** July 06, 2015, 09:50:36 PM »

some things to think about:

https://www.edsurge.com/n/2014-12-15-averages-don-t-matter-and-other-common-mistakes-in-data-analysis

ChrisReynolds · « **Reply #6 on:** July 06, 2015, 10:55:16 PM »

From the article Mati links to:

Quote

Edward Tufte, master of information design, once wrote, “The deep, fundamental question in statistical analysis is Compared with what?”

Precisely the problem I have had in comparing the current PIOMAS volume increase with 2012...

Years ago I used to 'evolve' neural networks as a hobby, then I got into something else. I keep pondering using a neural net on sea ice, but never get around to it.

Neven · « **Reply #7 on:** July 06, 2015, 11:13:56 PM »

Neural networks, that reminds me of the 2012 Roesel paper: Melt ponds on Arctic sea ice determined from MODIS satellite data using an artificial neural network.

ChrisReynolds · « **Reply #8 on:** July 07, 2015, 10:30:35 PM »

I was trying to remember which paper had used them, thanks Neven.

Being unphysical, NNs would be shaped by past behaviour. But I remain convinced that at this stage all the 'players' in the final demise of the summer ice pack are already playing their roles. The problem with evolving NNs is that it is very, very expensive in computer time. I typically had a 'generation' of 4096 members, which would be run. Then the program would 'select' the best 16 performers, 'kill' the rest, and make 256 'children' of each of those 16, each child with small variations to its parent's numbers... run again with the new generation of 4096, select, kill, repopulate, etc. It could run for a week or so for some tasks.
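
The select / kill / repopulate loop described above can be sketched roughly as follows. This is not the original C++ program: the genome is just a short list of numbers standing in for network weights, and the fitness function, target values, mutation size `sigma` and all names are assumptions made up for the illustration.

```python
import random


def fitness(genome, target):
    """Toy score (higher is better): negative squared distance to a made-up target."""
    return -sum((g - t) ** 2 for g, t in zip(genome, target))


def evolve(target, population_size=4096, survivors=16, generations=20,
           sigma=0.05, seed=0):
    """Run `generations` rounds of select, kill and repopulate.

    Returns the best genome seen and the best score of each generation.
    """
    rng = random.Random(seed)
    # 4096 // 16 = 256 children per survivor, as in the description above.
    children_per_parent = population_size // survivors
    population = [[rng.uniform(-1, 1) for _ in target]
                  for _ in range(population_size)]
    best, history = None, []
    for _ in range(generations):
        ranked = sorted(population, key=lambda g: fitness(g, target), reverse=True)
        history.append(fitness(ranked[0], target))
        if best is None or fitness(ranked[0], target) > fitness(best, target):
            best = ranked[0]
        parents = ranked[:survivors]  # 'select' the best, 'kill' the rest
        # Each child is a copy of its parent with small random variations.
        population = [[g + rng.gauss(0, sigma) for g in parent]
                      for parent in parents
                      for _ in range(children_per_parent)]
    return best, history


# Small made-up run so it finishes quickly.
best, history = evolve(target=(0.5, -0.2, 0.8), population_size=256,
                       survivors=16, generations=15)
print(best, history[0], max(history))
```

Because the parents themselves are discarded, a single generation can score worse than the one before it, which is why the sketch keeps track of the best genome seen so far.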

All of that was on a ~200MHz Pentium, in C++. I could have bigger populations with the speed of computers now.

The more I think about it, the more intriguing it is. But where to start? I'd have to regrid data to a more coarse grid - now it's looking like a pain in the arse.

sayama · « **Reply #9 on:** July 08, 2015, 05:06:40 AM »

We use neural networks often at my workplace. They have become very mature. GPUs to run them have come way down in cost and we can run huge datasets in near real time. There are many many code libraries to easily implement them now as well. I would say go for it.

http://devblogs.nvidia.com/parallelforall/cuda-spotlight-gpu-accelerated-deep-neural-networks/

AySz88 · « **Reply #10 on:** July 08, 2015, 06:57:38 AM »

Quote from: ChrisReynolds on July 07, 2015, 10:30:35 PM

All of that was on a ~200MHz Pentium, in C++. I could have bigger populations with the speed of computers now.

The more I think about it, the more intriguing it is. But where to start? I'd have to regrid data to a more coarse grid - now it's looking like a pain in the arse.

Perhaps a bit of a tangent, but I guesstimate a 1000x speedup (20x clock speed, 32x cores, 4x IPC = 2560x on paper), so that same one-week task would now take maybe 10 minutes of computing time. That's literally less than a dollar of Google Cloud Compute.
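
The back-of-envelope estimate above can be checked in a few lines. Multiplying the three listed factors gives 2560x, so the round 1000x figure leaves some margin. The names below are made up for the illustration, and the factors themselves are the guesses from the post, not measurements.

```python
MINUTES_PER_WEEK = 7 * 24 * 60  # 10080 minutes

# Guessed speedup factors from the post above.
factors = {"clock speed": 20, "cores": 32, "IPC": 4}

combined = 1
for factor in factors.values():
    combined *= factor


def minutes_after_speedup(minutes, speedup):
    """Run time after dividing the original run time by the speedup factor."""
    return minutes / speedup


print(combined)                                            # product of the factors
print(minutes_after_speedup(MINUTES_PER_WEEK, 1000))       # with the round 1000x
print(minutes_after_speedup(MINUTES_PER_WEEK, combined))   # with the full product
```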

Have I gotten you back to intrigued again? Or maybe just to slap an intern on it...?

epiphyte · « **Reply #11 on:** July 08, 2015, 07:16:18 AM »

Having served as the Cray on-site analyst at both UKMET and ECMWF at one time or another, the most apposite quote that comes to mind is from an (ID redacted to protect the innocent) BBC weatherman upon the commissioning of the (then) most powerful computer in Europe...

"Fantastic! we'll be able to get it wrong twice as fast now!"

AySz88 · « **Reply #12 on:** July 08, 2015, 07:55:59 AM »

Quote from: epiphyte on July 08, 2015, 07:16:18 AM

"Fantastic! we'll be able to get it wrong twice as fast now!"

I do see the punchline, but (on this side of the pond?) "failing quickly" is an ideal to strive for!

Wipneus · « **Reply #13 on:** July 08, 2015, 08:00:03 AM »

Quote from: Neven on July 06, 2015, 11:13:56 PM

Neural networks, that reminds me of the 2012 Roesel paper: Melt ponds on Arctic sea ice determined from MODIS satellite data using an artificial neural network.

I do not doubt their results, do not misunderstand me, but the method is a prime example where a trivial, straightforward calculation would have yielded exact results without the computing overhead and uncertainty of neural networks.

I have found that all of the neural network solutions I have encountered were either a compensation for a lack of understanding of the problem or the result of sales/marketing forces: customers prefer neural networks.

seaice.de · « **Reply #14 on:** July 08, 2015, 11:23:56 AM »

I agree that the result would have been very similar without the NN, but the calculation (an optimization problem for every pixel) has been sped up considerably by the NN.

Quote from: Wipneus on July 08, 2015, 08:00:03 AM

Quote from: Neven on July 06, 2015, 11:13:56 PM
Neural networks, that reminds me of the 2012 Roesel paper: Melt ponds on Arctic sea ice determined from MODIS satellite data using an artificial neural network.

I do not doubt their results, do not misunderstand me, but the method is a prime example where a trivial, straightforward calculation would have yielded exact results without the computing overhead and uncertainty of neural networks.

Wipneus · « **Reply #15 on:** July 08, 2015, 12:57:43 PM »

Quote from: seaice.de on July 08, 2015, 11:23:56 AM

I agree that the result would have been very similar without the NN, but the calculation (an optimization problem for every pixel) has been sped up considerably by the NN.

Quote from: Wipneus on July 08, 2015, 08:00:03 AM
Quote from: Neven on July 06, 2015, 11:13:56 PM
Neural networks, that reminds me of the 2012 Roesel paper: Melt ponds on Arctic sea ice determined from MODIS satellite data using an artificial neural network.

I do not doubt their results, do not misunderstand me, but the method is a prime example where a trivial, straightforward calculation would have yielded exact results without the computing overhead and uncertainty of neural networks.

Lars,

We are talking about equation 2 in the paper, relating the area fractions of melt ponds, ice and open water (A_M, A_I, A_W) to reflectances in three MODIS wavelengths, with the additional constraint that the sum of the area fractions is 1.

The paper says:

Quote

The set of linear Eq. (2) contains three unknowns (A_W, A_M, A_I) in four equations, therefore the equations are overdetermined. That means more than one exact solution is possible and thus, we consider the linear Eq. (2) as an optimization problem which needs to be solved in a least-square sense. For the solution of these equation we use a quasi-Newton approximation method (Broyden-Fletcher-Goldfarb-Shanno method).

That is over the top. Solving an overdetermined system is the basis of all analysis of physical measurements that involve multiple readings **). Any physics student may be asked to solve one in a first-year exam on "measuring in physics".
You can consider it an optimization problem, but efficient direct methods exist for solving it (in the least-squares sense).
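
For reference, one such direct method is the normal equations. The notation below is an assumption made for this sketch and is not taken from the paper: r_{s,k} stands for the reflectance of surface type s in band k, R_k for the measured reflectance in band k, and C for the resulting 4x3 coefficient matrix.

```latex
\begin{aligned}
A_W\, r_{W,k} + A_M\, r_{M,k} + A_I\, r_{I,k} &= R_k, \qquad k = 1, 2, 3 \\
A_W + A_M + A_I &= 1
\end{aligned}
\qquad\Longleftrightarrow\qquad
C\,a \approx y, \quad a = (A_W, A_M, A_I)^{\mathsf T}

\hat{a} = \arg\min_{a} \lVert C a - y \rVert^{2}
\quad\Longrightarrow\quad
C^{\mathsf T} C \,\hat{a} = C^{\mathsf T} y
```

The last line is a 3x3 linear system, solvable in closed form for every pixel provided C has full column rank. When the columns are nearly dependent, a QR or SVD based solver is usually the numerically safer choice.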

There are complications however, the paper says:

Quote

With the assumption of a three class mixture model and the selection of three surface types, we find, that especially the surface types open water and melt ponds are almost linearly dependent, therefore the set of linear Eq. (2) is not well conditioned. To comply with the physical principles, it is necessary to constrain the interval of the solution between zero and one for each class. (...)

This I understand, but again I do not see the need for the complicated way the problem is solved.
Restricting the area fractions to between 0 and 1 makes the optimization of the "mixed integer" type. In this case, with only 3 unknowns, it is very simple to reduce it to several simpler problems:
For each of the fractions (I, M, W) consider three possibilities: 0, 1 or in between. That is 27 cases at most, most of which do not make sense. What is left is a handful of cases (7, but some are trivial) that can be solved efficiently; the one with the lowest least-squares residual is the optimum.

**): That second line in the quoted text should be "That means that in general no exact solution exists". The opposite of what is written.
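
The case enumeration described above can be sketched as below. The problem is more commonly called bounded (box-constrained) least squares. The sketch simply tries all 27 cases rather than pruning to the handful that make sense, and the reflectance values in `A` are invented placeholders, not the paper's coefficients. All function names are made up for the illustration.

```python
from itertools import product


def solve_linear(M, v):
    """Solve the square system M x = v by Gaussian elimination with pivoting."""
    n = len(v)
    aug = [list(M[i]) + [v[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def least_squares(A, b):
    """Unconstrained least squares via the normal equations (A^T A) x = A^T b."""
    n = len(A[0])
    AtA = [[sum(row[i] * row[j] for row in A) for j in range(n)] for i in range(n)]
    Atb = [sum(row[i] * bi for row, bi in zip(A, b)) for i in range(n)]
    return solve_linear(AtA, Atb)


def residual(A, b, x):
    """Sum of squared differences between A x and b."""
    return sum((sum(a * xi for a, xi in zip(row, x)) - bi) ** 2
               for row, bi in zip(A, b))


def bounded_least_squares(A, b):
    """Try every unknown at 0, at 1, or free; keep the feasible case with lowest residual."""
    n = len(A[0])
    best_x, best_r = None, float("inf")
    for states in product((0.0, 1.0, None), repeat=n):
        free = [j for j, s in enumerate(states) if s is None]
        x = [0.0 if s is None else s for s in states]
        if free:
            # Move the fixed unknowns to the right-hand side, solve for the rest.
            rhs = [bi - sum(row[j] * x[j] for j in range(n) if j not in free)
                   for row, bi in zip(A, b)]
            sub = [[row[j] for j in free] for row in A]
            try:
                sol = least_squares(sub, rhs)
            except ValueError:
                continue
            if any(v < 0.0 or v > 1.0 for v in sol):
                continue  # 'in between' solution left the allowed interval
            for j, v in zip(free, sol):
                x[j] = v
        r = residual(A, b, x)
        if r < best_r:
            best_x, best_r = x, r
    return best_x, best_r


# Columns: water, melt pond, ice. Rows: three bands, then the sum-to-one row.
# These reflectances are invented placeholders for the example.
A = [[0.08, 0.22, 0.95],
     [0.08, 0.16, 0.87],
     [0.08, 0.07, 0.95],
     [1.00, 1.00, 1.00]]
print(bounded_least_squares(A, [0.60, 0.55, 0.56, 1.0]))
```

Each case costs at most one 3x3 solve, so the per-pixel work is fixed and small, with no iteration or convergence test involved.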

mati · « **Reply #16 on:** July 08, 2015, 06:40:14 PM »

Quote from: sayama on July 08, 2015, 05:06:40 AM

We use neural networks often at my workplace. They have become very mature. GPUs to run them have come way down in cost and we can run huge datasets in near real time. There are many many code libraries to easily implement them now as well. I would say go for it.

http://devblogs.nvidia.com/parallelforall/cuda-spotlight-gpu-accelerated-deep-neural-networks/

The latest graphics cards are more powerful than the original CRAY 1 !!!

http://www.tomshardware.com/reviews/amd-radeon-r9-fury-x,4196.html

ChrisReynolds · « **Reply #17 on:** July 09, 2015, 07:19:14 PM »

Quote from: AySz88 on July 08, 2015, 06:57:38 AM

Quote from: ChrisReynolds on July 07, 2015, 10:30:35 PM
All of that was on a ~200MHz Pentium, in C++. I could have bigger populations with the speed of computers now.

The more I think about it, the more intriguing it is. But where to start? I'd have to regrid data to a more coarse grid - now it's looking like a pain in the arse.

Perhaps a bit of a tangent, but I guesstimate a 1000x speedup (20x clock speed, 32x cores, 4x IPC = 2560x on paper), so that same one-week task would now take maybe 10 minutes of computing time. That's literally less than a dollar of Google Cloud Compute.

Have I gotten you back to intrigued again? Or maybe just to slap an intern on it...?

Maybe, but I remember why I dropped the hobby. Initially I was frustrated by the trash going around the AI community...

The simplest NN has inputs which are weighted and summed, then applied to a 'logic gate': if the result of the summing is over a threshold, the gate turns on and the neuron activates. What I found was researchers using sigmoid functions instead of on/off gates. This was done because their training procedures needed to differentiate the functions; you can differentiate a sigmoid, but not a step function. After reading Hofstadter's 'Gödel, Escher, Bach' I was convinced he was right that AI requires entities that interact with the world through experience. Such entities cannot be built by someone messing around inside the net; the only way to do it would be the way nature does: evolve them. When I began to hear that proper professional researchers were moving to evolving rather than tinkering, I concluded they'd do much better than I could.
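
The gate-versus-sigmoid distinction from the start of the paragraph above can be made concrete with a small sketch. The function names, weights and threshold are made up for the illustration.

```python
import math


def weighted_sum(inputs, weights):
    """Multiply each input by its weight and add the results."""
    return sum(x * w for x, w in zip(inputs, weights))


def step_neuron(inputs, weights, threshold):
    """On/off gate: outputs 1 only when the weighted sum exceeds the threshold."""
    return 1 if weighted_sum(inputs, weights) > threshold else 0


def sigmoid(z):
    """Smooth replacement for the step, rising from 0 to 1."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_neuron(inputs, weights, threshold):
    """Same weighted sum, but the output varies smoothly around the threshold."""
    return sigmoid(weighted_sum(inputs, weights) - threshold)


def sigmoid_slope(z):
    """Derivative of the sigmoid, sigmoid(z) * (1 - sigmoid(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)


inputs, weights, threshold = [1, 1], [0.6, 0.6], 1.0
print(step_neuron(inputs, weights, threshold))
print(sigmoid_neuron(inputs, weights, threshold))
print(sigmoid_slope(weighted_sum(inputs, weights) - threshold))
```

The step gate has zero slope everywhere except at the jump, so a gradient-based training procedure gets no signal from it. The sigmoid's slope is non-zero everywhere, which is what those procedures rely on. An evolutionary search only compares outputs, so it can use either.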

My idea was basically: start with an amoeba (reacts to a single stimulus), build up to a jellyfish, then an insect, maybe detour into the collective intelligence of insects (ants and bees), then a lizard, a mouse, a dog, etc. When I first played UNREAL, my thought was: here is an environment for AIs to live in! That was after I'd got over what a stunning experience playing in such a beautifully rendered world was.

Sayama,

Thanks for reminding me about GPUs, I've read about them being used like that before.

mati · « **Reply #18 on:** July 10, 2015, 05:25:21 PM »

GPUs, as well as FPGAs, are being used for bitcoin mining.
GPU programming status:
http://www.researchgate.net/post/How_can_a_CPU-GPU_program_be_written

The first program of my "own" that I wrote was Conway's Game of Life, in Fortran on an IBM 370.
My, have things changed.

My AI courses from back in the day are pretty well irrelevant now, but there is still some very interesting research being done in software learning systems.

My gut feeling is that the neural network approaches being used currently are not heading in the right direction... something is missing, and I think that is the massive interconnectivity of the human brain, and the fact that this interconnectivity can re-organize itself.
