
Same Programs + Different Computers = Different Weather Forecasts

timothy posted about a year ago | from the climate-change-without-leaving-the-room dept.

Earth 240

knorthern knight writes "Most major weather services (US NWS, Britain's Met Office, etc) have their own supercomputers, and their own weather models. But there are some models which are used globally. A new paper has been published, comparing outputs from one such program on different machines around the world. Apparently, the same code, running on different machines, can produce different outputs due to accumulation of differing round-off errors. The handling of floating-point numbers in computing is a field in its own right. The paper apparently deals with 10-day weather forecasts. Weather forecasts are generally done in steps of 1 hour. I.e. the output from hour 1 is used as the starting condition for the hour 2 forecast. The output from hour 2 is used as the starting condition for hour 3, etc. The paper is paywalled, but the abstract says: 'The global model program (GMP) of the Global/Regional Integrated Model system (GRIMs) is tested on 10 different computer systems having different central processing unit (CPU) architectures or compilers. There exist differences in the results for different compilers, parallel libraries, and optimization levels, primarily due to the treatment of rounding errors by the different software systems. The system dependency, which is the standard deviation of the 500-hPa geopotential height averaged over the globe, increases with time. However, its fractional tendency, which is the change of the standard deviation relative to the value itself, remains nearly zero with time. In a seasonal prediction framework, the ensemble spread due to the differences in software system is comparable to the ensemble spread due to the differences in initial conditions that is used for the traditional ensemble forecasting.'"


Damn you people (1)

Anonymous Coward | about a year ago | (#44405685)

Why don't you use 128-bit integers to represent some form of fixed point? I highly doubt you need any more precision than that.

Re:Damn you people (2, Informative)

YoungManKlaus (2773165) | about a year ago | (#44405987)

Actually, that would be really good, because you have a fixed spacing of values throughout the whole range, which is a very important property in simulations (at least as far as I learned in numerical mathematics).

Re:Damn you people (5, Insightful)

Anonymous Coward | about a year ago | (#44406421)

Precision is the point. Mathematical chaos diverges exponentially. This means that if you have a value of 9.3440281 in one calculation and it returns 3.5, and a value of 9.344028147 in another, you can get completely different results (where the second case returns 8.1). Now you say: well, let's just make it more precise then! So you put in the value of 9.34402814672 and get a completely different result (1.7), and so on*. If you weren't dealing with mathematical chaos, you would continually refine the values down (e.g. 3.5, 3.45, 3.467, etc.).

* Note: I should be careful with this layman's description to point out that more precise values technically shrink the window down. But since it is exponentially divergent in the first place, this might not ever do you any good in a realistic setting. Ref Lyapunov exponents [wikipedia.org] and mathematical chaos [wikipedia.org]
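The divergence described above is easy to reproduce. Here is a minimal sketch, assuming the logistic map at r = 4 (a standard chaotic toy system) as a stand-in for a weather model; the starting values are the ones from the comment, not from any real forecast:

```python
# Two trajectories of the chaotic logistic map x -> 4x(1-x), started from
# values that differ only in the 8th decimal place. The map is a toy
# system chosen for illustration, not an actual weather model.
x, y = 0.3440281, 0.344028147
max_diff = 0.0
for step in range(100):
    x = 4.0 * x * (1.0 - x)
    y = 4.0 * y * (1.0 - y)
    max_diff = max(max_diff, abs(x - y))

# The ~5e-8 initial difference is amplified roughly exponentially until
# the trajectories are completely uncorrelated.
print(max_diff)
```

Adding more digits to the starting value only delays the divergence; it never eliminates it, which is the comment's point.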

Re:Damn you people (2, Funny)

Anonymous Coward | about a year ago | (#44406527)

For being the first person ever to use "exponentially" correctly on Slashdot, I literally award you one (1) internet.

Re:Damn you people (1)

Anonymous Coward | about a year ago | (#44407105)

To potentially make this clearer: when the output diverges exponentially with differences in the input, you need a massive increase in the precision of the inputs for minor gains in the output. It is not the case that doubling the precision of the input doubles the precision of the output. You could double the precision of the input and get only a small fraction of an improvement in the output.

The exponents in weather simulation work out such that it is nearly impossible to predict the weather in detail more than about two weeks out (some large-scale systems are simpler, though, and can go much, much further). And that is in the ideal case; 10-day forecasts like the ones discussed here push pretty far into the region where chaos prevents accurate predictions, and this has been known for some time. In this case, you could double the precision of the inputs and struggle to get more than an extra day or two out of the prediction.
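The logarithmic payoff can be made concrete. Assuming a toy growth rate of one error doubling per simulated day (a made-up number for illustration only, not a measured Lyapunov exponent), the usable forecast horizon grows only with the logarithm of the input precision:

```python
import math

def horizon_days(input_error, tolerance=0.5, doublings_per_day=1.0):
    # Days until an initial error, doubling once per day (assumed toy
    # rate), exceeds the tolerated forecast error.
    return math.log2(tolerance / input_error) / doublings_per_day

h_single = horizon_days(1e-7)   # single-precision-scale input error
h_double = horizon_days(1e-16)  # double-precision-scale input error
# Nine extra decimal digits of precision buy only ~30 extra days at this
# generous toy rate; at realistic error-growth rates the gain shrinks to
# the "day or two" the comment describes.
print(h_single, h_double)
```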

Have these people never heard of IEEE754???? (0, Flamebait)

gweihir (88907) | about a year ago | (#44405687)

WTF are these amateurs doing? This is a solved problem and has been for several decades. Base float is solved. How to condition your computations so that order remains the same or does not impact the results is solved. Pathetic.

Re:Have these people never heard of IEEE754???? (5, Insightful)

cnettel (836611) | about a year ago | (#44405715)

No, it isn't, when the system itself is not well-conditioned. And I bet you don't want your compiler to run a real codebase under a strict IEEE754 interpretation, as that will disallow almost any optimization. Even if you would allow it, "trivial" rearrangements that don't affect the theoretical analysis of stability, correctness, or condition number will still introduce different rounding perturbations. Perturb weather or some other such system, and you will get a completely different trajectory.

That said, many applied fields, including meteorology, could benefit from more well-disciplined computational science approaches. But don't expect all that much of a difference.

Re:Have these people never heard of IEEE754???? (1, Insightful)

gweihir (88907) | about a year ago | (#44405763)

I was in particular thinking about the section on rounding in IEEE754. You are also overlooking that badly conditioned != behaves in a random fashion. My guess is they did not involve the numerics people in the optimization process, which is a complete fail when you know your problem is not well conditioned.

Re:Have these people never heard of IEEE754???? (5, Informative)

cnettel (836611) | about a year ago | (#44405789)

It doesn't help you that individual operations are rounded deterministically, if the order of your operations is non-deterministic. You cannot expect bit-identical results if you parallelize or allow any level of operation reordering. Even a very well-written code might implement a reduce operation in different hierarchies depending on memory layout. Enforcing all these things to be done in the exactly same order, with full IEEE754 compliance is a significant performance cost. By taking numerical aspects into account, you can ensure that your result is not invalid or unreasonable. However, for a chaotic problem where a machine epsilon difference in input data might be enough for a macroscopically different end result, there is nothing you can do and still expect reasonable utilization of modern architectures.
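The reduce-order point can be shown deterministically. A sketch with contrived values (not from the paper): the same list summed serially versus smallest-terms-first gives two different answers, exactly as two different reduction trees would.

```python
# 1e16, then one hundred 1.0s, then -1e16. Summed left to right, each
# +1.0 lands exactly halfway between representable doubles and rounds to
# the even neighbor, which is 1e16 itself, so every 1.0 is absorbed.
# Summed small-terms-first, the 1.0s survive.
vals = [1e16] + [1.0] * 100 + [-1e16]

left_to_right = 0.0
for v in vals:
    left_to_right += v                    # mimics a serial reduction

small_first = sum(sorted(vals, key=abs))  # mimics a different reduce tree

print(left_to_right, small_first)  # 0.0 vs 100.0
```

Both results are "correctly rounded" at every step; only the order differs.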

Re:Have these people never heard of IEEE754???? (1)

EvanED (569694) | about a year ago | (#44405853)

I wish I still had my mod points from a few days ago, because this post deserves some.

Re:Have these people never heard of IEEE754???? (2)

korgitser (1809018) | about a year ago | (#44406007)

So are you saying that enforcing predictable and correct answers has a significant performance cost?

Re:Have these people never heard of IEEE754???? (5, Informative)

Rockoon (1252108) | about a year ago | (#44406431)

So are you saying that enforcing predictable and correct answers has a significant performance cost?

He said nothing about "correct."

And yes, enforcing predictable answers across toolchains and architectures has significant performance cost. Even ignoring optimizations, with the x87 FPU (which uses 80-bit registers) it means the compiler needs to emit a rounding operation after every single intermediate operation because the x87 uses 80-bit internal floats but IEEE754 specifies that all operations, even intermediate ones, are always to be performed as if rounded like 32-bit or 64-bit floats.
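The effect of that extra rounding step can be mimicked in pure Python by treating binary64 as the "wide" internal format (standing in for the x87's 80 bits) and binary32 as the target. The operands below are contrived to sit at a rounding tie; `struct` is used to force a genuine binary32 rounding:

```python
import struct

def to_f32(x):
    # Round a binary64 value to binary32 and back (one IEEE754 rounding).
    return struct.unpack('f', struct.pack('f', x))[0]

a, b, c = 1.0, 2.0**-24, 2.0**-50  # 2**-24 is exactly half an ulp of 1.0f

# Round after every intermediate op, as strict binary32 semantics require:
narrow = to_f32(to_f32(a + b) + c)  # the tie rounds to even: 1.0

# Compute wide, round once at the end, as extended-precision hardware
# effectively does:
wide = to_f32(a + b + c)            # just above the tie: 1 + 2**-23

print(narrow == wide)  # False
```

Same program, same standard, different answers, purely from where the roundings happen.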

When you get into the effects of order-of-operations type optimizations even on hardware that only uses 64-bit floats, you find that in most cases (x + y + z) != (z + y + x) even when the same floating point precision is present in each step of the calculation. Even things like common-divisor optimizations (if z is used as a divisor many times, compute 1/z a single time and multiply because multiplication is much faster than division) destroy the chance of equal outcome between compilers that will do it and compilers that will not.
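Both effects are a few lines to demonstrate (contrived values, for illustration only):

```python
# Addition is not associative: 1e-16 is below half an ulp of 1.0, so it
# vanishes if added to 1.0 before the cancellation instead of after.
x, y, z = 1.0, 1e-16, -1.0
left = (x + y) + z    # 1.0 + 1e-16 rounds to 1.0; total is 0.0
right = (z + x) + y   # exact cancellation first; total is 1e-16
print(left, right)

# The common-divisor optimization changes results the same way: n / d and
# n * (1.0 / d) can disagree in the last bit, because 1.0 / d is itself
# rounded before the multiply.
d = 49.0
mismatches = sum(1 for n in range(1, 1000) if (n / d) != (n * (1.0 / d)))
print(mismatches)     # typically a sizable fraction of the 999 cases
```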

The best way to get insight into the issues is to become familiar with the single-digit-of-precision estimation technique.

Re:Have these people never heard of IEEE754???? (5, Insightful)

Anonymous Coward | about a year ago | (#44406523)

Almost nothing you do with IEEE754 floating point numbers is correct in the strict mathematical sense. You can't even represent 0.1 (1/10) as an IEEE754 floating point number. There are entire series of lectures on the topic of scientific computing with floating point numbers. The errors are usually small enough that a few simple rules keep you safe (e.g., never compare floating point numbers for equality), but when you do many iterations, the errors can accumulate and mess with your results, and if in that case you do the calculations in a different order, the accumulated error will mess with your results in a different way. That's what's happening here.
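Both points are directly checkable; these are standard examples, not taken from the paper:

```python
import math

# 1/10 has no finite binary expansion, so 0.1 is stored as the nearest
# binary64 value, which is very slightly larger than one tenth.
print(0.1 + 0.2 == 0.3)          # False
print(0.1 + 0.2)                 # 0.30000000000000004

# The per-step error is tiny, but it accumulates across iterations:
total = sum(0.1 for _ in range(10))
print(total == 1.0)              # False

# Hence the rule: compare with a tolerance, never for exact equality.
print(math.isclose(total, 1.0))  # True
```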

Re:Have these people never heard of IEEE754???? (1)

gweihir (88907) | about a year ago | (#44406553)

I am not arguing about that; I know that this is true. What gets me is that this is a surprise to anyone. I mean, have they done optimization without error estimation? Have they completely ignored error when optimizing? You do not just calculate away on these problems and then check whether the results seem to match reality. The results are far too important for that amateur-level approach.

Re:Have these people never heard of IEEE754???? (3, Informative)

SlayerofGods (682938) | about a year ago | (#44405749)

Yes... because that never rounds off numbers.
https://en.wikipedia.org/wiki/IEEE_floating_point#Rounding_rules [wikipedia.org]

Re:Have these people never heard of IEEE754???? (1)

nogginthenog (582552) | about a year ago | (#44406369)

But it does it in a consistent way across platforms.

Re:Have these people never heard of IEEE754???? (4, Insightful)

Goaway (82658) | about a year ago | (#44405767)

When floating point roundoff errors grow big enough to affect the outcome of the simulation, you have long since reached the point where you are not predicting anything useful any longer. It is not exactly a problem if the results differ at that point.

Re:Have these people never heard of IEEE754???? (3, Insightful)

amorsen (7485) | about a year ago | (#44406205)

When floating point roundoff errors grow big enough to affect the outcome of the simulation, you have long since reached the point where you are not predicting anything useful any longer.

This is not true. If the model predicts rain at 2 pm two days out and different rounding moves it to 3 pm, that is still a useful forecast in a lot of cases.

Re:Have these people never heard of IEEE754???? (0)

Goaway (82658) | about a year ago | (#44406215)

If rounding error moves the time from 2 pm to 3 pm, then the errors in your input data will probably switch it between raining at all, and sunshine. You are already past the point where your model can predict anything at all.

Re: Have these people never heard of IEEE754???? (0, Flamebait)

alen (225700) | about a year ago | (#44406605)

But what if one model predicts the end of the world due to higher temps and another one says the earth will absorb the heat

Which one do you trust?

Re:Have these people never heard of IEEE754???? (1)

Anonymous Coward | about a year ago | (#44405799)

That is the problem when people start compiling with things like -ffast-math.

Re:Have these people never heard of IEEE754???? (2, Interesting)

Anonymous Coward | about a year ago | (#44405893)

WTF are these amateurs doing? This is a solved problem and has been for several decades. Base float is solved. How to condition your computations so that order remains the same or does not impact the results is solved. Pathetic.

I ran into this once when working on support for an AIX compiler - got a bug report that we were doing floating point wrong because the code gave different results on AIX than some other machine (HP I think). After looking into it, it turned out that the algorithm accumulated roundoff errors quite badly, and basically wasn't working right on _any_ platform, but would give different results due to slightly different handling of round-off on the different platforms.

The problem is, this kind of code is very often written by scientists, who have most likely never heard of this issue, or forgot about it, or thought they handled it right but didn't - it's not their area of expertise, so it's not surprising if you think about it. I only hope that for engineering software that designs bridges, airplanes, etc, they realized that they better have it looked over by someone who knows what they are doing.

BTW, this is one reason why I take all the global warming predictions with a big grain of salt - they are all based on computer simulations which are difficult if not impossible to validate, and given what I've seen, I don't trust the results from them at all.

Re:Have these people never heard of IEEE754???? (1)

swilver (617741) | about a year ago | (#44405983)

They didn't predict the rain correctly yesterday here, that's why I believe those predictions are obviously incorrect.

Re:Have these people never heard of IEEE754???? (1)

lightknight (213164) | about a year ago | (#44406557)

Nice, but no. He's pointing out the obvious: climate scientists are usually reliant on their own coding skills, which, love it or hate it, are usually not quite on the same level as a computer scientist's or software engineer's.

And yes, little errors do matter, since a little error in a preceding calculation may be used in the next series of calculations, and so on...the snowball effect.

Re:Have these people never heard of IEEE754???? (5, Interesting)

kyrsjo (2420192) | about a year ago | (#44406655)

*SNIP*

BTW, this is one reason why I take all the global warming predictions with a big grain of salt - they are all based on computer simulations which are difficult if not impossible to validate, and given what I've seen, I don't trust the results from them at all.

In the case of climate simulations, different models (both physics-wise and code-wise) are run with different computers on the same input data, and yield basically the same results.

When simulating chaotic behaviour, very small differences can make a big difference in the outcome of your simulations. As an example, I'm currently working on simulations of sparks in vacuum, which is a "runaway" process. In this case, adding a single particle early in the simulation (before the spark actually happens) can change the time for the spark to appear by several tens of percent. This also happens if we run with different library versions (SuperLU, Lapack), different compilers, or different compiler flags. Once the spark happens, the behaviour is predictable and repeatable; but the time for it to happen, while the system is "balancing on the edge, before falling over", is quite random.

Re:Have these people never heard of IEEE754???? (2)

amorsen (7485) | about a year ago | (#44406191)

WTF are these amateurs doing?

Enjoying decent performance. Doing weather forecasts slower than real time is a lot easier but somewhat less useful.

My interpretation of the abstract (I cannot access the actual paper) is that they could not show that any particular compiler or architecture made the predictions any better, just different. In that case you just go with whichever runs fastest.

Re:Have these people never heard of IEEE754???? (5, Informative)

Xtifr (1323) | about a year ago | (#44406261)

That would be a case of solving the wrong problem. Getting the exact same result every time doesn't much matter if that result is dominated by noise and rounding errors. In fact, the diverging results are a good thing, since, once they start to diverge, you know you've reached the point where you can no longer trust any of the results. If all the machines worked exactly the same, you could figure the same thing out, but it would require some very advanced mathematical analysis. With the build-the-machines-slightly-differently approach, the point where your results are becoming meaningless leaps out at you.

Remember, the desired result here is not a set of identical numbers everywhere. It is an accurate simulation. Getting the same results everywhere would not make the simulation one bit more accurate. So really, this is a good thing.

Re:Have these people never heard of IEEE754???? (1)

kyrsjo (2420192) | about a year ago | (#44406667)

Please mod parent up!

Re:Have these people never heard of IEEE754???? (0)

Anonymous Coward | about a year ago | (#44406963)

Indeed. If they don't like the problem, the solution is not to introduce more accurate rounding, but to do the calculations in greater precision, assuming that their models are accurate enough that the results would be useful.

This is what makes the fact that "-ffast-math" isn't a default setting in GCC kind of silly. If you honestly care how your floating point numbers are rounded, then you're not using floating point correctly.

Re:Have these people never heard of IEEE754???? (1)

Chris Katko (2923353) | about a year ago | (#44406977)

But isn't the point that rounding errors giving drastically different results means the rounding is ADDING error? As opposed to being able to see where the results change based on THE DATA and THE ALGORITHMS, we're now supposed to be fitting to meaningless rounding error? That would be like saying I have 5-significant-figure data (123.45), I use integer data types, and now I say results are only possible with three significant figures (123) because "it shows where the diverging results start."

Re:Have these people never heard of IEEE754???? (1)

Anonymous Coward | about a year ago | (#44406339)

WTF are these amateurs doing? This is a solved problem and has been for several decades. Base float is solved. How to condition your computations so that order remains the same or does not impact the results is solved. Pathetic.

Go read up on chaotic systems, then come back to us.

It is the butterfly effect. (4, Interesting)

140Mandak262Jamuna (970587) | about a year ago | (#44405711)

Almost all CFD (Computational Fluid Dynamics) simulations use time marching of the Navier-Stokes equations. Despite being very nonlinear and very hard, one great thing about them is that they naturally parallelize very well. They partition the solution domain into many subdomains and distribute the finite volume mesh associated with each subdomain to a different node. Each mesh is also parallelized using a GPU. At the end of the day these threads complete execution at slightly different times and post updates asynchronously. So even if you use the same OS and the same basic cluster, if you run it twice you get two different results if you run it far enough out, like 10 days. I am totally not surprised that if you change the OS or architecture or endianness or the math processor or the GPU brand, the solutions differ a lot when you make a 10-day forecast.

Re:It is the butterfly effect. (0)

CODiNE (27417) | about a year ago | (#44406119)

Damn. Keep Ashton Kutcher AWAY from my computer!

Re:It is the butterfly effect. (-1)

Anonymous Coward | about a year ago | (#44406419)

Does this look even remotely like Reddit? No, so fuck off.

Re:It is the butterfly effect. (1)

bill_mcgonigle (4333) | about a year ago | (#44407025)

Coincidentally, I went to a presentation a couple of weeks ago that largely focused on HPC CFD work. The presenter's company doesn't use GPUs because things like memory bandwidth are more important, but that aside, the thing that surprised me the most was that the simulations are not independently clocked (self-clocking): they use the hardware clock, so things like latency and state are extremely important. Self-clocking would be too expensive with current hardware. Depending on the HPC cluster setup (and even things like BIOS versions matching on different nodes), the simulation clocks can drift and ruin the simulation. It's very exacting work in the current state of the art, and very easy to get wrong.

Of course, now the weatherman can blame the sysadmin...

Re:It is the butterfly effect. (1)

4wdloop (1031398) | about a year ago | (#44407119)

So given that the CFD N-S equations are not (presently?) solvable analytically, and simulation is hence used to approximate the answer, would it mean, all imperfections of computation aside, that exact weather forecasting is not possible?

I suppose that since the N-S equations do not (yet?) have mathematical proofs of existence and smoothness of solutions, we cannot yet say whether precise weather forecasting is even theoretically possible?

I've seen this before (5, Interesting)

slashgordo. (2772763) | about a year ago | (#44405733)

When doing spice simulations of a circuit many years ago, we ran across one interesting feature. When using the exact same inputs and the exact same executable, the sim would converge and run on one machine, but it would fail to converge on another. It just happened that one of the machines was an Intel server, and the other was an AMD, and we attributed it to ever so slightly different round off errors between the floating point implementation of the two. It didn't help that we were trying to simulate a bad circuit design that was on the hairy edge of convergence, but it was eye opening that you could not guarantee 100% identical results between different hardware platforms.

Re:I've seen this before (4, Funny)

Livius (318358) | about a year ago | (#44405933)

Well, Arrakis melange is a pretty strong drug, so consistency in spice simulations is probably a little too much to expect.

(Yes, I know the parent really meant SPICE [wikipedia.org] .)

Re:I've seen this before (1)

mrbester (200927) | about a year ago | (#44406863)

Maybe he's South African and was typing up a dictated post

Re:I've seen this before (4, Funny)

rossdee (243626) | about a year ago | (#44405945)

"When doing spice simulations "

Weather forecasting on Arrakis is somewhat tricky: not only do you have the large storms, but also giant sandworms.
(And sabotage by the Fremen)

Re:I've seen this before (0)

Anonymous Coward | about a year ago | (#44406535)

And the use of nuclear weapons for landscaping.

Re:I've seen this before (1)

Anonymous Coward | about a year ago | (#44406035)

Yes, this is known:
Deep inside your CPU's pipelining circuits, there is a possibility of reordering the execution of apparently innocent operations, each conforming to the IEEE754 standard. Reordering itself can trigger interesting ripples in the error bounds across operations over time.

In particular, some of us have developed a slight distrust of the Intel compiler, because it has a tendency to optimize so heavily in favor of speed that these effects may be even more pronounced. So the advice is: do a few alternative builds and compare the runs' results, just to keep the most basic risks at bay. Alternatively, you may go for ensemble runs (weather & climate people often do), in order to beat down some common noise errors. Overall, be aware that computational science may not be as easy as it seems at first sight.

Re:I've seen this before (4, Insightful)

Cassini2 (956052) | about a year ago | (#44407015)

This often happens when the simulation results are influenced by variations in the accuracy of the built-in functions. Every floating point unit (FPU) returns an approximation of the correct result to an arbitrary level of accuracy, and the accuracy level of these results varies considerably when built-in functions like sqrt(), sin(), cos(), ln(), and exp() are considered. Normally, the accuracy of these results is pretty high. However, the initial 8087 FPU hardware from Intel was pretty old, and it necessarily made approximations.

At one point, Cyrix released an 80287 clone FPU that was faster and more accurate than Intel's 80287 equivalent. This broke many programs. Since then, Intel and AMD have been developing FPUs that are compatible with the 8087, ideally at least as accurate, and much faster. The GPU vendors have been doing something similar, however in video games, speed is more important than accuracy. For compatibility reasons (CPUs) and speed reasons (GPUs), vendors have focused on returning fast, compatible and reasonably accurate results.

In terms of accuracy, the results of the key transcendental functions, exponential functions, logarithmic functions, and the sqrt function should be viewed with suspicion. At high-accuracy levels, the least-significant bits of the results may vary considerably between processor generations, and CPU/GPU vendors. Additionally, slight differences in the results of double-precision floating point to 64-bit integer conversion functions can be detected, especially when 80-bit intermediate values are considered. Given these approximations, getting repeatable results for accuracy-sensitive simulations is tough.

It is likely that the article's weather simulations and the parent poster's simulations have differing results due to the approximations in built-in functions. Inaccuracies in the built-in functions are often much more significant than the differences due to round-off errors.

double versus long double (2)

Barbarian (9467) | about a year ago | (#44405759)

The x86 architecture, since the 8081, has double-precision 64-bit floats, and a special 80-bit float; some compilers call this long double and use 128 bits to store it. How does this compare to other architectures?

Re:double versus long double (0)

Anonymous Coward | about a year ago | (#44406389)

Not just architecture, but OS. Linux x86 32-bit defaults to 80-bit FP. x86_64 defaults to 64-bit FP.

Some google searches show the 32-bit x86 *BSD default to 64-bit FP.

Re:double versus long double (1)

sstamps (39313) | about a year ago | (#44406607)

1) There never was any such thing as an 8081.

2) The earliest Intel math coprocessor was the 8087, for the 8086. The 80-bit float was a special temporary-precision representation which could be stored in memory, but was otherwise unique to the Intel MCP architecture.

Re:double versus long double (1)

Anonymous Coward | about a year ago | (#44406751)

2) The earliest Intel math coprocessor was the 8087, for the 8086. The 80-bit float was a special temporary-precision representation which could be stored in memory, but was otherwise unique to the Intel MCP architecture.

Other FP implementations have 80-bit as well.

http://en.wikipedia.org/wiki/Motorola_68881 [wikipedia.org]

(Maybe) unique to x87 is the stack architecture.

Time to revoke some "scientist" licences. (0)

Anonymous Coward | about a year ago | (#44405777)

The people writing this code ought to've known better.

Re:Time to revoke some "scientist" licences. (0)

Anonymous Coward | about a year ago | (#44405817)

Who said they didn't know better?
"which is the change of the standard deviation relative to the value itself, remains nearly zero with time."
Sounds to me like the 'problem' takes care of itself.

Re:Time to revoke some "scientist" licences. (0)

Anonymous Coward | about a year ago | (#44405919)

That reads to me like the system estimates its own calculations to be pretty accurate, even though rounding has clearly introduced a large amount of uncertainty to the result. But TFA is paywalled, and I can't find any other significant uses of the term "fractional tendency" on Google, so who knows what they mean.

Chaos (5, Interesting)

pcjunky (517872) | about a year ago | (#44405781)

This very effect was noted in weather simulations back in the 1960s. Read Chaos: The Making of a New Science, by James Gleick.

Re:Chaos (1)

Trepidity (597) | about a year ago | (#44407009)

Was noted in actual weather systems as well (at least as far as we understand them), which is part of what makes it particularly tricky to avoid in simulations. It's not only that our hurricane track models, for example, are sensitively dependent on parameters, but also that real hurricane trajectories appear to be sensitively dependent on surrounding conditions.

Doesn't matter much (0)

Hentes (2461350) | about a year ago | (#44405827)

Rounding errors are orders of magnitude smaller than measurement errors, they are not the precision bottleneck.

Re:Doesn't matter much (0)

Anonymous Coward | about a year ago | (#44405905)

But that's the problem. Those tiny rounding errors are causing different forecasts. That means a difference in input of 0.0001% will give a completely different output.

How accurate and reliable can these forecasts be in reality, then? Just their measurement devices being off by more than they can possibly be calibrated for can change your 10-day forecast. Sounds like the entire thing is junk.

Side note: Anyone who has programmed with money and had to deal with half pennies and get it correct 100% of the time knows the tricks to deal with this. Bank interest calculations, done correctly, will not come out different on different hardware.
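The money trick the poster alludes to is to keep amounts in exact integer or decimal units and round exactly once, with a specified rule, so every platform agrees bit for bit. A sketch using Python's decimal module; the rate, balance, and function name are hypothetical figures for illustration:

```python
from decimal import Decimal, ROUND_HALF_EVEN

def monthly_interest(balance_cents: int, annual_rate: str) -> int:
    """Interest in whole cents, rounded once with banker's rounding.

    All arithmetic is exact decimal, never binary floating point, so the
    result is identical on any hardware or compiler.
    """
    balance = Decimal(balance_cents) / Decimal(100)
    interest = balance * Decimal(annual_rate) / Decimal(12)
    cents = (interest * 100).quantize(Decimal("1"), rounding=ROUND_HALF_EVEN)
    return int(cents)

# $12,345.67 at a 5% annual rate -> one month of interest, in cents:
print(monthly_interest(1234567, "0.05"))
```

The single explicit `quantize` is the whole point: the half-penny decision is made once, by a named rounding rule, instead of implicitly at every operation.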

Re:Doesn't matter much (1)

Goaway (82658) | about a year ago | (#44405955)

Those tiny rounding errors are causing different forecasts.

So are the measurement errors, and to a much higher degree. The roundoff errors just don't matter.

How accurate and reliable can these forecasts be in reality then?

Once they reach the point where errors have accumulated to this degree, not at all. Everybody knows that.

Re:Doesn't matter much (1)

meza (414214) | about a year ago | (#44406039)

At first I agreed with you and thought the GP wasn't aware of the concept of chaos (small errors in input give large errors in output). However, that's not what he wrote. He correctly pointed out that the rounding error is much smaller than the error from the initial measurement. Logically, the measurement error should be the dominant error, and the first to lead to chaotic divergence. The problem then seems to be over-belief in the forecast due to not accounting correctly for the measurement error. Long before any rounding errors start to play a role, one should have stopped the simulation, as it wasn't predicting anything useful anyway.

Re:Doesn't matter much (1)

CrimsonAvenger (580665) | about a year ago | (#44406225)

However, that's not what he wrote. He correctly pointed out that the rounding error is much smaller than the error from the initial measurement. Logically it should be the dominant error that first leads to chaotic behavior.

Alas, TFA is about a situation where they take the SAME inputs (initial measurements), run the program on ten different sets of hardware, and get ten different results.

I fail to see how the same program + same inputs == "differences in inputs cause most of the error"....

Re:Doesn't matter much (1)

kasperd (592156) | about a year ago | (#44406843)

I fail to see how the same program + same inputs == "differences in inputs cause most of the error"....

Inaccuracies in the input most likely did cause most of the error. Maybe nobody noticed because that error was the same in all the calculations. Eventually a difference between the calculations started to build up because of differences in rounding between the different runs. This variation was noticed, but it would still be small compared to the differences caused by inaccuracies in the input. In short, by the time you notice the difference between two runs, both of them are already way off from the real value, because both have been working from the same inaccurate input.

If you want to do better, then do calculations with a representation that keeps track of uncertainty. Even in those cases where you cannot do a floating point operation and get an exact result, you can still do the calculation and know the possible range of the error. So each number is represented by a minimum and a maximum (or a mean and an error margin). As you do calculations the minimum and maximum values will be going further and further apart. Once they get too far apart, you know the results are no longer useful.

When you start the simulation, you initialize the numbers with an error margin corresponding to the accuracy of the measurements. Different runs on different platforms may not build up errors at the same rate, and that is something you can actually look at. If the ranges from two different runs no longer overlap, then you know there is a bug somewhere. If one simulation says the air temperature is going to be between -10 and +30 and the other simulation says it is going to be between 0 and +20, then they can both be right, but neither simulation result is particularly useful. If one simulation says it is going to be between -10 and 0 and the other says it is going to be between +20 and +30, then you know at least one of them has a bug.
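The idea kasperd describes is interval arithmetic. A minimal sketch (the `Interval` class and growth factor below are illustrative, not from any weather code; real interval libraries also direct the rounding mode outward, which this sketch ignores):

```python
# Minimal interval-arithmetic sketch: each value carries a [lo, hi] range,
# and every operation widens the range to cover all possible true results.
class Interval:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        # The true product lies between the extremes of the endpoint products.
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(min(products), max(products))

    def width(self):
        return self.hi - self.lo

# A measurement of 20.0 with a +/-0.5 uncertainty:
t = Interval(19.5, 20.5)
# Iterating a calculation widens the bounds at every step:
for _ in range(3):
    t = t * Interval(1.01, 1.02)   # an uncertain per-step growth factor
print(t.lo, t.hi, t.width())       # the width has grown past the initial 1.0
```

Once `t.width()` exceeds whatever tolerance makes the forecast useful, you know to stop trusting the run, regardless of platform.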

we code it to drop the parts of pennies into our (0)

Anonymous Coward | about a year ago | (#44406487)

our bank account

Re:Doesn't matter much (0)

Anonymous Coward | about a year ago | (#44406059)

yain, in that order.

Rounding errors may be small, but if their effect is not understood and contained, there will be not much hope for reproducible science
(specifically, it would not help tuning of the models themselves for fixable non implementation specific flaws).

Re:Doesn't matter much (4, Informative)

AchilleTalon (540925) | about a year ago | (#44406065)

Measurement errors are involved once at boundary conditions. Precision errors propagates in the computations. So, even if a single precision error is magnitude orders smaller than measurement errors, they can have an impact on the result depending on the computations involved while solving the problem.

Yes, the Butterfly Effect, as others have said (5, Interesting)

Impy the Impiuos Imp (442658) | about a year ago | (#44405883)

This problem has been known since at least the 1970s, and it was weather simulation that discovered it. It led to the field of chaos theory.

With an early simulation, they ran their program and got a result. They saved their initial variables and then ran it the next day and got a completely different result.

Looking into it, they found out that when they saved their initial values, they only saved the first 5 digits or so of their numbers. It was the tiny bit at the end that made the results completely different.

This was terribly shocking. Everybody felt that tiny differences would melt away into some averaging process, and never be an influence. Instead, it multiplied up to dominate the entire result.

To give yourself a feel for what's going on, imagine knocking a billiard ball on a table that's miles wide. How accurate must your initial angle be to knock it into a pocket on the other side? Now imagine a normal table with balls bouncing around for half an hour. Each time a ball hits another, the angle deviation multiplies. In short order, with two initial angles that differ only minutely, some balls are completely missing other balls. There's your entire butterfly effect.
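The amplification described above can be reproduced in a few lines with the logistic map, a standard textbook chaos demo (not the weather model from TFA; the parameter 3.9 just puts the map in its chaotic regime):

```python
# Two trajectories of the logistic map x -> r*x*(1-x), started from
# initial values that differ only in the sixth decimal place.
r = 3.9                      # parameter in the chaotic regime
a, b = 0.123456, 0.123457    # nearly identical starting points
max_gap = 0.0
for step in range(60):
    a = r * a * (1.0 - a)
    b = r * b * (1.0 - b)
    max_gap = max(max_gap, abs(a - b))
print(max_gap)  # the tiny initial difference has grown by orders of magnitude
```

The gap between the two "forecasts" grows roughly exponentially until it saturates at the size of the attractor itself, exactly like the fifth-digit truncation in Lorenz's run.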

Now imagine the other famous realm of the butterfly effect -- "time travel". You go back and make the slightest deviation in one single particle, one single quantum of energy, and in short order atmospheric molecules are bouncing around differently, this multiplies up to different weather, people are having sex at different times, different eggs are being fertilized by different sperm, and in not very long an entirely different generation starts getting born. (I read once that even if you took a temperature, pressure, wind-direction, and humidity measurement for every cubic foot, you could only predict the weather accurately out to about a month. The tiniest molecular deviation would probably get you another few days on top of that if you were lucky.)

Even if the current people in these parallel worlds lived more or less the same, their kids would be completely different. That's why all these "parallel world" stories are such a joke. You would literally need a Q-like being tracking multiple worlds, forcing things to stay more or less along similar paths.

Here's the funnest part -- if quantum "wave collapse" is truly random, then even a god setting up identical initial conditions wouldn't produce identical results in parallel worlds. (Interestingly, the mechanism on the "other side" doing the "randomization" could be deterministic, but that would not save Einstein's concept of Reality vs. Locality. It was particles that were Real, not the meta-particles running the "simulation" of them.)

Been there, done that (1)

fatmar (992498) | about a year ago | (#44405885)

This is a good time to review some problems in met codes. The first real problem is that the science is poorly understood. If the model is poorly constructed, conditioning is the least of your problems. By and large, the push for V&V came from the met world. The second thing is that the spatial resolution is way too coarse. And long before IEEE 754, it was anecdotally known that you lose a digit whenever you change systems (hardware or software).

Headline disagrees with summary. (0)

Anonymous Coward | about a year ago | (#44405895)

The summary says, "There exist differences in the results for different compilers, parallel libraries, and optimization levels." That doesn't mean different computers, although different computers were used. It means they weren't running the same code path and the same order of operations, so differences should have been expected.

Unfortunately, any information regarding whether the differences are significant for local or even regional weather prediction is behind the paywall.

Global Climate Change (0, Troll)

Anonymous Coward | about a year ago | (#44405959)

Certainly not a problem for climate "scientists" all over the planet and their crazy predictions out 10 or 20 years.

Re:Global Climate Change (0)

Anonymous Coward | about a year ago | (#44406639)

Glad to see this has been marked -1.

It's very important to stop anyone thinking that there could be anything wrong with the Global Warming scare.

97% of all scientists can't be wrong!!!

Translation (1)

Chris Mattern (191822) | about a year ago | (#44405979)

The system dependency, which is the standard deviation of the 500-hPa geopotential height averaged over the globe, increases with time. However, its fractional tendency, which is the change of the standard deviation relative to the value itself, remains nearly zero with time.

In other words, they all gave different answers, but each one was equally certain that *it* was right.

Re:Translation (1)

Anne Thwacks (531696) | about a year ago | (#44406317)

they all gave different answers, but each one was equally certain that *it* was right.

Perhaps that is where politicians got the idea from?

Just needs a little adjustment (2, Funny)

Anonymous Coward | about a year ago | (#44406017)

They really need to standardize on what butterflies to use.

Global Warming Predictions (-1)

Anonymous Coward | about a year ago | (#44406055)

So, predictions of global warming and increasing weather variability are really just artifacts of round-off errors.

Re:Global Warming Predictions (0)

Anonymous Coward | about a year ago | (#44406073)

So, predictions of global warming and increasing weather variability are really just artifacts of round-off errors.

My Slashdot tagline for today:

1 + 1 = 3, for large values of 1.

Re:Global Warming Predictions (-1)

Anonymous Coward | about a year ago | (#44406777)

Actually, it's worse.

many predictions - most, nowadays, are out-and-out fraud.

But you mustn't say this.....

Building reproducible HPC software is already hard (0)

Anonymous Coward | about a year ago | (#44406093)

If you don't believe that statement, look at this:
http://hpcugent.github.io/easybuild/
and then the diagram in here:
http://hpcbios.readthedocs.org/en/latest/HPCBIOS_2012-92.html

Put the equivalent of that diagram into scattered wiki instructions and ask any 2 people to come up with the same build;
how tough would that be? Only the people who have tried it really know what that means...

btw.
In modern HPC systems it is common to provide 3 MPI stacks (IntelMPI, OpenMPI, MVAPICH) and a bunch of compilers;
ah, and that's just the first two layers from the bottom of that diagram! Are you surprised you have fireworks on the top?

Re:Building reproducible HPC software is already h (0)

Anonymous Coward | about a year ago | (#44406167)

btw. "Consistency of Floating-Point Results using the Intel® Compiler or Why doesn’t my application always give the same answer?"
ref. http://software.intel.com/sites/default/files/m/4/4/6/9/4/39386-FP_Consistency_102511.pdf

molecular dynamics on GPU (0)

inflamed (1156277) | about a year ago | (#44406121)

In molecular dynamics simulations, kinetics are known to be approximate, and states at a given time are not considered directly correlated with that time point; we only hope to get a statistically correct distribution of states across ensembles. Consequently, differences in rounding between wildly different compiler/hardware architectures are expected. However, deterministic behavior of the system is achieved by employing higher precision for accumulation steps, which ensures that averages over a sufficiently long time (a big enough sample) are the same no matter what hardware is employed. Consequently a tremendous speed-up is possible running CUDA code on consumer-grade nvidia cards, which have far fewer double-precision execution units than single-precision units. So we have deterministic trajectories, but nobody expects these to match real-world processes on a time-function basis :-)
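The effect of a careful accumulator can be shown with plain sums (a generic illustration, not the MD code the parent describes): a naive running sum rounds at every step, while `math.fsum` tracks the partial sums exactly and plays the role of the higher-precision accumulator.

```python
import math

# Sum ten million copies of 0.1 two ways. The true total is 1,000,000.
values = [0.1] * 10_000_000

naive = sum(values)          # one rounding per addition; error accumulates
accurate = math.fsum(values) # exact tracking of partial sums (Shewchuk)

print(naive)     # drifts visibly away from 1e6
print(accurate)  # correct to within one ulp of the true sum
```

The two results differ well before any physics enters the picture, which is why reduction order and accumulator width alone can make bitwise reproducibility hard.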

Comical (0)

Anonymous Coward | about a year ago | (#44406165)

Beyond about 3 days (based on the meteorology classes I took in college), most forecasts are just a shitty guess. Looking at a 10-day forecast is like calling your local psychic hotline. Sometimes they're right, but it's just a lucky guess.

Hey, at it least it ran all the way. (3, Interesting)

140Mandak262Jamuna (970587) | about a year ago | (#44406201)

These numerical simulation codes can sometimes do funny things when you port from one architecture to another. One of the most frustrating debugging sessions I had was when I ported my code to Linux. One of my tree class's comparison operators evaluates the key and compares the calculated key with the value stored in the instance. It was crapping out on Linux and not on Windows. I eventually discovered Linux was using 80-bit registers for floating-point computation, but the stored value in the instance was truncated to 64 bits.

Basically, they should be happy their code ported to two different architectures and ran all the way. Expecting the same results from processes that behave chaotically is asking too much.
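The bug class here, comparing a freshly computed value against the same value after it was truncated to a narrower stored width, can be mimicked without x87 hardware by round-tripping through a 32-bit float (the helper below is purely illustrative):

```python
import struct

def store_as_float32(x):
    # Round-trip through a 4-byte IEEE single, mimicking a value that lost
    # precision when written to narrower storage (like 80-bit -> 64-bit).
    return struct.unpack('<f', struct.pack('<f', x))[0]

computed_key = 0.1 + 0.2                      # full-precision result
stored_key = store_as_float32(computed_key)   # what the instance holds

# The equality the tree's comparison operator relied on silently fails:
print(computed_key == stored_key)  # False
```

The values differ only around the seventh decimal digit, which is exactly why such bugs surface on one platform and not another.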

CompSci 101 (0)

Anonymous Coward | about a year ago | (#44406257)

I'm no programming expert, but isn't this basically Computer Science 101 stuff?

Re:CompSci 101 (2)

kasperd (592156) | about a year ago | (#44406945)

I'm no programming expert, but isn't this basically Computer Science 101 stuff?

All I was taught about floating point at that level was how wrong the results could be, and that we should avoid it. Several years later, in a more advanced course, I learned how to do floating-point calculations if you really need to.

Don't let mathematicians write production code (0)

AlejoHausner (1047558) | about a year ago | (#44406265)

I once saw a piece of code written by a mathematician which said "pow(x, -1)". Ugh. I wonder if meteorologists know better.

Re:Don't let mathematicians write production code (1)

Anonymous Coward | about a year ago | (#44406481)

I once saw a piece of code written by a mathematician which said "pow(x, -1)". Ugh. I wonder if meteorologists know better.

It might be written that way to get well-defined behavior depending on the value of x, e.g. what if x is +0?

http://linux.die.net/man/3/pow [die.net]

Maybe they do know better.

This is a DENIER propaganda story! (0)

Anonymous Coward | about a year ago | (#44406313)

It doesn't mean anything. You must not listen to it. Global Warming is still happening, and the models are all correct and agree with each other.

97% of all scientists agree that we should stop generating CO2 NOW, otherwise humanity will be responsible for the greatest environmental catastrophe ever to hit the Earth. There is no need for any further examination of the science - what we need is ACTION.

Slashdot should not be supporting denier propaganda in this way. This story should be removed immediately.

Lies, damned lies, and statistics... (0)

Anonymous Coward | about a year ago | (#44406315)

QED - quod erat demonstrandum! Or to paraphrase - the proof is in the pudding... :-)

problem solved decades ago (1, Interesting)

Gravis Zero (934156) | about a year ago | (#44406331)

It's called Binary Coded Decimal (BCD) [wikipedia.org], and it works well. Plenty of banks still use it because it's reliable and it works. It's plenty slower, but it's accurate regardless of the processor it runs on.

so what is Pi in BCD? (0)

Anonymous Coward | about a year ago | (#44406393)

Oh wait, you cannot represent pi exactly on any machine, because it's transcendental.

Unfortunately not... (0)

Anonymous Coward | about a year ago | (#44406525)

BCD solves the problem of binary not being able to unambiguously represent certain decimal fractions. BCD solves little for scientific computing. You still need to round, and in parallel programs you still gather and round in non-deterministic order. The non-determinism of this particular program won't go away if it is rewritten to use BCD.

Re:problem solved decades ago (3, Informative)

HornWumpus (783565) | about a year ago | (#44406569)

A little knowledge is a dangerous thing.

Get back to us when you've recompiled the simulation using BCD and realized that there is still rounding. 0.01 being a repeating fraction in binary floating point is a separate issue.

Re:problem solved decades ago (0)

Anonymous Coward | about a year ago | (#44406947)

Nope. If you are suggesting base 10 (ten) would help, that's silly. There is nothing special about ten.

Now if you are suggesting fixed point would help, that's also wrong. There is still rounding there.

Thirdly, if you are suggesting using BCD to implement arbitrary precision to avoid all rounding: that is impossible due to repeating decimals (1/3 for example), and if you are doing that, there is no reason to use BCD! Base 2 is fine for extending to arbitrary precision.

So I can't see how BCD could possibly help. Its one purpose is when you want to represent numbers in base 10 so that you don't accumulate errors from base conversions of numbers that are displayed to humans in base 10. The weather simulator does not convert bases, so that's irrelevant.

Lastly, BCD isn't really an alternative to doubles. BCD is a way to represent base-10 digits. You would have to specify more detail as to how it would be used to represent non-whole numbers (but as mentioned above, using it for fixed point does not help, nor does using it for the significand or exponent of floats). And don't even propose rationals: that has nothing to do with BCD.

Re:problem solved decades ago (2)

wiredlogic (135348) | about a year ago | (#44407137)

BCD is no better than fixed point binary in this instance. The banking industry relies on it because we use decimalized currency and it eliminates some types of errors to carry out all computations in decimal. For simulation inputs you're no better off than if you use a plain binary encoded number.
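The point these replies make, that decimal representation fixes base-conversion artifacts but not rounding itself, can be checked with Python's `decimal` module, which does decimal arithmetic in software (much like BCD in effect, though not in encoding):

```python
from decimal import Decimal, getcontext

# Decimal represents 0.1 exactly, so no base-conversion error accumulates:
print(Decimal('0.1') + Decimal('0.1') + Decimal('0.1') == Decimal('0.3'))  # True
print(0.1 + 0.1 + 0.1 == 0.3)                                             # False

# ...but a value like 1/3 still has to be rounded, whatever the base:
getcontext().prec = 28
third = Decimal(1) / Decimal(3)
print(third * 3 == Decimal(1))  # False: rounding survives the change of base
```

So decimal arithmetic helps exactly where money helps: decimal inputs and outputs. Simulation state that is not decimal to begin with gains nothing.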

INB4 Climate Model Shitstorm (0)

PPH (736903) | about a year ago | (#44406397)

n/a

Re:INB4 Climate Model Shitstorm (0)

Anonymous Coward | about a year ago | (#44406489)

paper title already appears on at least 2 global-warming-skeptic blogs already

Welcome to Chaotic Systems 101 ;-) (2)

Technomancer (51963) | about a year ago | (#44406517)

Pretty much any iterative simulation system, like weather simulation, will behave this way. When the result of one step of the simulation is the input to the next step, any rounding error can get amplified.
Also see the Butterfly Effect https://en.wikipedia.org/wiki/Butterfly_effect (not the movie!).

Utterly Unsurprising (2, Insightful)

Anonymous Coward | about a year ago | (#44406573)

Floating-point arithmetic is not associative.

Everyone who reads Stack Overflow knows this, because everyone who doesn't know this posts to Stack Overflow asking why they get weird results.

Everyone who does numerical simulation or scientific programming work knows this because they've torn their hair out at least once wondering if they have a subtle bug or if it's just round-off error.

Everyone who does cross-platform work knows this because different platforms implement compilers (and IEEE-754) in slightly different ways.

Everyone who does parallel programming knows this because holy shit will you see round-off differences when you throw many minutes of TFlops at a problem and it sequences differently every time.

Everyone who looks at the standards knows this because for Chrissakes, Fused-Multiply-Add is standards compliant but _optional_.

Why is this remotely news?
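The non-associativity the AC leads with is trivial to exhibit: the same three numbers, summed with two different groupings (exactly what a compiler flag, optimization level, or parallel reduction order can change), give two different doubles.

```python
# IEEE 754 addition is commutative but not associative: where the
# intermediate rounding happens changes the final result.
left = (0.1 + 0.2) + 0.3   # rounds after 0.1 + 0.2
right = 0.1 + (0.2 + 0.3)  # rounds after 0.2 + 0.3

print(left == right)  # False
print(left, right)    # 0.6000000000000001 0.6
```

One ulp of disagreement per reordering is all a chaotic model needs as a seed.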

Lorenz, the Butterfly Effect and Chaos Theory (3, Informative)

alanw (1822) | about a year ago | (#44406577)

Edward Lorenz discovered that floating-point truncation causes weather simulations to diverge massively back in 1961.
This was the foundation of chaos theory, and it was Lorenz who coined the term "Butterfly Effect".

http://www.ganssle.com/articles/achaos.htm [ganssle.com]

The butterfly that changed the weather for the wor (0)

Anonymous Coward | about a year ago | (#44406619)

The butterfly that changed the weather for the world was not in Texas, but rather at the end of a floating-point word width.

This is why regression testing matters (0)

Anonymous Coward | about a year ago | (#44406741)

This is a common problem with all serious scientific codes. If it's important, you rerun test cases any time you change compiler flags or system software and compare results to make sure the changes are within an acceptable tolerance. They're never the same, so if the change is larger than the threshold, human examination and judgement are required to determine if the change is acceptable. It's not uncommon to discover latent bugs that didn't appear until the machine actually did what the programmer wrote.

Anyone who thinks that this is a solved problem, or ever will be a solved problem doesn't understand the many issues involved which range from algorithm choice to order of execution and intermediate result truncation.

FWIW, so far as I know, no x86 systems provide 128-bit floating point in hardware. Power, Sparc, and Z series are the only options I'm aware of. I spent a good bit of time investigating this when code of mine had convergence issues.

It also demonstrates the problem with modeling climate change. But of course, if you already know what the answer you want is, you can just modify things until you get the answer you want.

So... (0)

Anonymous Coward | about a year ago | (#44406851)

So... of all the hardware tested, which one more accurately predicted the actual weather?
That's what I want to know!

Sh*t For Brains (0)

Anonymous Coward | about a year ago | (#44406859)

Endianness, floating-point representation, long/short INT, roundoff, and machine error have been known since the 1970s, as posted above.

Trouble now is that a 'new' kid is in town: the Geographer (Geo-groper).

Geographer + computer + stolen (borrowed) code (the Geographer does not even understand INT32) = Shitty output.

The UN IPCC 'Reports' are rife with shit that they call 'science.'

And the Politicos and Warmer-boys just eat it up faster than it dribbles out.

But who is to complain? The National Science Foundation (lots of Geo-gropers) throw money at this shit like the Treasury can't print enough money fast enough.

Like I wrote: SFB.

Splitting atoms, wx forecasts & zen (0)

Anonymous Coward | about a year ago | (#44406941)

Excuse my completely uneducated, non-scientific response but, this is, in essence, about weather forecasting, right? It would seem to a layman like myself you are a group of highly trained scientists of different genres looking to be as accurate as possible. There is one variable that I'm sure none of you wants to admit. I highly respect, appreciate, and admire what you do for the common good. With all the supercomputers, past data, software, and modeling, there's a fly in the soup. You guys, at the end of the day, still have to have a little luck to be correct! Mother Nature has no part in anything to do with science. Chaos is by definition impossible to predict! It's just a thought I wanted to throw out there. Anyone who hunts or fishes passionately knows what I'm alluding to. Everything is in "perfect" condition for the hunt and the game is nowhere to be found. I'm not being critical, but you can't be 100% accurate with anything to do with nature. Thanks for all you do!

doubles (0)

Anonymous Coward | about a year ago | (#44407041)

Just use doubles.
It may be slightly slower, but you won't have this problem.
