roughly 4 hours ago

I like this!

In the grand HN tradition of being triggered by a word in the post and going off on a not-quite-but-basically-totally-tangential rant:

There are (at least) three areas here that are footguns with these kinds of calculations:

1) 95% is usually a lot wider than people think - people take 95% as “I’m pretty sure it’s this,” whereas it’s really closer to “it’d be really surprising if it were not this” - by and large people keep their mental error bars too narrow.

2) Variables are rarely truly uncorrelated - call this the “Mortgage Derivatives” maxim. In the family example, rent is very likely to be correlated with food costs - so, if rent is high, food costs are also likely to be high. This widens the distribution of the total - modeling the inputs as independent will lead to you being surprised at how improbable the actual outcome was.

3) In general, normal distributions are rarer than people think - they tend to require some kind of constraining factor on the values to enforce them. We see them a lot in nature because there tend to be negative feedback loops all over the place, but once you leave the relatively tidy garden of Mother Nature for the chaos of human affairs, normal distributions get pretty abnormal.
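To put a number on point 2, here is a minimal NumPy sketch with made-up monthly figures (rent 2000 ± 300 and food 800 ± 150 are hypothetical) comparing the spread of rent + food when the two are independent versus correlated with ρ = 0.8. Analytically, the standard deviation of the total goes from √(300² + 150²) ≈ 335 to √(300² + 150² + 2·0.8·300·150) ≈ 430, and the simulation should land close to both.

```python
import numpy as np

def total_spread(rho, n=200_000, seed=0):
    """Standard deviation of rent + food when the two are correlated with coefficient rho.

    The means and standard deviations are hypothetical monthly figures.
    """
    rng = np.random.default_rng(seed)
    means = [2000.0, 800.0]            # rent, food
    sds = np.array([300.0, 150.0])
    corr = np.array([[1.0, rho], [rho, 1.0]])
    cov = corr * np.outer(sds, sds)    # covariance = correlation scaled by sd_i * sd_j
    rent, food = rng.multivariate_normal(means, cov, size=n).T
    return (rent + food).std()

independent = total_spread(rho=0.0)
correlated = total_spread(rho=0.8)
print(f"sd of total, independent: {independent:.0f}; correlated: {correlated:.0f}")
```

With these figures, a model that assumes independence draws its 95% range of the total roughly 22% too narrow.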

I like this as a tool, and I like the implementation, I’ve just seen a lot of people pick up statistics for the first time and lose a finger.

  • btilly 3 hours ago

    I strongly agree with this, and particularly point 1. If you ask people to provide estimated ranges for answers that they are 90% confident in, people on average produce roughly 30% confidence intervals instead. Over 90% of people don't even get to 70% confidence intervals.

    You can test yourself at https://blog.codinghorror.com/how-good-an-estimator-are-you/.
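    To gauge how far off those intervals are, assume estimation errors are roughly normal (an assumption) and compare half-widths measured in standard deviations. Under that assumption, a 30% interval is only about a quarter as wide as a 90% one.

```python
from statistics import NormalDist

def half_width(coverage):
    """Half-width, in standard deviations, of a central interval with this coverage
    (assumes normally distributed errors)."""
    return NormalDist().inv_cdf(0.5 + coverage / 2)

ratio = half_width(0.30) / half_width(0.90)
print(f"a 30% interval is {ratio:.2f} times as wide as a 90% interval")
```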

    • Nevermark an hour ago

      From link:

      > Heaviest blue whale ever recorded

      I don't think estimation errors regarding things outside of someone's area of familiarity say much.

      You could ask a much "easier" question from the same topic area and still get terrible answers: "What percentage of blue whales are blue?" Or just "Are blue whales blue?"

      Estimating something often encountered but uncounted seems like a better test. Like how many cars pass in front of my house every day. I could apply arithmetic, soft logic and intuition to that. But that would be a difficult question to grade, given it has no universal answer.

      • yen223 28 minutes ago

        I guess people didn't realise they are allowed to, and in fact are expected to, put very wide ranges for things they are not certain about.

  • pertdist an hour ago

    I did a project with non-technical stakeholders modeling likely completion dates for a big Gantt chart. Business stakeholders wanted probabilistic task completion times because some of the tasks were new and impractical to quantify with fixed times.

    Stakeholders really liked specifying work times as t_i ~ PERT(min, mode, max) because it mimics their thinking and handles typical real-world asymmetrical distributions.

    [Background: PERT is just a re-parameterized beta distribution that's more user-friendly and intuitive https://rpubs.com/Kraj86186/985700]
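    Here is a minimal sketch of sampling t ~ PERT(min, mode, max) through the usual beta reparameterization, α = 1 + 4(mode − min)/(max − min) and β = 1 + 4(max − mode)/(max − min). The task durations are hypothetical, and this is not the code from that project. The sample mean should match the PERT mean (min + 4·mode + max)/6.

```python
import numpy as np

def sample_pert(minimum, mode, maximum, size, rng=None):
    """Draw samples from PERT(min, mode, max) via its beta reparameterization."""
    if rng is None:
        rng = np.random.default_rng()
    span = maximum - minimum
    alpha = 1 + 4 * (mode - minimum) / span
    beta = 1 + 4 * (maximum - mode) / span
    # Beta samples live on [0, 1]; rescale them onto [minimum, maximum].
    return minimum + span * rng.beta(alpha, beta, size)

# Hypothetical task: at least 2 days, most likely 3, at worst 10.
rng = np.random.default_rng(42)
days = sample_pert(2, 3, 10, size=100_000, rng=rng)
print(f"sample mean {days.mean():.2f} vs PERT mean {(2 + 4 * 3 + 10) / 6:.2f}")
print(f"80th percentile: {np.percentile(days, 80):.1f} days")
```

If you sum samples from sequential tasks and read off percentiles, you get the kind of probabilistic completion dates described in the comment.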

  • youainti 3 hours ago

    > I’ve just seen a lot of people pick up statistics for the first time and lose a finger.

    I love this. I've never thought of statistics like a power tool or firearm, but the analogy fits really well.

NunoSempere 4 hours ago

I have written similar tools

- for command line, fermi: https://git.nunosempere.com/NunoSempere/fermi

- for android, a distribution calculator: https://f-droid.org/en/packages/com.nunosempere.distribution...

People might also be interested in https://www.squiggle-language.com/, which is a more complex version (or possibly <https://git.nunosempere.com/personal/squiggle.c>, which is a faster but much more verbose version in C)

  • NunoSempere 4 hours ago

    Fermi in particular has the following syntax

    ```
    5M 12M # number of people living in Chicago
    beta 1 200 # fraction of people that have a piano
    30 180 # minutes it takes to tune a piano, including travel time
    / 48 52 # weeks a year that piano tuners work for
    / 5 6 # days a week in which piano tuners work
    / 6 8 # hours a day in which piano tuners work
    / 60 # minutes to an hour
    ```

    multiplication is implied as the default operation, fits are lognormal.
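    To see what a chain like this computes, here is a rough Python re-creation with NumPy. It assumes each `a b` pair is a 90% interval of a lognormal. The thread mentions 90% intervals for the Android app, but whether fermi uses the same convention is an assumption. It also treats `beta 1 200` as a Beta(1, 200) distribution and uses a hypothetical helper, `lognormal_from_ci`. This is not fermi's implementation. Like the original chain, it implicitly assumes each piano is tuned once a year, so the result reads as a number of full-time piano tuners.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000
Z90 = 1.6448536269514722  # standard-normal z-score bounding a central 90% interval

def lognormal_from_ci(low, high, size=N):
    """Hypothetical helper: lognormal whose 5th/95th percentiles are low/high."""
    mu = (np.log(low) + np.log(high)) / 2
    sigma = (np.log(high) - np.log(low)) / (2 * Z90)
    return rng.lognormal(mu, sigma, size)

# Same chain as the fermi snippet; multiplication is the default operation.
estimate = (
    lognormal_from_ci(5e6, 12e6)    # number of people living in Chicago
    * rng.beta(1, 200, N)           # fraction of people that have a piano
    * lognormal_from_ci(30, 180)    # minutes to tune a piano, including travel time
    / lognormal_from_ci(48, 52)     # weeks a year piano tuners work
    / lognormal_from_ci(5, 6)       # days a week piano tuners work
    / lognormal_from_ci(6, 8)       # hours a day piano tuners work
    / 60                            # minutes to an hour
)
print("5th / 50th / 95th percentile:", np.percentile(estimate, [5, 50, 95]).round(1))
```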

    • NunoSempere 4 hours ago

      Here is a thread with some fun fermi estimates made with that tool: e.g., number of calories NK gets from Russia: https://x.com/NunoSempere/status/1857135650404966456

        900K 1.5M # tonnes of rice per year NK gets from Russia
        * 1K # kg in a tonne
        * 1.2K 1.4K # calories per kg of rice
        / 1.9K 2.5K # daily caloric intake
        / 25M 28M # population of NK
        / 365 # years of food this buys
        / 1% # as a percentage
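      Here is a quick cross-check of that chain without sampling. Assume every range is fitted with a lognormal that is symmetric in log space (presumably how fermi's lognormal fits behave, but that is an assumption). Then each range's median is its geometric midpoint √(low·high). The median of a product or quotient of independent lognormals is exactly the product or quotient of their medians. Under that assumption, the median works out to roughly 7% of a year's food.

```python
from math import sqrt

def geo_mid(low, high):
    """Geometric midpoint: the median of a lognormal fitted symmetrically (in log space) to the range."""
    return sqrt(low * high)

rice_tonnes = geo_mid(900e3, 1.5e6)   # tonnes of rice per year from Russia
kg_per_tonne = 1e3
kcal_per_kg = geo_mid(1.2e3, 1.4e3)   # calories per kg of rice
daily_intake = geo_mid(1.9e3, 2.5e3)  # daily caloric intake
population = geo_mid(25e6, 28e6)      # population of NK

years_of_food = rice_tonnes * kg_per_tonne * kcal_per_kg / daily_intake / population / 365
percent = years_of_food / 0.01
print(f"median estimate: {percent:.1f}% of a year's food")
```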

  • notpushkin an hour ago

    Would be a nice touch if Squiggle supported the `a~b` syntax :^)

  • antman 3 hours ago

    I tried the unsure calc and the android app and they seem to produce different results?

    • NunoSempere 3 hours ago

      The Android app fits lognormals, and uses 90% rather than 95% confidence intervals. I think they are a more parsimonious distribution for doing these kinds of estimates. One hint might be that, per the central limit theorem, sums of independent variables will tend to normals, which means that products of independent positive variables will tend to be lognormals, and in the decompositions where quick estimates are most useful, multiplications are more common.
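      One way to see why the two tools disagree is to take the range 1~100 and read it both ways, sketched here with Python's statistics module. Following the Unsure docs quoted elsewhere in the thread, 1~100 is a normal with 1 and 100 at two standard deviations either side of the mean. Read as a 90% lognormal interval (as described for the Android app), it is a lognormal whose 5th and 95th percentiles are 1 and 100. The helper names are hypothetical. The medians come out at 50.5 versus 10, and the normal reading also puts about 2% of its mass below zero.

```python
from math import exp, log
from statistics import NormalDist

def normal_reading(low, high):
    """low~high as a normal with low/high at mean -/+ 2 standard deviations."""
    return NormalDist((low + high) / 2, (high - low) / 4)

def lognormal_reading(low, high):
    """low~high as a lognormal with low/high at its 5th/95th percentiles,
    returned as the normal distribution of log(x)."""
    z90 = NormalDist().inv_cdf(0.95)
    return NormalDist((log(low) + log(high)) / 2, (log(high) - log(low)) / (2 * z90))

low, high = 1, 100
as_normal = normal_reading(low, high)
as_lognormal = lognormal_reading(low, high)
print("normal median:", as_normal.median, "  P(x < 0):", round(as_normal.cdf(0), 3))
print("lognormal median:", round(exp(as_lognormal.median), 3))
```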

ttoinou 5 hours ago

Would be nice to retransform the output into an interval / gaussian distribution

> Note: If you're curious why there is a negative number (-5) in the histogram, that's just an inevitable downside of the simplicity of the Unsure Calculator. Without further knowledge, the calculator cannot know that a negative number is impossible.

The Drake equation, or any equation multiplying probabilities, can also be seen in log space, where the uncertainty is on the scale of each probability and the final probability is the exponential of the sum of the log probabilities. Then we wouldn't have this negative issue.

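Here is a small NumPy sketch of that idea, using 1~10 as an arbitrary example range. Sampled the Unsure way (a normal with the bounds at ±2 standard deviations), roughly 0.7% of draws fall below zero. Sampled in log space instead (the same ±2 standard deviation rule applied to log x, then exponentiated), no draw can be negative. The log-space variant is an assumption, not what the calculator does.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
low, high = 1.0, 10.0

# Unsure-style reading: normal with low/high at mean -/+ 2 sd, so the left tail can cross zero.
linear = rng.normal((low + high) / 2, (high - low) / 4, n)

# Log-space reading (an assumption, not the calculator's behavior): the same
# -/+ 2 sd rule applied to log(x), then exponentiated, so every draw is positive.
log_low, log_high = np.log(low), np.log(high)
log_space = np.exp(rng.normal((log_low + log_high) / 2, (log_high - log_low) / 4, n))

print("share of negative draws, linear:   ", (linear < 0).mean())
print("share of negative draws, log space:", (log_space < 0).mean())
```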
  • hatthew 4 hours ago

    The default example `100 / 4~6` gives the output `17~25`

gregschlom 4 hours ago

The ASCII art (well technically ANSI art) histogram is neat. Cool hack to get something done quickly. I'd have spent 5x the time trying various chart libraries and giving up.

  • Retr0id 4 hours ago

    On a similar note, I like the crude hand-drawn illustrations a lot. Fits the "napkin" theme.

OisinMoran 3 hours ago

This is neat! If you enjoy the write-up, you might be interested in the paper “Dissolving the Fermi Paradox”, which goes even more in-depth into actually multiplying the probability density functions instead of the common point estimates. It has the somewhat surprising result that we may just be alone.
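Here is a toy version of the paper's point, using made-up log-uniform ranges for the Drake factors. These are NOT the paper's priors. Multiplying the average of each factor, where a log-uniform on [lo, hi] has mean (hi − lo)/ln(hi/lo), gives a comfortably large N, in the tens of thousands with these ranges. Yet roughly nine in ten samples of the full product fall below N = 1, because one very wide factor drags most of the mass down.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Hypothetical (low, high) ranges for the Drake factors, for illustration only;
# these are NOT the priors used in the paper.
ranges = [
    (1, 100),      # R*: star formation rate per year
    (0.1, 1),      # fp: fraction of stars with planets
    (0.1, 1),      # ne: habitable planets per system
    (1e-30, 1),    # fl: fraction where life arises
    (1e-3, 1),     # fi: fraction developing intelligence
    (1e-2, 1),     # fc: fraction that become detectable
    (1e2, 1e9),    # L: years a civilization stays detectable
]

def log_uniform(low, high, size):
    """Uniform in log10 space between low and high."""
    return 10 ** rng.uniform(np.log10(low), np.log10(high), size)

def log_uniform_mean(low, high):
    """Exact mean of a log-uniform distribution on [low, high]."""
    return (high - low) / np.log(high / low)

# Point-estimate approach: multiply the average of each factor.
product_of_means = np.prod([log_uniform_mean(lo, hi) for lo, hi in ranges])

# Distribution approach: multiply samples, then look at the whole result.
samples = np.prod([log_uniform(lo, hi, n) for lo, hi in ranges], axis=0)

print(f"product of factor means: {product_of_means:.3g}")
print(f"P(N < 1) from the full distribution: {(samples < 1).mean():.2f}")
```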

https://arxiv.org/abs/1806.02404

  • drewvlaz 2 hours ago

    This was quite a fun read, thanks!

krick 5 hours ago

It sounds like a gimmick at first, but looks surprisingly useful. I'd surely install it if it was available as an app to use alongside my usual calculator, and while I cannot quite recall a situation when I needed it, it seems very plausible that I'll start finding use cases once I have it bound to some hotkey on my keyboard.

thih9 8 hours ago

Feature request: allow specifying the probability distribution. E.g.: ‘~’: normal, ‘_’: uniform, etc.

  • pyfon 2 hours ago

    Not having this feature is a feature—they mention this.

omoikane 4 hours ago

If I am reading this right, a range is expressed as a distance between the minimum and maximum values, and in the Monte Carlo part a number is generated from a uniform distribution within that range[1].

But if I just ask the calculator "1~2" (i.e. just a range without any operators), the histogram shows what looks like a normal distribution centered around 1.5[2].

Shouldn't the histogram be flat if the distribution is uniform?

[1] https://github.com/filiph/unsure/blob/123712482b7053974cbef9...

[2] https://filiph.github.io/unsure/#f=1~2

  • hatthew 4 hours ago

    Under the "Limitations" section:

    > Range is always a normal distribution, with the lower number being two standard deviations below the mean, and the upper number two standard deviations above. Nothing fancier is possible, in terms of input probability distributions.
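    To see the difference omoikane is asking about, here is a rough text-histogram sketch. It is loosely in the spirit of the calculator's ASCII output, not its actual code, and `text_histogram` is a hypothetical helper. It compares 1~2 read as documented (a normal with 1 and 2 at ±2 standard deviations) against a uniform draw on [1, 2]. The first comes out bell-shaped, the second roughly flat.

```python
import numpy as np

def text_histogram(samples, bins=10, width=40):
    """Render a histogram as one line of '#' characters per bin, labelled by the bin's left edge."""
    counts, edges = np.histogram(samples, bins=bins)
    scale = width / counts.max()
    return [f"{edges[i]:6.2f} | {'#' * int(round(c * scale))}" for i, c in enumerate(counts)]

rng = np.random.default_rng(3)
n = 100_000
# "1~2" as documented: normal with 1 and 2 at mean -/+ 2 standard deviations.
print("\n".join(text_histogram(rng.normal(1.5, 0.25, n))))
print()
# What a uniform reading of "1~2" would look like instead: roughly equal bars.
print("\n".join(text_histogram(rng.uniform(1, 2, n))))
```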

Aachen 2 hours ago

https://qalculate.github.io can also do this, and has for as long as I've used it (only a couple of years, to be fair). I've got it on my phone, my laptop, even my server with apt install qalc. Super convenient, supports everything from unit conversion to uncertainty tracking.

The histogram is neat, I don't think qalc has that. On the other hand, it took 8 seconds to calculate the default (exceedingly trivial) example. Is that JavaScript, or is the server currently very busy?

lorenzowood 22 minutes ago

See also Guesstimate https://getguesstimate.com. Strengths include treating label and data as a unit, a space for examining the reasoning for a result, and the ability to replace an estimated distribution with sample data => you can build a model and then refine it over time. I'm amazed Excel and Google Sheets still haven't incorporated these things, years later.

djoldman 8 hours ago

I perused the codebase but I'm unfamiliar with Dart:

https://github.com/filiph/unsure/blob/master/lib/src/calcula...

I assume this is a Monte Carlo approach? (Not to start a flamewar, at least for us data scientists :) ).

  • kccqzy 8 hours ago

    Yes it is.

    • porridgeraisin 7 hours ago

      Can you explain how? I'm an (aspiring)

      • kccqzy 7 hours ago

        I didn't peruse the source code. I just read the linked article in its entirety and it says

        > The computation is quite slow. In order to stay as flexible as possible, I'm using the Monte Carlo method. Which means the calculator is running about 250K AST-based computations for every calculation you put forth.

        So therefore I conclude Monte Carlo is being used.

      • constantcrying 5 hours ago

        Lines 19 to 21 should be the Monte Carlo sampling algorithm. The implementation is maybe a bit unintuitive, but apparently he creates a function from the expression in the calculator, and calling that function returns one random sample of the expression's value.
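        The pattern constantcrying describes, compiling the expression into a function that returns one random sample per call, can be sketched in a few lines of Python. The node constructors `const`, `normal_range`, and `div` are hypothetical stand-ins, not the calculator's Dart classes. The 250,000 calls mirror the article's stated sample count.

```python
import random
from typing import Callable

Sampler = Callable[[], float]  # calling it returns one random sample of the (sub)expression

def const(x: float) -> Sampler:
    return lambda: x

def normal_range(low: float, high: float) -> Sampler:
    """'low~high' as a normal with low/high at mean -/+ 2 standard deviations."""
    mean, sd = (low + high) / 2, (high - low) / 4
    return lambda: random.gauss(mean, sd)

def div(a: Sampler, b: Sampler) -> Sampler:
    # Each call draws fresh samples from both operands.
    return lambda: a() / b()

random.seed(0)
expr = div(const(100), normal_range(4, 6))  # 100 / 4~6
samples = sorted(expr() for _ in range(250_000))
median = samples[len(samples) // 2]
print(f"sample median: {median:.2f}")
```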

      • hawthorns 4 hours ago

        It's dead simple. Here is the simplified version that returns the quantiles for '100 / 2 ~ 4'.

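          import numpy as np

          def monte_carlo(formula, iterations=100000):
              # Evaluate the expression many times, each time with a fresh random draw.
              res = [formula() for _ in range(iterations)]
              # Min, 2.5th, 10th..90th, 97.5th percentiles, and max of the results.
              return np.percentile(res, [0, 2.5, *range(10, 100, 10), 97.5, 100])

          def uncertain_division():
              # Uniform draw for simplicity; the calculator itself treats 2~4 as a normal distribution.
              return 100 / np.random.uniform(2, 4)

          print(monte_carlo(uncertain_division, iterations=100000))
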
marcodiego 4 hours ago

I put "1 / (-1~1)" and expected something around - to + infinity. It instead gave me -35~35.

I really don't know how good it is.

  • NunoSempere 3 hours ago

    I'm guessing this is not an error. If you divide 1/normal(0,1), the full distribution would range from -inf to inf, but the 95% output doesn't have to.

    • SamBam 2 hours ago

      I don't quite understand, probably because my math isn't good enough.

      If you're treating -1~1 as a normal distribution, then it's centered on 0. If you're working out the answer using a Monte Carlo simulation, then you're going to be testing out different values from that distribution, right? And aren't you going to be more likely to test values closer to 0? So surely the most likely outputs should be far from 0, right?

      When I look at the histogram it creates, it varies by run, but the most common output seems generally closest to zero (and sometimes is exactly zero). Wouldn't that mean that it's most frequently picking denominator values closest to -1 or 1?

      • pyfon 2 hours ago

        Only 1 percent of values would end up being 100+ on a uniform distribution.

        For normal it is higher but maybe not much more so.

        • lswainemoore 11 minutes ago

          That may be true, but if you look at the distribution it puts out for this, it definitely smells funny. It looks like a very steep normal distribution, centered at 0 (ish). Seems like it should have two peaks? But maybe those are just getting compressed into one because of resolution of buckets?
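          The exact answer can be worked out without sampling, assuming the calculator's documented reading of -1~1 as a normal with mean 0 and standard deviation 0.5. Because |1/X| > c exactly when |X| < 1/c, the 95% interval of 1/X ends at about ±32. The reported -35~35 is therefore plausibly just Monte Carlo noise in the far tails. The density of 1/X also goes to zero at 0 and peaks at ±1/(0.5·√2) ≈ ±1.41, so there really are two peaks. In a histogram spanning -35 to 35, both sit within a couple of units of zero and would likely share the central buckets, which fits the resolution guess.

```python
from math import sqrt
from statistics import NormalDist

# "-1~1" read as in the calculator's docs: normal, mean 0, sd 0.5 (bounds at -/+ 2 sd).
x = NormalDist(0, 0.5)

# |1/X| > c  <=>  |X| < 1/c, so the central 95% of 1/X ends where P(|X| < 1/c) = 5%.
c = 1 / x.inv_cdf(0.5 + 0.05 / 2)
print(f"95% interval of 1/X: about -{c:.0f}~{c:.0f}")

def density_of_reciprocal(y: float) -> float:
    """Density of Y = 1/X at y != 0: f_X(1/y) / y^2."""
    return x.pdf(1 / y) / y**2

# Setting the derivative of f_X(1/y) / y^2 to zero gives |y| = 1 / (sqrt(2) * sd).
peak = 1 / (sqrt(2) * x.stdev)
print(f"modes at about +/-{peak:.2f}, density there {density_of_reciprocal(peak):.3f}")
```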

vortico 3 hours ago

Cool! Some random requests to consider: Could the range x~y be uniform instead of 2 std dev normal (95.4%ile)? Sometimes the range of quantities is known. 95%ile is probably fine as a default though. Also, could a symbolic JS package be used instead of Monte-Carlo? This would improve speed and precision, especially for many variables (high dimensions). Could the result be shown in a line plot instead of ASCII bar chart?

nritchie 3 hours ago

Here (https://uncertainty.nist.gov/) is another similar Monte Carlo-style calculator designed by the statisticians at NIST. It is intended for propagating uncertainties in measurements and can handle various different assumed input distributions.

constantcrying 6 hours ago

An alternative approach is using fuzzy-numbers. If evaluated with interval arithmetic you can do very long calculations involving uncertain numbers very fast and with strong mathematical guarantees.

In particular, it would drastically outperform the Monte Carlo approach in speed.
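Fuzzy-number arithmetic is commonly carried out level by level (on alpha-cuts), with ordinary interval arithmetic at each level. Here is a minimal sketch of that interval layer applied to the default example, with a hypothetical `Interval` class. For 100 / 4~6 it returns the guaranteed bounds [16.67, 25]. These contain every possible result but, unlike the calculator's 17~25, say nothing about how likely the values are. Division by an interval that contains zero, as in 1 / (-1~1), has no finite answer, so the sketch refuses it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __mul__(self, other: "Interval") -> "Interval":
        # The extremes of a product are always among the four endpoint products.
        products = [self.lo * other.lo, self.lo * other.hi, self.hi * other.lo, self.hi * other.hi]
        return Interval(min(products), max(products))

    def __truediv__(self, other: "Interval") -> "Interval":
        if other.lo <= 0 <= other.hi:
            raise ZeroDivisionError("divisor interval contains zero")
        # 1/x is monotone on an interval that excludes zero, so the endpoints swap.
        return self * Interval(1 / other.hi, 1 / other.lo)

print(Interval(100, 100) / Interval(4, 6))
```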

  • sixo 4 hours ago

    This assumes the inputs are uniform distributions, or perhaps normals depending on what exactly fuzzy numbers mean. M-C is not so limited.

    • constantcrying 4 hours ago

      No. It assumes the numbers aren't random at all.

      Although fuzzy numbers can be used to model many different kinds of uncertainty.

ashu1461 2 hours ago

So is it 250k calculations for every approximation window? So I guess it will only be able to calculate up to 3-4 approximations comfortably?

Any reason why it was kept at 250k and not a lower number like 10k?

timothylaurent 8 hours ago

This reminds me of https://www.getguesstimate.com/ , a probabilistic spreadsheet.

alex-moon 4 hours ago

> The UI is ugly, to say the least.

I actually quite like it. Really clean, easy to see all the important elements. Lovely clear legible monospace serif font.

alexmolas 6 hours ago

is this the same as error propagation? I used to do a lot of that during my physics degree

  • constantcrying 5 hours ago

    It doesn't propagate uncertainty through the computation, but rather treats the expression as a single random variable.
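    For comparison with textbook error propagation, here is a sketch on the default example 100 / 4~6, reading 4~6 as a normal with mean 5 and standard deviation 0.5 per the documented ±2 standard deviation rule. The first-order formula σ_f ≈ |f′(x)|·σ_x gives a symmetric 20 ± 3.9. Sampling instead gives the lopsided interval the calculator reports, because 100/x stretches the low side of x more than the high side.

```python
import numpy as np

mean_x, sd_x = 5.0, 0.5  # "4~6" read as a normal with 4 and 6 at mean -/+ 2 sd

# First-order (linear) propagation for f(x) = 100 / x: sd_f ~= |f'(mean_x)| * sd_x.
f_mean = 100 / mean_x
sd_f = 100 / mean_x**2 * sd_x
linear = (f_mean - 1.96 * sd_f, f_mean + 1.96 * sd_f)

# Monte Carlo keeps the asymmetry that the linearization discards.
rng = np.random.default_rng(4)
samples = 100 / rng.normal(mean_x, sd_x, 250_000)
monte_carlo = tuple(np.percentile(samples, [2.5, 97.5]))

print(f"linear propagation 95%: {linear[0]:.1f}~{linear[1]:.1f}")
print(f"Monte Carlo 95%:        {monte_carlo[0]:.1f}~{monte_carlo[1]:.1f}")
```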

throwanem 7 hours ago

I love this! As a tool for helping folks with a good base in arithmetic develop statistical intuition, I can't think offhand of what I've seen that's better.

rao-v 8 hours ago

This is terrific and it’s tempting to turn into a little python package. +1 for notation to say it’s ~20,2 to mean 18~22

croisillon 6 hours ago

i like it and i skimmed the post but i don't understand why the default example 100 / 4~6 has a median of 20? there is no way of knowing why the range is between 4 and 6

  • constantcrying 5 hours ago

    The chance of 4~6 being less than 5 is 50%, the chance of it being greater is also 50%. The median of 100/4~6 has to be 100/5.

    >there is no way of knowing why the range is between 4 and 6

    ??? There is. It is the ~ symbol.
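    The same median can be checked exactly without sampling. Because 100/x is strictly decreasing for positive x, every quantile of 100 / 4~6 is 100 divided by the opposite quantile of 4~6. So the median is 100/5 = 20, and the central 95% runs from about 16.7 to 24.9. This sketch reads 4~6 as a normal with mean 5 and standard deviation 0.5 and ignores its negligible mass at or below zero.

```python
from statistics import NormalDist

x = NormalDist(5, 0.5)  # "4~6": 4 and 6 at mean -/+ 2 standard deviations

def quantile_of_100_over_x(q: float) -> float:
    """100 / x is decreasing for x > 0, so its q-quantile is 100 / (x's (1 - q)-quantile)."""
    return 100 / x.inv_cdf(1 - q)

print("median:", quantile_of_100_over_x(0.5))
print("95% interval:", quantile_of_100_over_x(0.025), quantile_of_100_over_x(0.975))
```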

vessenes 6 hours ago

cool! are all ranges considered poisson distributions?

  • re 5 hours ago

    No:

    > Range is always a normal distribution, with the lower number being two standard deviations below the mean, and the upper number two standard deviations above. Nothing fancier is possible, in terms of input probability distributions.

chris_wot 4 hours ago

There's an amazing scene in "This is Spinal Tap" where Nigel Tufnel had been brainstorming a scene where Stonehenge would be lowered from above onto the stage during their performance, and he does some back of the envelope calculations which he gives to the set designer. Unfortunately, he mixes the symbol for feet with the symbol for inches. Leading to the following:

https://www.youtube.com/watch?v=Pyh1Va_mYWI

rogueptr 16 hours ago

Brilliant work, polished UI, although it sometimes gives wrong ranges for equations like 100/1~(200~2000).

  • thih9 8 hours ago

    Can you elaborate? What is the answer you’re getting and what answer would you expect?

  • BrandoElFollito 8 hours ago

      How do you process this equation? 100 divided by something from one to ...?

    • notfed 5 hours ago

      > 100 / 4~6

      Means "100 divided by some number between 4 and 6"

      • throwanem 4 hours ago

        "...some number with a 95% probability of falling between 4.0 and 6.0 inclusive," I believe.