Are measurements biased? OS scheduler and GC pauses tend to increase benchmark run times and make the distribution of run times asymmetric, with a tail at large *t*. This could easily lead to biased results.

Are error estimates reliable? Criterion could under- or overestimate errors. Error estimates are overlooked most of the time, but they are still worth checking.

How precise are the measurements? It’s easy to see a 200% difference and be sure that it exists. But what about 20%? Or even 5%? What is the smallest measurable difference?

And of course there’s something else. There’s always something unexpected.

In this post I’ll try to check whether the error estimates are reliable. The test is very simple: measure the same quantity many times and check what fraction of the measurements falls into the confidence interval. If too small a fraction falls in, errors are underestimated; if too large, they are overestimated. All measurements were done with GHC 7.6.1 and criterion-0.6.2.1.

Here is the program I used. It benchmarks the `exp` function 1000 times. `exp` is cheap, so here we are probing the regime of cheap functions which take many iterations to measure.

```
import Criterion.Main
main :: IO ()
main = defaultMain $ replicate 1000 $ bench "exp" $ nf exp (1 :: Double)
```

The benchmark was run on a 6-core AMD Phenom II with an active X session. The first thing to plot is run time versus benchmark number. Here it is:

Now we have a problem. Run time depends linearly either on the benchmark number or on time, and this test doesn’t discriminate between the two. There is little doubt that the effect exists, but let’s repeat the measurements in cleaner conditions. The second run was performed on a completely idle Core i7.

No surprise: the time dependence didn’t disappear, but now we got peaks at regular intervals. Probably those are GC related; they were completely masked by random jitter in the previous measurement. We can force a GC before every benchmark with the `-g` switch.

It didn’t help. The spikes didn’t go away, they only became more frequent. So there is some strange effect which increases benchmark run time with every iteration, and the cause of this weird behavior has yet to be found.

**P.S.** If we now return to the first plot, we can see that the distribution of run times does have a tail at large *t*, so we should expect a proper confidence interval to be quite asymmetric. This tail also limits the attainable precision.

Jumping to the import list to add, remove, or tweak something is a very frequent activity. Finding the imports is not difficult: they are always at the beginning of the file. The real problem is finding the place you were working on before. The functions `haskell-navigate-imports` and `haskell-navigate-imports-return` fix that problem: the first cycles through the import list and the second returns you to the place where you were before.

They are not bound to any key, so one has to bind them using a hook. Here is my choice of bindings:

```
(require 'haskell-navigate-imports)
(add-hook 'haskell-mode-hook
          (lambda ()
            (local-set-key (kbd "M-[") 'haskell-navigate-imports)
            (local-set-key (kbd "M-]") 'haskell-navigate-imports-return)))
```

Another very useful feature is quick adding and removing of SCC (Set Cost Centre) annotations, which are often necessary to get accurate profiling information. While the `-auto-all` option annotates every top-level declaration, that is not always enough.
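For reference, an SCC annotation is a pragma attached directly to an expression. A minimal sketch (the `mean` function here is made up for illustration):

```
-- An SCC pragma marks an expression as a cost centre for profiling.
-- GHC accepts the pragma even without profiling enabled; it is then ignored.
mean :: [Double] -> Double
mean xs = {-# SCC "mean" #-} sum xs / fromIntegral (length xs)
```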

Just as their names suggest, the functions `haskell-mode-insert-scc-at-point` and `haskell-mode-kill-scc-at-point` insert and remove SCC pragmas without tiresome typing. Of course they work best when bound to some key chord.

The last item on the list is a function for changing the indentation of complete code blocks. Here is an example of indenting a where block by two spaces:

This functionality is provided by `haskell-move-nested`. This function is not bound to anything either, nor can it be used interactively with `M-x`, so it has to be bound to keys as suggested by its documentation.

```
{-# LANGUAGE TypeFamilies, MultiParamTypeClasses #-}

class Sample a where
  type Elem a :: *
  foldSample :: (acc -> Elem a -> acc) -> acc -> a -> acc

class NullEstimator m where
  nullEstimator :: m

class FoldEstimator m a where
  addElement :: m -> a -> m

class SemigoupEst m where
  joinSample :: m -> m -> m
```

`Sample` exists only because `Foldable` couldn’t be used, and it has only the methods which are strictly necessary. `Foldable` requires that a container be able to hold elements of any type. That’s not the case for unboxed vectors, and monomorphic containers like `ByteString` couldn’t be used at all.

The laws for the other type classes are quite intuitive and can be summarized as: evaluating a statistic for the same sample should yield the same result regardless of the evaluation method. However, because in many cases the calculations involve floating-point values, the results will be only approximately the same.

First, `joinSample` is obviously associative, because merging subsamples is associative. Moreover, in most cases it is commutative, because the order of elements doesn’t matter by the definition of most statistics. There may be cases where a calculation can be expressed with this API and the order does matter, but I’m not aware of any.

```
(a `joinSample` b) `joinSample` c = a `joinSample` (b `joinSample` c)
a `joinSample` b = b `joinSample` a
```

These are the laws of a commutative semigroup, so estimators form a semigroup. If we add a zero element we get a monoid. In this context the zero element is the empty sample, and `nullEstimator` corresponds to it.
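Spelled out in the same style as the laws above, the identity laws are:

```
nullEstimator `joinSample` a = a
a `joinSample` nullEstimator = a
```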

Thus we get a monoid. Unfortunately the `Monoid` type class from base couldn’t be used, since it requires both the zero element and the associative operation to be defined. It’s entirely possible to have an estimator for which the statistic of the empty sample is undefined, or an estimator without `joinSample`.

The last law says that calculating a statistic using `joinSample` or with a single fold should yield the same result.
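Using list folds for concreteness, this law could be written as follows (the `estimate` helper here is made up for illustration):

```
estimate xs = foldl addElement nullEstimator xs

estimate (xs ++ ys) = estimate xs `joinSample` estimate ys
```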

However, things become much more complicated when floating-point calculations are involved. Even if both functions are correct, the result of applying them in succession could be very far off from the expected value. By correct I mean that the approximation agrees with the exact value in N digits. This is to be expected: floating-point calculations are full of such surprises. The more interesting question is when the precision loss occurs.
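A quick demonstration using the `atan`/`tan` pair discussed below (the numbers here are illustrative, not from the original measurements):

```
-- atan has a very small derivative at large x, so the rounding error in
-- atan x is hugely amplified by tan: the round trip is not exact.
main :: IO ()
main = do
  let x = 1.0e8 :: Double
  print (tan (atan x))  -- close to 1.0e8, but the last digits are wrong
```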

Let us denote by \(f\) and \(f^{-1}\) the exact functions which we approximate using floating-point arithmetic. What we really calculate is the following quantity, where \(\varepsilon\) and \(\varepsilon'\) are the rounding errors introduced by \(f\) and \(f^{-1}\) respectively:

\[ x' = f^{-1}\left[\, f(x)(1+\varepsilon) \right](1 + \varepsilon') \]

In the next step we Taylor expand this expression and drop the terms quadratic in \(\varepsilon\) (they are small):

\[ x' \approx x + \frac{f(x)}{f'(x)}\,\varepsilon + x\,\varepsilon' \]

This means the rounding error introduced by \(f\) is enhanced or suppressed by a factor of \(f(x)/f'(x)\). The plot below visualizes the error propagation:

It’s easy to see that when the derivative of \(f(x)\) is small, rounding errors are greatly enhanced. There is another way to look at this: a function with a small derivative squeezes many input values into a much smaller number of possible outputs, so we lose information and consequently precision. This information loss is an artifact of discretization; no information is lost for exact \(\mathbb{R} \to \mathbb{R}\) functions.

The second possibility is when \(f(x)\) itself is large. It’s best illustrated with the following example:

\[ f(x) = x + a \qquad f^{-1}(x) = x - a \]

When \(|a| \gg |x|\), the last digits of \(x\) are lost during the addition, and the subtraction will not recover them.
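This is easy to demonstrate directly (a minimal sketch, with numbers chosen for illustration):

```
-- With a = 1e16 the spacing between consecutive doubles near a is 2,
-- so adding 1 has no effect and the subtraction cannot recover it.
main :: IO ()
main = do
  let x = 1.0    :: Double
      a = 1.0e16 :: Double
  print ((x + a) - a)  -- prints 0.0, not 1.0
```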

The next plot visualizes the error for the functions \(f(x) = \arctan(x)\) and \(f^{-1}(x) = \tan(x)\), together with error estimates obtained from the formula above. Since we don’t know the exact rounding error for every function evaluation, machine epsilon is substituted for both \(\varepsilon\)’s, which gives an upper bound on the error.

For reasons I don’t fully understand it overestimates the error, but it gives reasonable overall agreement.

In fact it’s not very difficult to avoid cabal hell if you don’t have too many dependencies. The recipe is to avoid having two different versions of the same package installed at the same time. So when a new version of a package is installed, the old version should be deleted, or rather unregistered.

However, ghc-pkg will refuse to unregister a package if other packages depend on it, so those have to be removed too. In the end I wrote a simple bash script which removes a package together with all packages that depend on it:

```
ghc-pkg-force-remove() {
    ghc-pkg unregister "$1"
    if [ $? != 0 ]; then
        # Check that package is indeed here
        if ghc-pkg unregister "$1" 2>&1 | grep -E '^ghc-pkg: cannot find package' > /dev/null; then
            return
        fi
        # Happily remove everything
        for i in $( ghc-pkg unregister "$1" 2>&1 | sed -e 's/.*would break the following packages://; s/(.*//'); do
            echo ' * Removing' "$i"
            ghc-pkg unregister "$i"
        done
        ghc-pkg unregister "$1"
    fi
}
```

```
import qualified Data.Vector.Generic as G

data Mean = Mean !Double !Int

mean :: (G.Vector v Double) => v Double -> Double
mean = fini . G.foldl' go (Mean 0 0)
  where
    fini (Mean a _) = a
    go (Mean m n) x = Mean m' n'
      where m' = m + (x - m) / fromIntegral n'
            n' = n + 1
```

What can we notice?

It’s implemented in terms of a left fold, and there is nothing in it really specific to vectors. Anything foldable could be used: vectors, lists, any container or stream of numbers. Yet the function accepts only vectors.

We have an accumulator data type which is internal to the library. From now on I’ll call such data types estimators. It would be useful to expose the estimator: it could be used to calculate the mean incrementally. Moreover, we could write a function to merge partial results and parallelize the evaluation.
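Merging partial results for `Mean` is just a weighted average of the two partial means. A sketch (the `mergeMean` name is made up; it is not part of any library):

```
data Mean = Mean !Double !Int

-- Combine estimators for two subsamples: the merged mean is the
-- element-count-weighted average of the partial means.
mergeMean :: Mean -> Mean -> Mean
mergeMean (Mean m1 n1) (Mean m2 n2) = Mean m n
  where
    n = n1 + n2
    m = (m1 * fromIntegral n1 + m2 * fromIntegral n2) / fromIntegral n
```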

Frequently several statistics are evaluated at once. In our example `Mean` carries the sample mean (first field) and the number of elements in the sample (second field). Currently we just discard the number of elements. A more motivating example is variance. An estimator for variance carries the count, the mean, and two variance estimates: the unbiased one and the maximum-likelihood one. Not to mention standard deviation. With a simple API we would have to write a separate function for each combination, and there are way too many of them.

Let’s try to rewrite the code then:

```
data Mean = Mean !Double !Int

-- foldl here is a fold for some unspecified data structure
estimateMean = foldl addElement emptyEstimator

emptyEstimator :: Mean
emptyEstimator = Mean 0 0

calcMean :: Mean -> Double
calcMean (Mean m _) = m

addElement :: Mean -> Double -> Mean
addElement (Mean m n) x = Mean m' n'
  where m' = m + (x - m) / fromIntegral n'
        n' = n + 1
```

Now we have an estimator for the empty sample, a function for adding an element to the accumulator, and a function for extracting the mean. The first two are generic and apply to many different estimators, so they should be generalized into type classes. The extraction function is specific to each estimator, so it will be saved for later.

Now is a good moment to stop and think a little. What are we trying to do? We are trying to write a generic API for statistics which can be calculated efficiently using a fold. Here efficiently means in less than O(n) additional space, preferably O(1). There are a lot of such statistics: mean, moments and central moments (barring problems with numerical instabilities), maximum, etc.

Not all statistics can be calculated this way. The prime example is the median, and quantiles in general. One can always choose a yet-unseen element such that any already-seen element becomes the estimate, which means we have to keep all the elements: O(n) additional space.

Since we want to use various data types for samples, we need to define a type class for them. There is the `Foldable` type class, but it requires a data type fully polymorphic in its parameter, so no unboxed vectors. This can be worked around using type families.
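This is the `Sample` class from the summary earlier, with an associated `Elem` type (the `TypeFamilies` extension is required):

```
{-# LANGUAGE TypeFamilies #-}

-- A container usable as a sample; Elem is its element type.
class Sample a where
  type Elem a :: *
  foldSample :: (acc -> Elem a -> acc) -> acc -> a -> acc
```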

For the fold we need a starting value, and the natural choice is the estimator for the empty sample. Unfortunately, many statistics are not defined for the empty sample; in fact the mean isn’t either. In the example above the mean of the empty sample is set to zero, purely for convenience. Let’s put the question of dealing with such cases aside for a moment. Here is the type class.
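As in the summary earlier:

```
-- Estimator corresponding to the empty sample.
class NullEstimator m where
  nullEstimator :: m
```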

The next ingredient is a function for adding an element to an estimator.
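Again from the summary earlier (the `MultiParamTypeClasses` extension is required):

```
{-# LANGUAGE MultiParamTypeClasses #-}

-- Add a single element to an estimator.
class FoldEstimator m a where
  addElement :: m -> a -> m
```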

Note that there is no dependency between `m` and `a`. Why? For example, when we count elements we want to be able to count elements of any type. Another example is the mean, variance, etc. Currently they are limited to `Double`s, but there is no real reason why we couldn’t calculate the mean of a sample of `Int`s.

The last type class is for merging estimators. If we have estimators for two subsamples, we may have a way to join them and obtain an estimate for the union of the samples.
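This is the merging class from the summary earlier:

```
-- Merge estimators computed over two subsamples.
class SemigoupEst m where
  joinSample :: m -> m -> m
```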

If an estimator is an instance of this type class, it forms a semigroup, and the choice of the name reflects this. If it is an instance of `NullEstimator` too, it forms a monoid.

That’s all for now.
