Are measurements biased? OS scheduler and GC pauses tend to increase benchmark run times and make the distribution of run times asymmetric, with a tail at large *t*. This could easily lead to biased results.

Are error estimates reliable? Criterion could under- or overestimate errors. Error estimates are overlooked most of the time, but they are still worth checking.

How precise are the measurements? It’s easy to see a 200% difference and be sure that it exists. But what about 20%? Or even 5%? What is the smallest measurable difference?

And of course there’s something else. There’s always something unexpected.

In this post I’ll try to check whether the error estimates are reliable. The test is very simple: measure the same quantity many times and check what fraction of the measurements falls into the confidence interval. If too small a fraction falls in, errors are underestimated; if too large, they are overestimated. All measurements were done with GHC 7.6.1 and criterion-0.6.2.1.

Here is the program I used. It benchmarks the `exp` function 1000 times. `exp` is cheap, so here we are probing the regime of cheap functions which take many iterations to measure.

```
import Criterion.Main
main :: IO ()
main = defaultMain $ replicate 1000 $ bench "exp" $ nf exp (1 :: Double)
```

The benchmark was run on a 6-core AMD Phenom II with an active X session. The first thing to plot is run time versus benchmark number. Here it is:

Now we have a problem. Run time depends linearly either on the benchmark number or on time, and this test doesn’t discriminate between the two. There is little doubt that the effect exists, but let’s repeat the measurements in cleaner conditions. The second run was performed on a completely idle Core i7.

No surprise: the time dependence didn’t disappear, but now we got peaks at regular intervals. Probably those are GC related; they were completely masked by random jitter in the previous measurement. We can force a GC before every benchmark with the `-g` switch.

It didn’t help. The spikes didn’t go away, they only became more frequent. So there is some strange effect which increases benchmark run time with every iteration, and the cause of this weird behavior has yet to be found.

**P.S.** If we now return to the first plot, we can see that the distribution of run times does have a tail at large *t*, so we should expect a proper confidence interval to be quite asymmetric. This tail also limits the attainable precision.

Jumping to the import list to add, remove, or tweak something is a very frequent activity. Finding the imports is not difficult: they are always at the beginning of the file. The real problem is finding the place you were working on before. The functions `haskell-navigate-imports` and `haskell-navigate-imports-return` fix that problem: the first cycles through the import list and the second returns you to the place where you were before.

They are not bound to any key, so one has to bind them using a hook. Here is my choice of bindings:

```
(require 'haskell-navigate-imports)
(add-hook 'haskell-mode-hook
          (lambda ()
            (local-set-key (kbd "M-[") 'haskell-navigate-imports)
            (local-set-key (kbd "M-]") 'haskell-navigate-imports-return)))
```

Another very useful feature is quick adding and removing of SCC (Set Cost Centre) annotations, which are often necessary to get accurate profiling information. While the `-auto-all` option annotates every top-level declaration, that is not always enough.
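For reference, an SCC annotation is a pragma attached directly to an expression. A minimal sketch (the `mean` function here is made up for illustration):

```
-- An SCC pragma marks an expression as a cost centre for profiling.
-- GHC accepts the pragma even without profiling enabled; it is then ignored.
mean :: [Double] -> Double
mean xs = {-# SCC "mean" #-} sum xs / fromIntegral (length xs)
```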

Just as their names suggest, the functions `haskell-mode-insert-scc-at-point` and `haskell-mode-kill-scc-at-point` insert and remove SCC pragmas without tiresome typing. Of course they work best when bound to some key chord.

The last item on the list is a function for changing the indentation of complete code blocks. Here is an example of indenting a where block by two spaces:

This functionality is provided by `haskell-move-nested`. This function is not bound to anything either, nor can it be used interactively with `M-x`, so it has to be bound to keys as suggested by its documentation.

```
{-# LANGUAGE TypeFamilies, MultiParamTypeClasses #-}

class Sample a where
  type Elem a :: *
  foldSample :: (acc -> Elem a -> acc) -> acc -> a -> acc

class NullEstimator m where
  nullEstimator :: m

class FoldEstimator m a where
  addElement :: m -> a -> m

class SemigoupEst m where
  joinSample :: m -> m -> m
```

`Sample` exists only because `Foldable` couldn’t be used, and it has only the methods which are strictly necessary. `Foldable` requires that a container be able to hold elements of any type. That’s not the case for unboxed vectors, and monomorphic containers like `ByteString` couldn’t be used at all.

The laws for the other type classes are quite intuitive and can be summarized as: evaluating a statistic for the same sample should yield the same result regardless of the evaluation method. However, because in many cases the calculations involve floating-point values, the results will be only approximately the same.

First, `joinSample` is obviously associative, because merging subsamples is associative. Moreover, in most cases it is commutative, because the order of elements doesn’t matter by the definition of most statistics. There may be cases where a calculation can be expressed with this API and the order does matter, but I’m not aware of any.

```
(a `joinSample` b) `joinSample` c = a `joinSample` (b `joinSample` c)
a `joinSample` b = b `joinSample` a
```

These are the laws of a commutative semigroup, so estimators form a semigroup. If we add a zero element we get a monoid. In this context the zero element is the empty sample, and `nullEstimator` corresponds to it.
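Spelled out in the same style as the laws above, the identity laws are:

```
nullEstimator `joinSample` a = a
a `joinSample` nullEstimator = a
```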

Thus we get a monoid. Unfortunately the `Monoid` type class from base couldn’t be used, since it requires both the zero element and the associative operation to be defined. It’s entirely possible to have an estimator for which the statistic of the empty sample is undefined, or an estimator without `joinSample`.

The last law says that calculating a statistic using `joinSample` or with a single fold should yield the same result.
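Using list folds for concreteness, this law could be written as follows (the `estimate` helper here is made up for illustration):

```
estimate xs = foldl addElement nullEstimator xs

estimate (xs ++ ys) = estimate xs `joinSample` estimate ys
```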

However, things become much more complicated when floating-point calculations are involved. Even if both functions are correct, the result of applying them in succession could be very far off from the expected value. By correct I mean that the approximation agrees with the exact value in N digits. This is to be expected: floating-point calculations are full of such surprises. The more interesting question is when the precision loss occurs.
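A quick demonstration using the `atan`/`tan` pair discussed below (the numbers here are illustrative, not from the original measurements):

```
-- atan has a very small derivative at large x, so the rounding error in
-- atan x is hugely amplified by tan: the round trip is not exact.
main :: IO ()
main = do
  let x = 1.0e8 :: Double
  print (tan (atan x))  -- close to 1.0e8, but the last digits are wrong
```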

Let us denote by \(f\) and \(f^{-1}\) the exact functions which we approximate using floating-point arithmetic. What we really calculate is the following quantity, where \(\varepsilon\) and \(\varepsilon'\) are the rounding errors introduced by \(f\) and \(f^{-1}\) respectively:

\[ x' = f^{-1}\left[\, f(x)(1+\varepsilon) \right](1 + \varepsilon') \]

In the next step we Taylor expand this expression and drop the terms quadratic in \(\varepsilon\) (they are small):

\[ x' \approx x + \frac{f(x)}{f'(x)}\,\varepsilon + x\,\varepsilon' \]

This means the rounding error introduced by \(f\) is enhanced or suppressed by a factor of \(f(x)/f'(x)\). The plot below visualizes the error propagation:

It’s easy to see that when the derivative of \(f(x)\) is small, rounding errors are greatly enhanced. There is another way to look at this: a function with a small derivative squeezes many input values into a much smaller number of possible outputs, so we lose information and consequently precision. This information loss is an artifact of discretization; no information is lost for exact \(\mathbb{R} \to \mathbb{R}\) functions.

The second possibility is when \(f(x)\) itself is large. It’s best illustrated with the following example:

\[ f(x) = x + a \qquad f^{-1}(x) = x - a \]

When \(|a| \gg |x|\), the last digits of \(x\) are lost during the addition, and the subtraction will not recover them.
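This is easy to demonstrate directly (a minimal sketch, with numbers chosen for illustration):

```
-- With a = 1e16 the spacing between consecutive doubles near a is 2,
-- so adding 1 has no effect and the subtraction cannot recover it.
main :: IO ()
main = do
  let x = 1.0    :: Double
      a = 1.0e16 :: Double
  print ((x + a) - a)  -- prints 0.0, not 1.0
```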

The next plot visualizes the error for the functions \(f(x) = \arctan(x)\) and \(f^{-1}(x) = \tan(x)\), together with error estimates obtained from the formula above. Since we don’t know the exact rounding error for every function evaluation, machine epsilon is substituted for both \(\varepsilon\)’s, which gives an upper bound on the error.

For reasons I don’t fully understand it overestimates the error, but it gives reasonable overall agreement.

In fact it’s not very difficult to avoid cabal hell if you don’t have too many dependencies. The recipe is to avoid having two different versions of the same package installed at the same time. So when a new version of a package is installed, the old version should be deleted, or rather unregistered.

However, ghc-pkg will refuse to unregister a package if other packages depend on it, so those have to be removed too. In the end I wrote a simple bash script which removes a package together with all packages that depend on it:

```
ghc-pkg-force-remove() {
    ghc-pkg unregister "$1"
    if [ $? != 0 ]; then
        # Check that package is indeed here
        if ghc-pkg unregister "$1" 2>&1 | grep -E '^ghc-pkg: cannot find package' > /dev/null; then
            return
        fi
        # Happily remove everything
        for i in $( ghc-pkg unregister "$1" 2>&1 | sed -e 's/.*would break the following packages://; s/(.*//'); do
            echo ' * Removing' "$i"
            ghc-pkg unregister "$i"
        done
        ghc-pkg unregister "$1"
    fi
}
```

```
import qualified Data.Vector.Generic as G

data Mean = Mean !Double !Int

mean :: (G.Vector v Double) => v Double -> Double
mean = fini . G.foldl' go (Mean 0 0)
  where
    fini (Mean a _) = a
    go (Mean m n) x = Mean m' n'
      where m' = m + (x - m) / fromIntegral n'
            n' = n + 1
```

What can we notice?

It’s implemented in terms of a left fold, and there is nothing in it really specific to vectors. Anything foldable could be used: vectors, lists, any container or stream of numbers. Yet the function accepts only vectors.

We have an accumulator data type which is internal to the library. From now on I’ll call such data types estimators. It would be useful to expose the estimator: it could be used to calculate the mean incrementally. Moreover, we could write a function to merge partial results and parallelize the evaluation.
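Merging partial results for `Mean` is just a weighted average of the two partial means. A sketch (the `mergeMean` name is made up; it is not part of any library):

```
data Mean = Mean !Double !Int

-- Combine estimators for two subsamples: the merged mean is the
-- element-count-weighted average of the partial means.
mergeMean :: Mean -> Mean -> Mean
mergeMean (Mean m1 n1) (Mean m2 n2) = Mean m n
  where
    n = n1 + n2
    m = (m1 * fromIntegral n1 + m2 * fromIntegral n2) / fromIntegral n
```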

Frequently several statistics are evaluated at once. In our example `Mean` carries the sample mean (first field) and the number of elements in the sample (second field). Currently we just discard the number of elements. A more motivating example is variance. An estimator for variance carries the count, the mean, and two variance estimates: the unbiased one and the maximum-likelihood one. Not to mention standard deviation. With a simple API we would have to write a separate function for each combination, and there are way too many of them.

Let’s try to rewrite the code then:

```
data Mean = Mean !Double !Int

-- foldl here is a fold for some unspecified data structure
estimateMean = foldl addElement emptyEstimator

emptyEstimator :: Mean
emptyEstimator = Mean 0 0

calcMean :: Mean -> Double
calcMean (Mean m _) = m

addElement :: Mean -> Double -> Mean
addElement (Mean m n) x = Mean m' n'
  where m' = m + (x - m) / fromIntegral n'
        n' = n + 1
```

Now we have an estimator for the empty sample, a function for adding an element to the accumulator, and a function for extracting the mean. The first two are generic and apply to many different estimators, so they should be generalized into type classes. The extraction function is specific to each estimator, so it will be saved for later.

Now is a good moment to stop and think a little. What are we trying to do? We are trying to write a generic API for statistics which can be calculated efficiently using a fold. Here efficiently means in less than O(n) additional space, preferably O(1). There are a lot of such statistics: mean, moments and central moments (barring problems with numerical instabilities), maximum, etc.

Not all statistics can be calculated this way. The prime example is the median, and quantiles in general. One can always choose a yet-unseen element such that any already-seen element becomes the estimate, which means we have to keep all the elements: O(n) additional space.

Since we want to use various data types for samples, we need to define a type class for them. There is the `Foldable` type class, but it requires a data type fully polymorphic in its parameter, so no unboxed vectors. This can be worked around using type families.
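This is the `Sample` class from the summary earlier, with an associated `Elem` type (the `TypeFamilies` extension is required):

```
{-# LANGUAGE TypeFamilies #-}

-- A container usable as a sample; Elem is its element type.
class Sample a where
  type Elem a :: *
  foldSample :: (acc -> Elem a -> acc) -> acc -> a -> acc
```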

For the fold we need a starting value, and the natural choice is the estimator for the empty sample. Unfortunately, many statistics are not defined for the empty sample; in fact the mean isn’t either. In the example above the mean of the empty sample is set to zero, purely for convenience. Let’s put the question of dealing with such cases aside for a moment. Here is the type class.
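As in the summary earlier:

```
-- Estimator corresponding to the empty sample.
class NullEstimator m where
  nullEstimator :: m
```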

The next ingredient is a function for adding an element to an estimator.
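Again from the summary earlier (the `MultiParamTypeClasses` extension is required):

```
{-# LANGUAGE MultiParamTypeClasses #-}

-- Add a single element to an estimator.
class FoldEstimator m a where
  addElement :: m -> a -> m
```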

Note that there is no dependency between `m` and `a`. Why? For example, when we count elements we want to be able to count elements of any type. Another example is the mean, variance, etc. Currently they are limited to `Double`s, but there is no real reason why we couldn’t calculate the mean of a sample of `Int`s.

The last type class is for merging estimators. If we have estimators for two subsamples, we may have a way to join them and obtain an estimate for the union of the samples.
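This is the merging class from the summary earlier:

```
-- Merge estimators computed over two subsamples.
class SemigoupEst m where
  joinSample :: m -> m -> m
```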

If an estimator is an instance of this type class, it forms a semigroup, and the choice of the name reflects this. If it is an instance of `NullEstimator` too, it forms a monoid.

That’s all for now.
