Laws for estimator type classes

16 August 2012 | Tags: haskell, statistics

In the previous post on generic estimators, “Generalizing estimators”, four type classes were introduced:

class Sample a where
  type Elem a :: *
  foldSample  :: (acc -> Elem a -> acc) -> acc -> a -> acc

class NullEstimator m where
  nullEstimator :: m

class FoldEstimator m a where
  addElement  :: m -> a -> m

class SemigoupEst m where
  joinSample :: m -> m -> m

Sample exists only because Foldable couldn’t be used, and it has only the methods that are strictly necessary. Foldable requires that a container be able to hold elements of any type. That is not the case for unboxed vectors, and monomorphic containers like ByteString could not be used at all.
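To make the difference from Foldable concrete, here is a minimal sketch of two Sample instances. The list instance is polymorphic, while DoubleSeries is a hypothetical monomorphic container (a stand-in for something like ByteString, not a type from the post) that can only hold Double and so could never be Foldable. The helper sumSample is also an assumption, added only for illustration.

```haskell
{-# LANGUAGE TypeFamilies, FlexibleContexts #-}
import Data.List (foldl')

-- Class as defined in the post.
class Sample a where
  type Elem a :: *
  foldSample :: (acc -> Elem a -> acc) -> acc -> a -> acc

-- Polymorphic container: a list can hold elements of any type.
instance Sample [a] where
  type Elem [a] = a
  foldSample = foldl'

-- Hypothetical monomorphic container: it can only ever hold Double,
-- so it has the wrong kind to be an instance of Foldable.
newtype DoubleSeries = DoubleSeries [Double]

instance Sample DoubleSeries where
  type Elem DoubleSeries = Double
  foldSample f z (DoubleSeries xs) = foldl' f z xs

-- Hypothetical helper: sum written once against the Sample interface.
sumSample :: (Sample a, Num (Elem a)) => a -> Elem a
sumSample = foldSample (+) 0
```

The same sumSample works for both containers, which is all the estimator machinery needs from Sample.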

The laws for the other type classes are quite intuitive and can be summarized as: evaluating a statistic for the same sample should yield the same result regardless of the evaluation method. However, because in many cases the calculations involve floating point values, the results will be only approximately equal.
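The floating point caveat can be illustrated with plain addition, which is not associative in IEEE double arithmetic: (0.1 + 0.2) + 0.3 and 0.1 + (0.2 + 0.3) can differ in the last bits. A rough sketch of how one might compare results when checking the laws follows. The helper approxEq and the tolerance are assumptions, not part of the post.

```haskell
-- Hypothetical helper (not from the post): relative comparison of two
-- doubles with tolerance eps.
approxEq :: Double -> Double -> Double -> Bool
approxEq eps x y = abs (x - y) <= eps * max (abs x) (abs y)

-- The same three-element sample summed with two different groupings.
sumLeft, sumRight :: Double -> Double -> Double -> Double
sumLeft  a b c = (a + b) + c
sumRight a b c = a + (b + c)
```

With this, approxEq 1e-12 (sumLeft 0.1 0.2 0.3) (sumRight 0.1 0.2 0.3) holds, whereas comparing the two sums with == is not guaranteed to.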

First, joinSample is obviously associative because merging of subsamples is associative. Moreover, in most cases it’s commutative, because by the definition of a statistic the order of elements doesn’t matter. There are probably cases where a calculation can be expressed with this API and the order does matter, but I’m not aware of any.

(a `joinSample` b) `joinSample` c = a `joinSample` (b `joinSample` c)
a `joinSample` b = b `joinSample` a
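The two laws above can be written as executable predicates. The following is a sketch only: SumEst is a hypothetical estimator (element count plus running sum) chosen for illustration, and the prop_ names are assumptions. The class name is spelled as in the source.

```haskell
-- Class as defined in the post (identifier spelled as in the source).
class SemigoupEst m where
  joinSample :: m -> m -> m

-- Hypothetical estimator: number of elements and their sum.
data SumEst = SumEst !Int !Double
  deriving (Eq, Show)

instance SemigoupEst SumEst where
  joinSample (SumEst n s) (SumEst m t) = SumEst (n + m) (s + t)

-- Associativity law as a predicate.
prop_assoc :: (SemigoupEst m, Eq m) => m -> m -> m -> Bool
prop_assoc a b c =
  ((a `joinSample` b) `joinSample` c) == (a `joinSample` (b `joinSample` c))

-- Commutativity law as a predicate.
prop_comm :: (SemigoupEst m, Eq m) => m -> m -> Bool
prop_comm a b = (a `joinSample` b) == (b `joinSample` a)
```

Exact equality is only reliable here for inputs whose sums are exactly representable, such as SumEst 1 1.0 and SumEst 2 2.5. For general floating point data the comparison would have to be approximate.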

These are the laws of a commutative semigroup, so estimators form a semigroup. If we add a zero element we get a monoid. In this context the zero element is the empty sample, and nullEstimator corresponds to the empty sample:

x `joinSample` nullEstimator = x
nullEstimator `joinSample` x = x
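As a small illustration of the identity laws, here is a sketch with a hypothetical Count estimator whose nullEstimator is the count of an empty sample. Count and prop_identity are assumptions made for this example. The class name is spelled as in the source.

```haskell
-- Classes as defined in the post.
class NullEstimator m where
  nullEstimator :: m

class SemigoupEst m where
  joinSample :: m -> m -> m

-- Hypothetical estimator that only counts elements.
newtype Count = Count Int
  deriving (Eq, Show)

-- An empty sample has zero elements.
instance NullEstimator Count where
  nullEstimator = Count 0

instance SemigoupEst Count where
  joinSample (Count a) (Count b) = Count (a + b)

-- Both identity laws as one predicate.
prop_identity :: (NullEstimator m, SemigoupEst m, Eq m) => m -> Bool
prop_identity x =
  (x `joinSample` nullEstimator) == x && (nullEstimator `joinSample` x) == x
```

Because Count uses Int, these laws hold exactly rather than approximately.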

Thus we get a monoid. Unfortunately, the Monoid type class from base couldn’t be used since it requires both the zero element and the associative operation to be defined. It is, however, entirely possible to have an estimator for which the statistic of an empty sample is undefined, or an estimator without joinSample.
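One possible example of an estimator that fits the semigroup part but not the monoid part is a running minimum. The sketch below treats the minimum of an empty sample as undefined, so it has addElement and joinSample but deliberately no NullEstimator instance. MinEst and minOf are hypothetical names, and one could argue for an infinite sentinel instead; this is just one way to read the paragraph above.

```haskell
{-# LANGUAGE MultiParamTypeClasses #-}

-- Classes as defined in the post.
class FoldEstimator m a where
  addElement :: m -> a -> m

class SemigoupEst m where
  joinSample :: m -> m -> m

-- Hypothetical estimator: smallest element seen so far.
-- There is intentionally no NullEstimator instance.
newtype MinEst = MinEst Double
  deriving (Eq, Show)

instance FoldEstimator MinEst Double where
  addElement (MinEst m) x = MinEst (min m x)

instance SemigoupEst MinEst where
  joinSample (MinEst a) (MinEst b) = MinEst (min a b)

-- Build the estimator from a non-empty sample: the first element
-- seeds the fold because there is no value for the empty sample.
minOf :: Double -> [Double] -> MinEst
minOf x xs = foldl addElement (MinEst x) xs
```

With separate type classes this estimator needs no artificial zero element, which a single Monoid instance would force on it.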

The last law says that calculating a statistic using joinSample or with a fold should yield the same result.

foldSample addElement (foldSample addElement m xs) ys =
  foldSample addElement nullEstimator ys `joinSample` foldSample addElement m xs
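The law can be checked on a concrete estimator. The sketch below uses plain lists with foldl' in place of a general Sample, and a hypothetical SumEst estimator (count and sum). The names sequential and merged are assumptions that stand for the two sides of the equation.

```haskell
{-# LANGUAGE MultiParamTypeClasses #-}
import Data.List (foldl')

-- Classes as defined in the post.
class NullEstimator m where
  nullEstimator :: m

class FoldEstimator m a where
  addElement :: m -> a -> m

class SemigoupEst m where
  joinSample :: m -> m -> m

-- Hypothetical estimator: number of elements and their sum.
data SumEst = SumEst !Int !Double
  deriving (Eq, Show)

instance NullEstimator SumEst where
  nullEstimator = SumEst 0 0

instance FoldEstimator SumEst Double where
  addElement (SumEst n s) x = SumEst (n + 1) (s + x)

instance SemigoupEst SumEst where
  joinSample (SumEst n s) (SumEst m t) = SumEst (n + m) (s + t)

-- Left-hand side: keep folding the second sample into the estimator.
sequential :: SumEst -> [Double] -> [Double] -> SumEst
sequential m xs ys = foldl' addElement (foldl' addElement m xs) ys

-- Right-hand side: estimate the second sample separately, then merge.
merged :: SumEst -> [Double] -> [Double] -> SumEst
merged m xs ys =
  foldl' addElement nullEstimator ys `joinSample` foldl' addElement m xs
```

For inputs with exactly representable sums, such as xs = [1, 2] and ys = [3, 4.5], both sides give the same SumEst. With arbitrary doubles they are expected to agree only approximately.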