Place where sepulci live
16 August 2012 | Tags: haskell, statistics
In the previous post on generic estimators, “Generalizing estimators”, four type classes were introduced:
class Sample a where
    type Elem a :: *
    foldSample :: (acc -> Elem a -> acc) -> acc -> a -> acc

class NullEstimator m where
    nullEstimator :: m

class FoldEstimator m a where
    addElement :: m -> a -> m

class SemigoupEst m where
    joinSample :: m -> m -> m
Sample exists only because Foldable couldn’t be used, and it has only the methods that are strictly necessary. Foldable requires that the container be able to hold elements of any type. That’s not the case for unboxed vectors, and monomorphic containers like ByteString could not be used at all.
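To make the point about monomorphic containers concrete, here is a minimal sketch of Sample instances. The Bytes newtype is a hypothetical stand-in for ByteString (so that only base is needed), sampleSize is an illustrative helper that is not part of the original API, and Type from Data.Kind is used as the modern spelling of *.

```haskell
{-# LANGUAGE TypeFamilies #-}
import Data.Kind (Type)
import Data.List (foldl')
import Data.Word (Word8)

-- The Sample class from the post.
class Sample a where
  type Elem a :: Type
  foldSample :: (acc -> Elem a -> acc) -> acc -> a -> acc

-- Hypothetical monomorphic container standing in for ByteString:
-- it can only ever hold Word8, so it cannot be Foldable.
newtype Bytes = Bytes [Word8]

instance Sample Bytes where
  type Elem Bytes = Word8
  foldSample f z (Bytes ws) = foldl' f z ws

-- Ordinary lists are samples as well.
instance Sample [a] where
  type Elem [a] = a
  foldSample = foldl'

-- Illustrative helper: count the elements of any sample.
sampleSize :: Sample a => a -> Int
sampleSize = foldSample (\n _ -> n + 1) 0
```

The associated type Elem is what lets Bytes take part: the container fixes its own element type instead of being parameterized over it.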
The laws for the other type classes are quite intuitive and can be summarized as: evaluating a statistic for the same sample should yield the same result regardless of the evaluation method. However, because in many cases the calculations involve floating point values, the results will be only approximately the same.
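A small illustration of why the results are only approximately the same: adding the same three Double values with different grouping gives slightly different answers. The approxEq helper is a hypothetical name used only for this sketch.

```haskell
-- Sum the same three values with two different groupings.
leftFirst, rightFirst :: Double
leftFirst  = (0.1 + 0.2) + 0.3
rightFirst = 0.1 + (0.2 + 0.3)

-- Hypothetical helper: equality up to a relative tolerance.
approxEq :: Double -> Double -> Double -> Bool
approxEq tol x y = abs (x - y) <= tol * max (abs x) (abs y)
```

Here leftFirst == rightFirst is False, while approxEq 1e-12 leftFirst rightFirst is True, so the laws for floating point estimators are best read with a comparison of the second kind in mind.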
First, joinSample is obviously associative because merging of subsamples is associative. Moreover, in most cases it’s commutative because the order of elements doesn’t matter by the definition of a statistic. There are probably cases where the calculation can be expressed with this API and the order matters, but I’m not aware of any.
(a `joinSample` b) `joinSample` c = a `joinSample` (b `joinSample` c)
a `joinSample` b = b `joinSample` a
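As a sketch of these two laws, consider a hypothetical Mean estimator that keeps the element count and the running sum. Neither the Mean type nor meanOf comes from the original post; they are assumptions made for illustration.

```haskell
-- The semigroup class from the post (identifier kept as spelled there).
class SemigoupEst m where
  joinSample :: m -> m -> m

-- Hypothetical estimator for the mean: element count and running sum.
data Mean = Mean !Int !Double
  deriving (Show, Eq)

-- Joining two estimators adds counts and sums component-wise.
instance SemigoupEst Mean where
  joinSample (Mean n x) (Mean m y) = Mean (n + m) (x + y)

-- Build the estimator for a whole list at once.
meanOf :: [Double] -> Mean
meanOf xs = Mean (length xs) (sum xs)
```

Because joinSample is just component-wise addition, it inherits associativity and commutativity from (+). For the Int count this is exact; for the Double sum it holds exactly only when no rounding occurs and approximately otherwise.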
These are the laws of a commutative semigroup, so estimators form a semigroup. If we add a zero element we get a monoid. In this context the zero element is the estimator of an empty sample, nullEstimator, and joining it with any estimator a on either side must give back a.
Thus we get a monoid. Unfortunately, the Monoid type class from base couldn’t be used since it requires both the zero element and the associative operation to be defined. It is, however, entirely possible to have an estimator whose statistic is undefined for an empty sample, or an estimator without joinSample.
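One possible example of an estimator with no sensible value for an empty sample is the minimum. The MinEst type and minOf function below are hypothetical names for this sketch: the estimator can be joined, but for a general Ord type there is no element that could serve as nullEstimator.

```haskell
-- The semigroup class from the post (identifier kept as spelled there).
class SemigoupEst m where
  joinSample :: m -> m -> m

-- Hypothetical estimator for the minimum of a sample.
newtype MinEst a = MinEst a
  deriving (Show, Eq)

-- Joining two subsamples keeps the smaller of the two minima.
instance Ord a => SemigoupEst (MinEst a) where
  joinSample (MinEst x) (MinEst y) = MinEst (min x y)

-- Build the estimator from a non-empty sample: first element plus the rest.
-- There is deliberately no NullEstimator instance, since an empty
-- sample has no minimum for an arbitrary Ord type.
minOf :: Ord a => a -> [a] -> MinEst a
minOf x xs = MinEst (foldl min x xs)
```

Splitting the monoid structure into separate classes is what allows MinEst to implement only the half that makes sense for it.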
The last law says that calculating statistics using joinSample or with a fold should yield the same result.
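This law can be sketched as a checkable property. To keep the example short it folds over plain lists instead of going through Sample, and it uses a hypothetical integer-valued SumCount estimator so that the comparison is exact. The names SumCount, estimate and lawHolds are assumptions for this illustration, not part of the original API.

```haskell
{-# LANGUAGE MultiParamTypeClasses #-}
import Data.List (foldl')

-- The three estimator classes from the post.
class NullEstimator m where
  nullEstimator :: m

class FoldEstimator m a where
  addElement :: m -> a -> m

class SemigoupEst m where
  joinSample :: m -> m -> m

-- Hypothetical estimator: element count and integer sum.
data SumCount = SumCount !Int !Int
  deriving (Show, Eq)

instance NullEstimator SumCount where
  nullEstimator = SumCount 0 0

instance FoldEstimator SumCount Int where
  addElement (SumCount n s) x = SumCount (n + 1) (s + x)

instance SemigoupEst SumCount where
  joinSample (SumCount n s) (SumCount m t) = SumCount (n + m) (s + t)

-- Evaluate an estimator by folding over a list (simplification of Sample).
estimate :: (NullEstimator m, FoldEstimator m a) => [a] -> m
estimate = foldl' addElement nullEstimator

-- Folding over the concatenated sample must agree with
-- joining the estimators of the two parts.
lawHolds :: [Int] -> [Int] -> Bool
lawHolds xs ys =
  (estimate (xs ++ ys) :: SumCount) == estimate xs `joinSample` estimate ys
```

For estimators built on floating point values the same property would have to be checked with an approximate comparison rather than (==).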