Place where sepulci live

**16 August 2012** | Tags: haskell, statistics

The previous post on generic estimators, “Generalizing estimators”, introduced four type classes:

```
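{-# LANGUAGE TypeFamilies, MultiParamTypeClasses #-}
import Data.Kind (Type)

-- A sample: a container that can be folded over its elements.
class Sample a where
  type Elem a :: Type
  foldSample :: (acc -> Elem a -> acc) -> acc -> a -> acc

-- Estimator for the empty sample.
class NullEstimator m where
  nullEstimator :: m

-- Estimator that can be updated one element at a time.
class FoldEstimator m a where
  addElement :: m -> a -> m

-- Estimators for two subsamples can be merged.
class SemigroupEst m where
  joinSample :: m -> m -> m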
```
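To make the classes concrete, here is a minimal sketch. It assumes a strict left fold for lists, a hypothetical `Mean` estimator that stores the element count and running sum, and a hypothetical generic driver `estimate`. None of these instances come from the original post.

```haskell
{-# LANGUAGE TypeFamilies, MultiParamTypeClasses, FlexibleContexts #-}
import Data.Kind (Type)
import Data.List (foldl')

class Sample a where
  type Elem a :: Type
  foldSample :: (acc -> Elem a -> acc) -> acc -> a -> acc

class NullEstimator m where
  nullEstimator :: m

class FoldEstimator m a where
  addElement :: m -> a -> m

class SemigroupEst m where
  joinSample :: m -> m -> m

-- Lists hold any element type, so a strict left fold is all foldSample needs.
instance Sample [a] where
  type Elem [a] = a
  foldSample = foldl'

-- Hypothetical estimator: number of elements and their running sum.
data Mean = Mean !Int !Double deriving (Show, Eq)

instance NullEstimator Mean where
  nullEstimator = Mean 0 0

instance FoldEstimator Mean Double where
  addElement (Mean n s) x = Mean (n + 1) (s + x)

-- Merging subsamples just adds counts and sums.
instance SemigroupEst Mean where
  joinSample (Mean n1 s1) (Mean n2 s2) = Mean (n1 + n2) (s1 + s2)

-- Hypothetical generic driver: fold any sample into any estimator
-- that accepts the sample's element type, starting from the empty sample.
estimate :: (Sample s, NullEstimator m, FoldEstimator m (Elem s)) => s -> m
estimate = foldSample addElement nullEstimator

-- Mean of the accumulated sample; NaN for the empty sample (0 / 0).
meanValue :: Mean -> Double
meanValue (Mean n s) = s / fromIntegral n
```

With these definitions, joining the estimates for `[1, 2]` and `[3, 4]` gives the same `Mean 4 10` as estimating `[1, 2, 3, 4]` directly.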

`Sample` exists only because `Foldable` couldn't be used, and it has only the methods that are strictly necessary. `Foldable` requires that a container be able to hold elements of any type. That's not the case for unboxed vectors, and monomorphic containers like `ByteString` could not be used at all.
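As an illustration, `ByteString` can be given a `Sample` instance even though it has no element type parameter and so can never be `Foldable`. This sketch assumes the `bytestring` package that ships with GHC. `ByteStats` and `estimate` are hypothetical names used only here.

```haskell
{-# LANGUAGE TypeFamilies, MultiParamTypeClasses, FlexibleContexts #-}
import qualified Data.ByteString as BS
import Data.Kind (Type)
import Data.Word (Word8)

class Sample a where
  type Elem a :: Type
  foldSample :: (acc -> Elem a -> acc) -> acc -> a -> acc

class NullEstimator m where
  nullEstimator :: m

class FoldEstimator m a where
  addElement :: m -> a -> m

-- ByteString always holds Word8, so the element type is fixed by the
-- associated type instead of being a type parameter.
instance Sample BS.ByteString where
  type Elem BS.ByteString = Word8
  foldSample = BS.foldl'

-- Hypothetical estimator: number of bytes and their sum (kept as Int to avoid Word8 overflow).
data ByteStats = ByteStats !Int !Int deriving (Show, Eq)

instance NullEstimator ByteStats where
  nullEstimator = ByteStats 0 0

instance FoldEstimator ByteStats Word8 where
  addElement (ByteStats n s) w = ByteStats (n + 1) (s + fromIntegral w)

-- Hypothetical generic driver, same shape as for any other sample type.
estimate :: (Sample s, NullEstimator m, FoldEstimator m (Elem s)) => s -> m
estimate = foldSample addElement nullEstimator
```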

The laws for the other type classes are quite intuitive and can be summarized as follows: evaluating a statistic on the same sample should yield the same result regardless of the evaluation method. However, because calculations often involve floating-point values, the results will be only approximately the same.
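A minimal sketch of why only approximate equality can be expected. It assumes a plain sum of `Double`s as the statistic. `sumByFold` and `sumByJoin` are hypothetical names for folding the whole sample versus joining the sums of two subsamples, and the `1e-12` tolerance in `approxEq` is an arbitrary choice.

```haskell
import Data.List (foldl')

-- Statistic computed by folding the whole sample element by element.
sumByFold :: [Double] -> Double
sumByFold = foldl' (+) 0

-- Same statistic computed for two subsamples, then joined.
sumByJoin :: [Double] -> [Double] -> Double
sumByJoin xs ys = sumByFold xs + sumByFold ys

-- Relative comparison; the tolerance is an illustrative choice, not a standard value.
approxEq :: Double -> Double -> Bool
approxEq x y = abs (x - y) <= 1e-12 * max 1 (max (abs x) (abs y))
```

`sumByFold [0.1, 0.2, 0.3]` and `sumByJoin [0.1] [0.2, 0.3]` are not equal as `Double`s, because the additions are rounded in a different order. `approxEq` accepts them.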

First, `joinSample` is obviously associative, because merging subsamples is associative. Moreover, in most cases it's commutative, because the order of elements doesn't matter by the definition of the statistic. There are probably cases where a calculation can be expressed with this API and the order does matter, but I'm not aware of any.

```
(a `joinSample` b) `joinSample` c = a `joinSample` (b `joinSample` c)
a `joinSample` b = b `joinSample` a
```
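The two laws can be spot-checked on concrete values. This sketch assumes a hypothetical `MinMax` estimator that tracks the range of a sample. `associative` and `commutative` are illustrative helpers, not part of any library.

```haskell
class SemigroupEst m where
  joinSample :: m -> m -> m

-- Hypothetical estimator: smallest and largest element seen so far.
data MinMax = MinMax !Double !Double deriving (Show, Eq)

instance SemigroupEst MinMax where
  joinSample (MinMax lo1 hi1) (MinMax lo2 hi2) = MinMax (min lo1 lo2) (max hi1 hi2)

-- Associativity law, checked for three particular estimators.
associative :: (Eq m, SemigroupEst m) => m -> m -> m -> Bool
associative a b c =
  ((a `joinSample` b) `joinSample` c) == (a `joinSample` (b `joinSample` c))

-- Commutativity law, checked for two particular estimators.
commutative :: (Eq m, SemigroupEst m) => m -> m -> Bool
commutative a b = (a `joinSample` b) == (b `joinSample` a)
```

Exact equality is appropriate here because `min` and `max` never round. For estimators that add floating-point numbers, the same checks would need a tolerance.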

These are the laws of a commutative semigroup, so estimators form a semigroup. Adding a zero element gives a monoid. In this context the zero element is the empty sample, and `nullEstimator` corresponds to it:

```
x `joinSample` nullEstimator = x
nullEstimator `joinSample` x = x
```

Thus we get a monoid. Unfortunately, the `Monoid` type class from base can't be used, since it requires both the zero element and the associative operation to be defined. It's entirely possible, however, to have an estimator for which the statistic of an empty sample is undefined, or an estimator without `joinSample`.
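A minimal sketch of both points, assuming a hypothetical `AsMonoid` adapter and a hypothetical `MinEst` estimator. The adapter makes any estimator with `joinSample` a base `Semigroup`, but a `Monoid` only when `nullEstimator` also exists. The minimum of an empty sample is undefined, so `MinEst` gets no `NullEstimator` instance and the fold has to be seeded with the first element.

```haskell
{-# LANGUAGE MultiParamTypeClasses #-}
import Data.List (foldl')

class NullEstimator m where
  nullEstimator :: m

class FoldEstimator m a where
  addElement :: m -> a -> m

class SemigroupEst m where
  joinSample :: m -> m -> m

-- Hypothetical adapter to base's classes.
newtype AsMonoid m = AsMonoid { getEstimator :: m } deriving (Show, Eq)

-- Every estimator with joinSample is a Semigroup...
instance SemigroupEst m => Semigroup (AsMonoid m) where
  AsMonoid a <> AsMonoid b = AsMonoid (a `joinSample` b)

-- ...but only those with an empty-sample value are a Monoid.
instance (NullEstimator m, SemigroupEst m) => Monoid (AsMonoid m) where
  mempty = AsMonoid nullEstimator

-- Hypothetical minimum estimator: no NullEstimator instance,
-- so AsMonoid MinEst is a Semigroup but not a Monoid.
newtype MinEst = MinEst Double deriving (Show, Eq)

instance FoldEstimator MinEst Double where
  addElement (MinEst m) x = MinEst (min m x)

instance SemigroupEst MinEst where
  joinSample (MinEst a) (MinEst b) = MinEst (min a b)

-- Without a zero element the fold is seeded with the first element,
-- and the empty sample yields Nothing.
minEstimate :: [Double] -> Maybe MinEst
minEstimate []       = Nothing
minEstimate (x : xs) = Just (foldl' addElement (MinEst x) xs)
```

For `MinEst`, `mconcat` is therefore unavailable, while `sconcat` from `Data.Semigroup` still works on non-empty lists.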

The last law says that calculating a statistic using `joinSample` or with a fold should yield the same result:

```
foldl addElement (foldl addElement m xs) ys =
  foldl addElement nullEstimator ys `joinSample` foldl addElement m xs
```