statistics-0.15.0.0: A library of statistical types, data, and functions

Copyright: (c) 2009 Bryan O'Sullivan
License: BSD3
Maintainer: bos@serpentine.com
Stability: experimental
Portability: portable
Safe Haskell: None
Language: Haskell98

Statistics.Quantile

Description

Functions for approximating quantiles, i.e. points taken at regular intervals from the cumulative distribution function of a random variable.

The number of quantiles is described below by the variable q, so with q=4, a 4-quantile (also known as a quartile) has 4 intervals and contains 5 points. The parameter k describes the desired point, where 0 ≤ k ≤ q.

# Quantile estimation functions

Below is a family of functions that use the same algorithm to estimate sample quantiles. The algorithm approximates the empirical CDF as a continuous piecewise function that interpolates linearly between the points $$(X_k,p_k)$$, where $$X_k$$ is the k-th order statistic (the k-th smallest element) and $$p_k$$ is the probability corresponding to it. ContParam determines how $$p_k$$ is chosen. For a more detailed explanation, see [Hyndman1996].

This is the method used by most statistical software, such as R, Mathematica, SPSS, and S.
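As a rough illustration of the interpolation described above, the following sketch works on a plain list and uses only base. `Param` and `quantileSketch` are hypothetical names, not part of this module. The position formula t = α + p·(n + 1 − α − β) with p = k/q is an assumption taken from the piecewise linear family in [Hyndman1996], and the sketch leaves out the input validation and rounding safeguards a real implementation would need.

```haskell
import Data.List (sort)

-- | Hypothetical stand-in for ContParam: the pair (alpha, beta).
data Param = Param Double Double

-- | Sketch of the continuous sample method on a plain list.
-- Assumes a nonempty, NaN-free sample and 0 <= k <= q.
quantileSketch :: Param -> Int -> Int -> [Double] -> Double
quantileSketch (Param a b) k q xs = (1 - g) * item (j - 1) + g * item j
  where
    sorted = sort xs
    n      = length sorted
    p      = fromIntegral k / fromIntegral q
    -- One-based, fractional position in the sorted sample (assumed formula).
    t      = a + p * (fromIntegral n + 1 - a - b)
    j      = floor t :: Int
    g      = t - fromIntegral j
    -- Zero-based lookup, clamped to the valid index range.
    item i = sorted !! min (n - 1) (max 0 i)
```

For example, `quantileSketch (Param 1 1) 2 4 [5,1,4,2,3]` evaluates to 3.0, the sample median.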

data ContParam

Parameters α and β to the continuousBy function. The exact meaning of the parameters is described in [Hyndman1996], in the section "Piecewise linear functions".

Constructors

 ContParam !Double !Double
Instances

 Eq ContParam
 Data ContParam
 Ord ContParam
 Show ContParam
 Generic ContParam
 ToJSON ContParam
 FromJSON ContParam
 Binary ContParam
 Default ContParam: the default value is s, which is the same as R's default.

All instances are defined in Statistics.Quantile.

class Default a where

A class for types with a default value.

Minimal complete definition

Nothing

Methods

def :: a

The default value for this type.

Instances

 Default ContParam (defined in Statistics.Quantile): the default value is s, which is the same as R's default.

Data.Default.Class defines further instances, including Int, Int8, Word, (), All, Any, CInt, [a], Maybe a, Ratio a, IO a, Complex a, First a, Last a, Dual a, Endo a, Sum a, Product a, e -> r, and tuples of up to seven components. Two more instances are defined in Numeric.RootFinding.

quantile

Arguments

 :: Vector v Double
 => ContParam   Parameters α and β.
 -> Int         k, the desired quantile.
 -> Int         q, the number of quantiles.
 -> v Double    x, the sample data.
 -> Double

O(n·log n). Estimate the kth q-quantile of a sample x, using the continuous sample method with the given parameters.

The following properties should hold, otherwise an error will be thrown.

• input sample must be nonempty
• the input does not contain NaN
• 0 ≤ k ≤ q
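The preconditions above can be checked up front. The helper below is a hypothetical sketch on plain lists, not part of this module; it returns an `Either` instead of throwing, which is a design choice of the sketch and not the behaviour documented here.

```haskell
-- | Hypothetical helper: check the documented preconditions
-- before estimating the kth q-quantile of a sample.
checkQuantileArgs :: Int -> Int -> [Double] -> Either String ()
checkQuantileArgs k q xs
  | null xs        = Left "sample is empty"
  | any isNaN xs   = Left "sample contains NaN"
  | k < 0 || k > q = Left "k must satisfy 0 <= k <= q"
  | otherwise      = Right ()
```

Callers that prefer the throwing behaviour can turn a `Left` into a call to `error`.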

quantiles :: (Vector v Double, Foldable f, Functor f) => ContParam -> f Int -> Int -> v Double -> f Double

O(k·n·log n). Estimate a set of kth q-quantiles of a sample x, using the continuous sample method with the given parameters. This is faster than calling quantile repeatedly, since the sample needs to be sorted only once.

The following properties should hold, otherwise an error will be thrown.

• input sample must be nonempty
• the input does not contain NaN
• 0 ≤ k ≤ q for every k in the set of requested quantiles
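The benefit of sorting only once can be sketched as follows. `quantilesSketch` is a hypothetical name, the pair `(a, b)` stands in for ContParam, and the position formula is assumed from [Hyndman1996]; the sketch uses plain lists from base and performs no validation.

```haskell
import Data.List (sort)

-- | Hypothetical sketch: estimate several q-quantiles with a single sort.
quantilesSketch
  :: Functor f => (Double, Double) -> f Int -> Int -> [Double] -> f Double
quantilesSketch (a, b) ks q xs = fmap estimate ks
  where
    sorted = sort xs            -- computed once, shared by every requested k
    n      = length sorted
    estimate k = (1 - g) * item (j - 1) + g * item j
      where
        p = fromIntegral k / fromIntegral q
        t = a + p * (fromIntegral n + 1 - a - b)  -- assumed position formula
        j = floor t :: Int
        g = t - fromIntegral j
    -- Zero-based lookup, clamped to the valid index range.
    item i = sorted !! min (n - 1) (max 0 i)
```

Because `sorted` is bound outside `estimate`, each additional k costs only a lookup and an interpolation in this sketch.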

quantilesVec :: (Vector v Double, Vector v Int) => ContParam -> v Int -> Int -> v Double -> v Double

O(k·n·log n). Same as quantiles, but uses a Vector container instead of a Foldable one.

## Parameters for the continuous sample method

cadpw :: ContParam

California Department of Public Works definition, α=0, β=1. Gives a linear interpolation of the empirical CDF. This corresponds to method 4 in R and Mathematica.

hazen :: ContParam

Hazen's definition, α=0.5, β=0.5. This is claimed to be popular among hydrologists. This corresponds to method 5 in R and Mathematica.

spss :: ContParam

Definition used by the SPSS statistics application, with α=0, β=0 (also known as Weibull's definition). This corresponds to method 6 in R and Mathematica.

s :: ContParam

Definition used by the S statistics application, with α=1, β=1. The interpolation points divide the sample range into n-1 intervals. This corresponds to method 7 in R and Mathematica and is the default in R.

medianUnbiased :: ContParam

Median unbiased definition, α=1/3, β=1/3. The resulting quantile estimates are approximately median unbiased regardless of the distribution of x. This corresponds to method 8 in R and Mathematica.

normalUnbiased :: ContParam

Normal unbiased definition, α=3/8, β=3/8. An approximately unbiased estimate if the empirical distribution approximates the normal distribution. This corresponds to method 9 in R and Mathematica.
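To make the role of α and β concrete, the six definitions above can be written out as plotting positions. This assumes the general form used for piecewise linear estimators in [Hyndman1996], where k indexes the order statistics of a sample of size n (as in the introduction to this section, not the quantile index k):

$$p_k = \frac{k - \alpha}{n + 1 - \alpha - \beta}$$

Under that assumption, substituting each pair of parameters gives:

$$
\begin{aligned}
\text{California DPW } (\alpha=0,\ \beta=1):&\quad p_k = \frac{k}{n} \\
\text{Hazen } (\alpha=\beta=\tfrac12):&\quad p_k = \frac{k - \tfrac12}{n} \\
\text{SPSS } (\alpha=\beta=0):&\quad p_k = \frac{k}{n + 1} \\
\text{S } (\alpha=\beta=1):&\quad p_k = \frac{k - 1}{n - 1} \\
\text{Median unbiased } (\alpha=\beta=\tfrac13):&\quad p_k = \frac{k - \tfrac13}{n + \tfrac13} \\
\text{Normal unbiased } (\alpha=\beta=\tfrac38):&\quad p_k = \frac{k - \tfrac38}{n + \tfrac14}
\end{aligned}
$$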

# Other algorithms

weightedAvg

Arguments

 :: Vector v Double
 => Int         k, the desired quantile.
 -> Int         q, the number of quantiles.
 -> v Double    x, the sample data.
 -> Double

O(n·log n). Estimate the kth q-quantile of a sample, using the weighted average method. Up to rounding errors, the result is the same as quantile s.

The following properties should hold, otherwise an error will be thrown.

• the length of the input is greater than 0
• the input does not contain NaN
• k ≥ 0 and k ≤ q
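One way to see why the two estimates agree rests on two assumptions that are not stated on this page: that the continuous sample method places the kth q-quantile at the one-based position t = α + (k/q)·(n + 1 − α − β) in the sorted sample, and that the weighted average method interpolates linearly at the zero-based position (n − 1)·k/q. With α = β = 1, the first position becomes

$$t = 1 + \frac{k}{q}\,(n - 1) \quad\Longrightarrow\quad t - 1 = \frac{(n - 1)\,k}{q}$$

so, under these assumptions, both methods interpolate between the same pair of order statistics with the same weight.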

# Median & other specializations

median

Arguments

 :: Vector v Double
 => ContParam   Parameters α and β.
 -> v Double    x, the sample data.
 -> Double

O(n·log n). Estimate the median of a sample x.

mad

Arguments

 :: Vector v Double
 => ContParam   Parameters α and β.
 -> v Double    x, the sample data.
 -> Double

O(n·log n). Estimate the median absolute deviation (MAD) of a sample x using continuousBy. It is a robust estimate of the variability in a sample and is defined as:

$MAD = \operatorname{median}(| X_i - \operatorname{median}(X) |)$
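The definition can be sketched directly on plain lists using only base. `medianSketch` and `madSketch` are hypothetical names, not part of this module. The sketch uses the ordinary sample median, which is assumed to match the continuous sample method only for parameters with α = β, and it performs no validation (an empty sample fails).

```haskell
import Data.List (sort)

-- | Ordinary sample median: the middle element for odd lengths,
-- the mean of the two middle elements for even lengths.
medianSketch :: [Double] -> Double
medianSketch xs
  | odd n     = sorted !! mid
  | otherwise = (sorted !! (mid - 1) + sorted !! mid) / 2
  where
    sorted = sort xs
    n      = length sorted
    mid    = n `div` 2

-- | Hypothetical MAD sketch: median of absolute deviations from the median.
madSketch :: [Double] -> Double
madSketch xs = medianSketch [abs (x - m) | x <- xs]
  where
    m = medianSketch xs
```

For example, `madSketch [1,2,3,4,100]` evaluates to 1.0: the median is 3, the absolute deviations are 2, 1, 0, 1, 97, and their median is 1. The outlier 100 has no effect on the result, which is what makes the estimate robust.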

midspread

Arguments

 :: Vector v Double
 => ContParam   Parameters α and β.
 -> Int         q, the number of quantiles.
 -> v Double    x, the sample data.
 -> Double

O(n·log n). Estimate the range between q-quantiles 1 and q-1 of a sample x, using the continuous sample method with the given parameters.

For instance, the interquartile range (IQR) can be estimated as follows:

midspread medianUnbiased 4 (U.fromList [1,1,2,2,3])
==> 1.333333
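The result can be checked by hand, assuming the position formula t = α + (k/q)·(n + 1 − α − β) from [Hyndman1996], where the estimate interpolates between the order statistics on either side of t. Here n = 5, α = β = 1/3, and the sorted sample is 1, 1, 2, 2, 3:

$$
\begin{aligned}
n + 1 - \alpha - \beta &= 5 + 1 - \tfrac13 - \tfrac13 = \tfrac{16}{3} \\
t_1 &= \tfrac13 + \tfrac14 \cdot \tfrac{16}{3} = \tfrac53, & Q_1 &= \tfrac13 X_1 + \tfrac23 X_2 = \tfrac13 \cdot 1 + \tfrac23 \cdot 1 = 1 \\
t_3 &= \tfrac13 + \tfrac34 \cdot \tfrac{16}{3} = \tfrac{13}{3}, & Q_3 &= \tfrac23 X_4 + \tfrac13 X_5 = \tfrac23 \cdot 2 + \tfrac13 \cdot 3 = \tfrac73 \\
Q_3 - Q_1 &= \tfrac73 - 1 = \tfrac43 \approx 1.333333
\end{aligned}
$$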

# Deprecated

continuousBy

Arguments

 :: Vector v Double
 => ContParam   Parameters α and β.
 -> Int         k, the desired quantile.
 -> Int         q, the number of quantiles.
 -> v Double    x, the sample data.
 -> Double

Deprecated: Use quantile instead.