Copyright | (c) 2009 Bryan O'Sullivan |
---|---|
License | BSD3 |
Maintainer | bos@serpentine.com |
Stability | experimental |
Portability | portable |
Safe Haskell | None |
Language | Haskell98 |
Functions for approximating quantiles, i.e. points taken at regular intervals from the cumulative distribution function of a random variable.
The variable q below denotes the number of quantiles: with q=4, the 4-quantiles (also known as quartiles) divide the data into 4 intervals delimited by 5 points, counting both endpoints. The parameter k selects the desired point, where 0 ≤ k ≤ q.
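
As a small illustration of how q and k relate, the fractions k/q for every valid k can be listed directly. The function name below is hypothetical and not part of this module; the sketch uses only the Prelude.

```haskell
-- Hypothetical helper, not part of the documented API.
-- Lists the fraction k/q of the distribution for every valid k (0 <= k <= q).
quantileFractions :: Int -> [Double]
quantileFractions q = [fromIntegral k / fromIntegral q | k <- [0 .. q]]
```

With q=4 this yields the 5 fractions 0, 0.25, 0.5, 0.75 and 1, which bound 4 intervals.
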
- weightedAvg :: Vector v Double => Int -> Int -> v Double -> Double
- data ContParam = ContParam !Double !Double
- continuousBy :: Vector v Double => ContParam -> Int -> Int -> v Double -> Double
- midspread :: Vector v Double => ContParam -> Int -> v Double -> Double
- cadpw :: ContParam
- hazen :: ContParam
- s :: ContParam
- spss :: ContParam
- medianUnbiased :: ContParam
- normalUnbiased :: ContParam
Quantile estimation functions
weightedAvg | |
---|---|
:: Vector v Double | |
=> Int | k, the desired quantile. |
-> Int | q, the number of quantiles. |
-> v Double | x, the sample data. |
-> Double | |
O(n log n). Estimate the kth q-quantile of a sample, using the weighted average method.
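
The following is a minimal list-based sketch of one common reading of the weighted average method, not the library's implementation. It assumes the kth q-quantile sits at the fractional 0-based index (n-1)k/q of the sorted sample and is obtained by linear interpolation between the two neighbouring values; that index formula and the name `weightedAvgSketch` are assumptions made for illustration. The sample is assumed to be non-empty.

```haskell
import Data.List (sort)

-- Hypothetical sketch, not the library function.
-- Assumes a non-empty sample, q >= 2 and 0 <= k <= q.
weightedAvgSketch :: Int -> Int -> [Double] -> Double
weightedAvgSketch k q xs
  | n == 1    = head sorted            -- a single value is every quantile
  | k == q    = last sorted            -- the top quantile is the maximum
  | otherwise = xj + g * (xj1 - xj)    -- interpolate between neighbours
  where
    sorted = sort xs                   -- the O(n log n) step
    n      = length sorted
    -- Assumed fractional 0-based position of the quantile.
    idx    = fromIntegral (n - 1) * fromIntegral k / fromIntegral q :: Double
    j      = floor idx :: Int
    g      = idx - fromIntegral j      -- weight given to the upper neighbour
    xj     = sorted !! j
    xj1    = sorted !! (j + 1)
```

For example, `weightedAvgSketch 1 2 [1, 2, 3, 4]` lands halfway between the second and third sorted values and evaluates to 2.5.
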
continuousBy | |
---|---|
:: Vector v Double | |
=> ContParam | Parameters a and b. |
-> Int | k, the desired quantile. |
-> Int | q, the number of quantiles. |
-> v Double | x, the sample data. |
-> Double | |
O(n log n). Estimate the kth q-quantile of a sample x, using the continuous sample method with the given parameters. This is the method used by most statistical software, such as R, Mathematica, SPSS, and S.
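
To make the role of a and b concrete, here is a minimal list-based sketch of the continuous sample method. It assumes the kth q-quantile is placed at the fractional 1-based position t = a + (k/q)(n + 1 - a - b) in the sorted sample, with linear interpolation between the neighbouring order statistics and clamping at the ends. The name `continuousSketch`, the use of a plain pair for the parameters, and the omission of any rounding tolerance are choices made for this illustration, not details taken from the library.

```haskell
import Data.List (sort)

-- Hypothetical sketch, not the library function.
-- Parameters are passed as a plain pair (a, b); the sample must be non-empty.
continuousSketch :: (Double, Double) -> Int -> Int -> [Double] -> Double
continuousSketch (a, b) k q xs = (1 - h) * item (j - 1) + h * item j
  where
    sorted = sort xs                          -- the O(n log n) step
    n      = length sorted
    p      = fromIntegral k / fromIntegral q  -- fraction of the distribution
    -- Assumed 1-based fractional position of the quantile.
    t      = a + p * (fromIntegral n + 1 - a - b)
    j      = floor t :: Int
    h      = t - fromIntegral j               -- weight of the upper neighbour
    -- Clamp indices so positions outside the sample reuse the end values.
    item m = sorted !! min (max m 0) (n - 1)
```

With a = b = 1/3, `continuousSketch (1/3, 1/3) 3 4 [1, 1, 2, 2, 3]` gives a position of 13/3 and evaluates to roughly 2.333333.
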
midspread | |
---|---|
:: Vector v Double | |
=> ContParam | Parameters a and b. |
-> Int | q, the number of quantiles. |
-> v Double | x, the sample data. |
-> Double | |
O(n log n). Estimate the range between q-quantiles 1 and q-1 of a sample x, using the continuous sample method with the given parameters.
For instance, the interquartile range (IQR) can be estimated as follows:
midspread medianUnbiased 4 (U.fromList [1,1,2,2,3]) ==> 1.333333
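
The value above can be reproduced by hand. The derivation assumes that the continuous sample method places the kth q-quantile at the fractional 1-based position t = a + (k/q)(n + 1 - a - b) in the sorted sample and interpolates linearly between neighbours; that position formula is an assumption here, not something this page states. For the sorted sample 1, 1, 2, 2, 3 (n = 5) and a = b = 1/3:

```latex
\begin{align*}
t_1 &= \tfrac13 + \tfrac14\left(5 + 1 - \tfrac13 - \tfrac13\right) = \tfrac53,
  & Q_1 &= x_{(1)} + \tfrac23\,\bigl(x_{(2)} - x_{(1)}\bigr) = 1 + \tfrac23 \cdot 0 = 1, \\
t_3 &= \tfrac13 + \tfrac34\left(5 + 1 - \tfrac13 - \tfrac13\right) = \tfrac{13}{3},
  & Q_3 &= x_{(4)} + \tfrac13\,\bigl(x_{(5)} - x_{(4)}\bigr) = 2 + \tfrac13 \cdot 1 = \tfrac73, \\
\mathrm{IQR} &= Q_3 - Q_1 = \tfrac73 - 1 = \tfrac43 \approx 1.333333.
\end{align*}
```
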
Parameters for the continuous sample method
cadpw :: ContParam
California Department of Public Works definition, a=0, b=1. Gives a linear interpolation of the empirical CDF. This corresponds to method 4 in R and Mathematica.
hazen :: ContParam
Hazen's definition, a=0.5, b=0.5. This is claimed to be popular among hydrologists. This corresponds to method 5 in R and Mathematica.
s :: ContParam
Definition used by the S statistics application, with a=1, b=1. The interpolation points divide the sample range into n-1 intervals. This corresponds to method 7 in R and Mathematica.
spss :: ContParam
Definition used by the SPSS statistics application, with a=0, b=0 (also known as Weibull's definition). This corresponds to method 6 in R and Mathematica.
medianUnbiased :: ContParam
Median unbiased definition, a=1/3, b=1/3. The resulting quantile estimates are approximately median unbiased regardless of the distribution of x. This corresponds to method 8 in R and Mathematica.
normalUnbiased :: ContParam
Normal unbiased definition, a=3/8, b=3/8. An approximately unbiased estimate if the empirical distribution approximates the normal distribution. This corresponds to method 9 in R and Mathematica.
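
To compare the six parameter sets side by side, the sketch below tabulates where each one would place a quantile in a sorted sample of size n. It assumes the position formula t = a + p(n + 1 - a - b), with p the fraction k/q; that formula, the plain tuples, and the names `presets`, `position` and `positions` are assumptions made for illustration. The a and b values are the ones listed above.

```haskell
-- Hypothetical table of the parameter sets described above: (name, a, b).
presets :: [(String, Double, Double)]
presets =
  [ ("cadpw",          0,   1)
  , ("hazen",          0.5, 0.5)
  , ("s",              1,   1)
  , ("spss",           0,   0)
  , ("medianUnbiased", 1/3, 1/3)
  , ("normalUnbiased", 3/8, 3/8)
  ]

-- Assumed 1-based fractional position of the quantile at fraction p
-- in a sorted sample of size n.
position :: Int -> Double -> (Double, Double) -> Double
position n p (a, b) = a + p * (fromIntegral n + 1 - a - b)

-- Position chosen by every preset for the same n and p.
positions :: Int -> Double -> [(String, Double)]
positions n p = [(name, position n p (a, b)) | (name, a, b) <- presets]
```

For n=5 and p=0.25, the positions are 1.25 (cadpw), 1.75 (hazen), 2 (s), 1.5 (spss), about 1.667 (medianUnbiased) and 1.6875 (normalUnbiased), so the choice of a and b can shift the lower quartile by up to three quarters of a sample position in this small example.
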
References
- Weisstein, E.W. Quantile. MathWorld. http://mathworld.wolfram.com/Quantile.html
- Hyndman, R.J.; Fan, Y. (1996) Sample quantiles in statistical packages. American Statistician 50(4):361–365. http://www.jstor.org/stable/2684934