Safe Haskell	None
Language	Haskell98

Data.Csv

Contents

Usage examples
- Encoding and decoding custom data types
  - Index-based record conversion
  - Name-based record conversion
Treating CSV data as opaque byte strings
Custom type conversions
- Dealing with bad data
Encoding and decoding
- Encoding and decoding options
Core CSV types
Type conversion

Description

This module implements encoding and decoding of CSV data. The implementation is RFC 4180 compliant, with the following extensions:

Empty lines are ignored.
Non-escaped fields may contain any characters except double-quotes, commas, carriage returns, and newlines.
Escaped fields may contain any characters (but double-quotes need to be escaped).

Synopsis

Usage examples

Encoding standard Haskell types:

>>> encode [("John" :: Text, 27), ("Jane", 28)]
"John,27\r\nJane,28\r\n"

Since string literals are overloaded we have to supply a type signature as the compiler couldn't deduce which string type (i.e. String or Text) we want to use. In most cases type inference will infer the type from the context and you can omit type signatures.

Decoding standard Haskell types:

>>> decode NoHeader "John,27\r\nJane,28\r\n" :: Either String (Vector (Text, Int))
Right [("John",27),("Jane",28)]

We pass NoHeader as the first argument to indicate that the CSV input data isn't preceded by a header.

In practice, the return type of decode rarely needs to be given, as it can often be inferred from the context.

Encoding and decoding custom data types

To encode and decode your own data types you need to defined instances of either ToRecord and FromRecord or ToNamedRecord and FromNamedRecord. The former is used for encoding/decoding using the column index and the latter using column name.

There are two ways to to define these instances, either by manually defining them or by using GHC generics to derive them automatically.

Index-based record conversion

Derived:

{-# LANGUAGE DeriveGeneric #-}

data Person = Person { name :: !Text , salary :: !Int }
    deriving Generic

instance FromRecord Person
instance ToRecord Person

Manually defined:

data Person = Person { name :: !Text , salary :: !Int }

instance FromRecord Person where
    parseRecord v
        | length v == 2 = Person <$> v .! 0 <*> v .! 1
        | otherwise     = mzero
instance ToRecord Person where
    toRecord (Person name age) = record [
        toField name, toField age]

We can now use e.g. encode and decode to encode and decode our data type.

Encoding:

>>> encode [Person ("John" :: Text) 27]
"John,27\r\n"

Decoding:

>>> decode NoHeader "John,27\r\n" :: Either String (Vector Person)
Right (["name","salary"],[Person {name = "John", salary = 27}])

Name-based record conversion

Derived:

{-# LANGUAGE DeriveGeneric #-}

data Person = Person { name :: !Text , salary :: !Int }
    deriving Generic

instance FromNamedRecord Person
instance ToNamedRecord Person
instance DefaultOrdered Person

Manually defined:

data Person = Person { name :: !Text , salary :: !Int }

instance FromNamedRecord Person where
    parseNamedRecord m = Person <$> m .: "name" <*> m .: "salary"
instance ToNamedRecord Person
    toNamedRecord (Person name salary) = namedRecord [
        "name" .= name, "salary" .= salary]
instance DefaultOrdered Person
    where headerOrder = header ["name", "salary"]

We can now use e.g. encodeDefaultOrderedByName (or encodeByName with an explicit header order) and decodeByName to encode and decode our data type.

Encoding:

>>> encodeDefaultOrderedByName [Person ("John" :: Text) 27]
"name,salary\r\nJohn,27\r\n"

Decoding:

>>> decodeByName "name,salary\r\nJohn,27\r\n" :: Either String (Header, Vector Person)
Right (["name","salary"],[Person {name = "John", salary = 27}])

Treating CSV data as opaque byte strings

Sometimes you might want to work with a CSV file which contents is unknown to you. For example, you might want remove the second column of a file without knowing anything about its content. To parse a CSV file to a generic representation, just convert each record to a Vector ByteString value, like so:

>>> decode NoHeader "John,27\r\nJane,28\r\n" :: Either String (Vector (Vector ByteString))
Right [["John","27"],["Jane","28"]]

As the example output above shows, all the fields are returned as uninterpreted ByteString values.

Custom type conversions

Most of the time the existing FromField and ToField instances do what you want. However, if you need to parse a different format (e.g. hex) but use a type (e.g. Int) for which there's already a FromField instance, you need to use a newtype. Example:

newtype Hex = Hex Int

parseHex :: ByteString -> Parser Int
parseHex = ...

instance FromField Hex where
    parseField s = Hex <$> parseHex s

Other than giving an explicit type signature, you can pattern match on the newtype constructor to indicate which type conversion you want to have the library use:

case decode NoHeader "0xff,0xaa\r\n0x11,0x22\r\n" of
    Left err -> putStrLn err
    Right v  -> forM_ v $ \ (Hex val1, Hex val2) ->
        print (val1, val2)

If a field might be in one several different formats, you can use a newtype to normalize the result:

newtype HexOrDecimal = HexOrDecimal Int

instance FromField DefaultToZero where
    parseField s = case runParser (parseField s :: Parser Hex) of
        Left err -> HexOrDecimal <$> parseField s  -- Uses Int instance
        Right n  -> pure $ HexOrDecimal n

You can use the unit type, (), to ignore a column. The parseField method for () doesn't look at the Field and thus always decodes successfully. Note that it lacks a corresponding ToField instance. Example:

case decode NoHeader "foo,1\r\nbar,22" of
    Left  err -> putStrLn err
    Right v   -> forM_ v $ \ ((), i) -> print (i :: Int)

Dealing with bad data

If your input might contain invalid fields, you can write a custom FromField instance to deal with them. Example:

newtype DefaultToZero = DefaultToZero Int

instance FromField DefaultToZero where
    parseField s = case runParser (parseField s) of
        Left err -> pure $ DefaultToZero 0
        Right n  -> pure $ DefaultToZero n

Encoding and decoding

Encoding and decoding is a two step process. To encode a value, it is first converted to a generic representation, using either ToRecord or ToNamedRecord. The generic representation is then encoded as CSV data. To decode a value the process is reversed and either FromRecord or FromNamedRecord is used instead. Both these steps are combined in the encode and decode functions.

data HasHeader

Is the CSV data preceded by a header?

Constructors

HasHeader	The CSV data is preceded by a header
NoHeader	The CSV data is not preceded by a header

decode

Arguments

:: FromRecord a
=> HasHeader	Data contains header that should be skipped
-> ByteString	CSV data
-> Either String (Vector a)

Efficiently deserialize CSV records from a lazy ByteString. If this fails due to incomplete or invalid input, Left msg is returned. Equivalent to decodeWith defaultDecodeOptions.

decodeByName

Arguments

:: FromNamedRecord a
=> ByteString	CSV data
-> Either String (Header, Vector a)

Efficiently deserialize CSV records from a lazy ByteString. If this fails due to incomplete or invalid input, Left msg is returned. The data is assumed to be preceeded by a header. Equivalent to decodeByNameWith defaultDecodeOptions.

data Quoting

Should quoting be applied to fields, and at which level

Constructors

QuoteNone	No quotes
QuoteMinimal	Quotes according to RFC 4180
QuoteAll	Always quote

Instances

Eq Quoting
Show Quoting

encode :: ToRecord a => [a] -> ByteString

Efficiently serialize CSV records as a lazy ByteString.

encodeByName :: ToNamedRecord a => Header -> [a] -> ByteString

Efficiently serialize CSV records as a lazy ByteString. The header is written before any records and dictates the field order.

encodeDefaultOrderedByName :: (DefaultOrdered a, ToNamedRecord a) => [a] -> ByteString

Like encodeByName, but header and field order is dictated by the header method.

class DefaultOrdered a where

A type that has a default field order when converted to CSV. This class lets you specify how to get the headers to use for a record type that's an instance of ToNamedRecord.

To derive an instance, the type is required to only have one constructor and that constructor must have named fields (also known as selectors) for all fields.

Right: data Foo = Foo { foo :: !Int }

Wrong: data Bar = Bar Int

If you try to derive an instance using GHC generics and your type doesn't have named fields, you will get an error along the lines of:

<interactive>:9:10:
    No instance for (DefaultOrdered (M1 S NoSelector (K1 R Char) ()))
      arising from a use of ‘Data.Csv.Conversion.$gdmheader’
    In the expression: Data.Csv.Conversion.$gdmheader
    In an equation for ‘header’:
        header = Data.Csv.Conversion.$gdmheader
    In the instance declaration for ‘DefaultOrdered Foo’

Minimal complete definition

Nothing

Methods

headerOrder :: a -> Header

The header order for this record. Should include the names used in the NamedRecord returned by toNamedRecord. Pass undefined as the argument, together with a type annotation e.g. headerOrder (undefined :: MyRecord).

Encoding and decoding options

These functions can be used to control how data is encoded and decoded. For example, they can be used to encode data in a tab-separated format instead of in a comma-separated format.

data DecodeOptions

Options that controls how data is decoded. These options can be used to e.g. decode tab-separated data instead of comma-separated data.

To avoid having your program stop compiling when new fields are added to DecodeOptions, create option records by overriding values in defaultDecodeOptions. Example:

myOptions = defaultDecodeOptions {
      decDelimiter = fromIntegral (ord '\t')
    }

Constructors

DecodeOptions
Fields decDelimiter :: !Word8 Field delimiter.

Instances

Eq DecodeOptions
Show DecodeOptions

defaultDecodeOptions :: DecodeOptions

Decoding options for parsing CSV files.

decodeWith

Arguments

:: FromRecord a
=> DecodeOptions	Decoding options
-> HasHeader	Data contains header that should be skipped
-> ByteString	CSV data
-> Either String (Vector a)

Like decode, but lets you customize how the CSV data is parsed.

decodeByNameWith

Arguments

:: FromNamedRecord a
=> DecodeOptions	Decoding options
-> ByteString	CSV data
-> Either String (Header, Vector a)

Like decodeByName, but lets you customize how the CSV data is parsed.

data EncodeOptions

Options that controls how data is encoded. These options can be used to e.g. encode data in a tab-separated format instead of in a comma-separated format.

To avoid having your program stop compiling when new fields are added to EncodeOptions, create option records by overriding values in defaultEncodeOptions. Example:

myOptions = defaultEncodeOptions {
      encDelimiter = fromIntegral (ord '\t')
    }

N.B. The encDelimiter must not be the quote character (i.e. ") or one of the record separator characters (i.e. \n or \r).

Constructors

EncodeOptions
Fields encDelimiter :: !Word8 Field delimiter. encUseCrLf :: !Bool Record separator selection. `True` for CRLF (`\r\n`) and `False` for LF (`\n`). encIncludeHeader :: !Bool Include a header row when encoding `ToNamedRecord` instances. encQuoting :: !Quoting What kind of quoting should be applied to text fields.

Instances

Eq EncodeOptions
Show EncodeOptions

defaultEncodeOptions :: EncodeOptions

Encoding options for CSV files.

encodeWith :: ToRecord a => EncodeOptions -> [a] -> ByteString

Like encode, but lets you customize how the CSV data is encoded.

encodeByNameWith :: ToNamedRecord a => EncodeOptions -> Header -> [a] -> ByteString

Like encodeByName, but lets you customize how the CSV data is encoded.

encodeDefaultOrderedByNameWith :: forall a. (DefaultOrdered a, ToNamedRecord a) => EncodeOptions -> [a] -> ByteString

Like encodeDefaultOrderedByNameWith, but lets you customize how the CSV data is encoded.

Core CSV types

type Csv = Vector Record

CSV data represented as a Haskell vector of vector of bytestrings.

type Record = Vector Field

A record corresponds to a single line in a CSV file.

type Field = ByteString

A single field within a record.

type Header = Vector Name

The header corresponds to the first line a CSV file. Not all CSV files have a header.

type Name = ByteString

A header has one or more names, describing the data in the column following the name.

type NamedRecord = HashMap ByteString ByteString

A record corresponds to a single line in a CSV file, indexed by the column name rather than the column index.

Type conversion

There are two ways to convert CSV records to and from and user-defined data types: index-based conversion and name-based conversion.

Index-based record conversion

Index-based conversion lets you convert CSV records to and from user-defined data types by referring to a field's position (its index) in the record. The first column in a CSV file is given index 0, the second index 1, and so on.

class FromRecord a where

A type that can be converted from a single CSV record, with the possibility of failure.

When writing an instance, use empty, mzero, or fail to make a conversion fail, e.g. if a Record has the wrong number of columns.

Given this example data:

John,56
Jane,55

here's an example type and instance:

data Person = Person { name :: !Text, age :: !Int }

instance FromRecord Person where
    parseRecord v
        | length v == 2 = Person <$>
                          v .! 0 <*>
                          v .! 1
        | otherwise     = mzero

Minimal complete definition

Nothing

Methods

parseRecord :: Record -> Parser a

Instances

FromField a => FromRecord [a]
FromField a => FromRecord (Vector a)
(FromField a, Unbox a) => FromRecord (Vector a)
FromField a => FromRecord (Only a)
(FromField a, FromField b) => FromRecord (a, b)
(FromField a, FromField b, FromField c) => FromRecord (a, b, c)
(FromField a, FromField b, FromField c, FromField d) => FromRecord (a, b, c, d)
(FromField a, FromField b, FromField c, FromField d, FromField e) => FromRecord (a, b, c, d, e)
(FromField a, FromField b, FromField c, FromField d, FromField e, FromField f) => FromRecord (a, b, c, d, e, f)
(FromField a, FromField b, FromField c, FromField d, FromField e, FromField f, FromField g) => FromRecord (a, b, c, d, e, f, g)
(FromField a, FromField b, FromField c, FromField d, FromField e, FromField f, FromField g, FromField h) => FromRecord (a, b, c, d, e, f, g, h)
(FromField a, FromField b, FromField c, FromField d, FromField e, FromField f, FromField g, FromField h, FromField i) => FromRecord (a, b, c, d, e, f, g, h, i)

data Parser a

Conversion of a field to a value might fail e.g. if the field is malformed. This possibility is captured by the Parser type, which lets you compose several field conversions together in such a way that if any of them fail, the whole record conversion fails.

Instances

Alternative Parser
Monad Parser
Functor Parser
MonadPlus Parser
Applicative Parser
Monoid (Parser a)

runParser :: Parser a -> Either String a

Run a Parser, returning either Left errMsg or Right result. Forces the value in the Left or Right constructors to weak head normal form.

You most likely won't need to use this function directly, but it's included for completeness.

index :: FromField a => Record -> Int -> Parser a

Retrieve the nth field in the given record. The result is empty if the value cannot be converted to the desired type. Raises an exception if the index is out of bounds.

index is a simple convenience function that is equivalent to parseField (v ! idx). If you're certain that the index is not out of bounds, using unsafeIndex is somewhat faster.

(.!) :: FromField a => Record -> Int -> Parser a infixl 9

Alias for index.

unsafeIndex :: FromField a => Record -> Int -> Parser a

Like index but without bounds checking.

class ToRecord a where

A type that can be converted to a single CSV record.

An example type and instance:

data Person = Person { name :: !Text, age :: !Int }

instance ToRecord Person where
    toRecord (Person name age) = record [
        toField name, toField age]

Outputs data on this form:

John,56
Jane,55

Minimal complete definition

Nothing

Methods

toRecord :: a -> Record

Convert a value to a record.

Instances

ToField a => ToRecord [a]
ToField a => ToRecord (Vector a)
(ToField a, Unbox a) => ToRecord (Vector a)
ToField a => ToRecord (Only a)
(ToField a, ToField b) => ToRecord (a, b)
(ToField a, ToField b, ToField c) => ToRecord (a, b, c)
(ToField a, ToField b, ToField c, ToField d) => ToRecord (a, b, c, d)
(ToField a, ToField b, ToField c, ToField d, ToField e) => ToRecord (a, b, c, d, e)
(ToField a, ToField b, ToField c, ToField d, ToField e, ToField f) => ToRecord (a, b, c, d, e, f)
(ToField a, ToField b, ToField c, ToField d, ToField e, ToField f, ToField g) => ToRecord (a, b, c, d, e, f, g)
(ToField a, ToField b, ToField c, ToField d, ToField e, ToField f, ToField g, ToField h) => ToRecord (a, b, c, d, e, f, g, h)
(ToField a, ToField b, ToField c, ToField d, ToField e, ToField f, ToField g, ToField h, ToField i) => ToRecord (a, b, c, d, e, f, g, h, i)

record :: [ByteString] -> Record

Construct a record from a list of ByteStrings. Use toField to convert values to ByteStrings for use with record.

newtype Only a

Haskell lacks a single-element tuple type, so if you CSV data with just one column you can use the Only type to represent a single-column result.

Constructors

Only
Fields fromOnly :: a

Instances

Eq a => Eq (Only a)
Ord a => Ord (Only a)
Read a => Read (Only a)
Show a => Show (Only a)
ToField a => ToRecord (Only a)
FromField a => FromRecord (Only a)

Name-based record conversion

Name-based conversion lets you convert CSV records to and from user-defined data types by referring to a field's name. The names of the fields are defined by the first line in the file, also known as the header. Name-based conversion is more robust to changes in the file structure e.g. to reording or addition of columns, but can be a bit slower.

class FromNamedRecord a where

A type that can be converted from a single CSV record, with the possibility of failure.

When writing an instance, use empty, mzero, or fail to make a conversion fail, e.g. if a Record has the wrong number of columns.

Given this example data:

name,age
John,56
Jane,55

here's an example type and instance:

{-# LANGUAGE OverloadedStrings #-}

data Person = Person { name :: !Text, age :: !Int }

instance FromNamedRecord Person where
    parseNamedRecord m = Person <$>
                         m .: "name" <*>
                         m .: "age"

Note the use of the OverloadedStrings language extension which enables ByteString values to be written as string literals.

Minimal complete definition

Nothing

Methods

parseNamedRecord :: NamedRecord -> Parser a

Instances

(FromField a, FromField b, Ord a) => FromNamedRecord (Map a b)
(Eq a, FromField a, FromField b, Hashable a) => FromNamedRecord (HashMap a b)

lookup :: FromField a => NamedRecord -> ByteString -> Parser a

Retrieve a field in the given record by name. The result is empty if the field is missing or if the value cannot be converted to the desired type.

(.:) :: FromField a => NamedRecord -> ByteString -> Parser a

Alias for lookup.

class ToNamedRecord a where

A type that can be converted to a single CSV record.

An example type and instance:

data Person = Person { name :: !Text, age :: !Int }

instance ToNamedRecord Person where
    toNamedRecord (Person name age) = namedRecord [
        "name" .= name, "age" .= age]

Minimal complete definition

Nothing

Methods

toNamedRecord :: a -> NamedRecord

Convert a value to a named record.

Instances

(ToField a, ToField b, Ord a) => ToNamedRecord (Map a b)
(Eq a, ToField a, ToField b, Hashable a) => ToNamedRecord (HashMap a b)

namedRecord :: [(ByteString, ByteString)] -> NamedRecord

Construct a named record from a list of name-value ByteString pairs. Use .= to construct such a pair from a name and a value.

namedField :: ToField a => ByteString -> a -> (ByteString, ByteString)

Construct a pair from a name and a value. For use with namedRecord.

(.=) :: ToField a => ByteString -> a -> (ByteString, ByteString)

Alias for namedField.

header :: [ByteString] -> Header

Construct a header from a list of ByteStrings.

Field conversion

The FromField and ToField classes define how to convert between Fields and values you care about (e.g. Ints). Most of the time you don't need to write your own instances as the standard ones cover most use cases.

class FromField a where

A type that can be converted from a single CSV field, with the possibility of failure.

When writing an instance, use empty, mzero, or fail to make a conversion fail, e.g. if a Field can't be converted to the given type.

Example type and instance:

{-# LANGUAGE OverloadedStrings #-}

data Color = Red | Green | Blue

instance FromField Color where
    parseField s
        | s == "R"  = pure Red
        | s == "G"  = pure Green
        | s == "B"  = pure Blue
        | otherwise = mzero

Methods

parseField :: Field -> Parser a

Instances

FromField Char	Assumes UTF-8 encoding.
FromField Double	Accepts same syntax as `rational`. Ignores whitespace.
FromField Float	Accepts same syntax as `rational`. Ignores whitespace.
FromField Int	Accepts a signed decimal number. Ignores whitespace.
FromField Int8	Accepts a signed decimal number. Ignores whitespace.
FromField Int16	Accepts a signed decimal number. Ignores whitespace.
FromField Int32	Accepts a signed decimal number. Ignores whitespace.
FromField Int64	Accepts a signed decimal number. Ignores whitespace.
FromField Integer	Accepts a signed decimal number. Ignores whitespace.
FromField Word	Accepts an unsigned decimal number. Ignores whitespace.
FromField Word8	Accepts an unsigned decimal number. Ignores whitespace.
FromField Word16	Accepts an unsigned decimal number. Ignores whitespace.
FromField Word32	Accepts an unsigned decimal number. Ignores whitespace.
FromField Word64	Accepts an unsigned decimal number. Ignores whitespace.
FromField ()	Ignores the `Field`. Always succeeds.
FromField ByteString
FromField ByteString
FromField Text	Assumes UTF-8 encoding. Fails on invalid byte sequences.
FromField Text	Assumes UTF-8 encoding. Fails on invalid byte sequences.
FromField [Char]	Assumes UTF-8 encoding. Fails on invalid byte sequences.
FromField a => FromField (Maybe a)	`Nothing` if the `Field` is `empty`, `Just` otherwise.
FromField a => FromField (Either Field a)	`Left field` if conversion failed, `Right` otherwise.

class ToField a where

A type that can be converted to a single CSV field.

Example type and instance:

{-# LANGUAGE OverloadedStrings #-}

data Color = Red | Green | Blue

instance ToField Color where
    toField Red   = "R"
    toField Green = "G"
    toField Blue  = "B"

Methods

toField :: a -> Field

Instances

ToField Char	Uses UTF-8 encoding.
ToField Double	Uses decimal notation or scientific notation, depending on the number.
ToField Float	Uses decimal notation or scientific notation, depending on the number.
ToField Int	Uses decimal encoding with optional sign.
ToField Int8	Uses decimal encoding with optional sign.
ToField Int16	Uses decimal encoding with optional sign.
ToField Int32	Uses decimal encoding with optional sign.
ToField Int64	Uses decimal encoding with optional sign.
ToField Integer	Uses decimal encoding with optional sign.
ToField Word	Uses decimal encoding.
ToField Word8	Uses decimal encoding.
ToField Word16	Uses decimal encoding.
ToField Word32	Uses decimal encoding.
ToField Word64	Uses decimal encoding.
ToField ByteString
ToField ByteString
ToField Text	Uses UTF-8 encoding.
ToField Text	Uses UTF-8 encoding.
ToField [Char]	Uses UTF-8 encoding.
ToField a => ToField (Maybe a)	`Nothing` is encoded as an `empty` field.