csvread    R Documentation

Package csvread.

Description

Fast specialized CSV file loader, as well as an implementation of a basic 64-bit integer class.

Details

csvread provides functionality for loading large (10M+ lines) CSV and other delimited files, similar to read.csv, but typically faster and using less memory than the standard R loader. While not entirely general, it covers many common use cases when the types of columns in the CSV file are known in advance. In addition, the package provides a class int64, which represents 64-bit integers exactly when reading from a file. The latter is useful when working with 64-bit integer identifiers exported from databases. The CSV file loader supports common column types including integer, double, string, and int64, leaving further type transformations to the user.
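To illustrate why an exact 64-bit integer class helps with database identifiers, the following sketch shows how such values behave in R without one. It uses only base R and does not call csvread; the identifier value is made up for illustration.

```r
# R's native integer type is 32-bit, so a 64-bit identifier does not fit.
big.id <- "9007199254740993"            # hypothetical database id, equal to 2^53 + 1
.Machine$integer.max                    # 2147483647
id.int <- suppressWarnings(as.integer(big.id))
is.na(id.int)                           # TRUE: the value is outside the 32-bit range

# Doubles hold integers exactly only up to 2^53; beyond that,
# neighbouring integers collapse to the same value.
collide <- (2^53 + 1) == 2^53
collide                                 # TRUE: two distinct ids compare equal
```

Loading such a column as a string avoids the precision loss but gives up numeric storage and comparison, which is the gap the int64 class is described as filling.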

Benchmark

The code was tested on a Linux server using a CSV file with 10 million rows and 94 columns, mostly numeric. The raw file was 3.5GB in size and was read from local storage. The R version was 3.0.1.

The read.csv call took 672 seconds (elapsed), with peak memory usage of 16GB and final memory usage of 14.2GB, which fell to 5.8GB after a call to gc().

> system.time(df.r <- read.csv("benchmark10M.csv", stringsAsFactors = FALSE, header = FALSE, sep = ","))
   user  system elapsed
649.573  21.231 672.058

> dim(df.r)
[1] 10000000       94

The csvread function took 62 seconds (elapsed), with peak and final memory usage both at 4.7GB.
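A csvread timing can be collected in the same way as the read.csv timing. The sketch below is not the original benchmark: it generates a small temporary file with three made-up columns, and the column types passed in coltypes are assumptions chosen to match that file. It assumes the csvread(file, coltypes, header) interface described in the package manual and skips the call if the package is not installed.

```r
# Build a small stand-in file (hypothetical data, not benchmark10M.csv).
n <- 1000
path <- tempfile(fileext = ".csv")
write.table(data.frame(id = seq_len(n),
                       x = seq_len(n) / 8,
                       label = sprintf("row%d", seq_len(n))),
            path, sep = ",", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# Baseline: the standard loader, called as in the benchmark.
t.r <- system.time(df.r <- read.csv(path, stringsAsFactors = FALSE,
                                    header = FALSE, sep = ","))

# csvread needs the column types up front; these match the file written above.
if (requireNamespace("csvread", quietly = TRUE)) {
  t.c <- system.time(df.c <- csvread::csvread(path,
                                              coltypes = c("integer", "double", "string"),
                                              header = FALSE))
  # Elapsed seconds for both loaders, side by side.
  print(c(read.csv = t.r[["elapsed"]], csvread = t.c[["elapsed"]]))
}
dim(df.r)
```

On a file this small the two timings will be dominated by overhead; a meaningful comparison needs a file closer in size to the one described in the benchmark.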

Copyright

Copyright (C) Collective, Inc. with portions Copyright (C) Jabiru Ventures LLC

License

Apache License, Version 2.0, available at http://www.apache.org/licenses/LICENSE-2.0

Documentation

See the csvread package page on CRAN.

URL

http://github.com/jabiru/csvread

Installation

Install the released version from CRAN:

install.packages("csvread")

Install the most recent version from GitHub (requires the devtools package):

devtools::install_github("jabiru/csvread")

Author(s)

Sergei Izrailev. The contact email address is available at http://scr.im/izrg