csvread R Documentation

Package csvread.


Fast specialized CSV file loader, as well as an implementation of a basic 64-bit integer class.


csvread provides functionality for loading large (10M+ lines) CSV and other delimited files, similar to read.csv, but typically faster and using less memory than the standard R loader. While not entirely general, it covers many common use cases when the types of columns in the CSV file are known in advance. In addition, the package provides a class int64, which represents 64-bit integers exactly when reading from a file. The latter is useful when working with 64-bit integer identifiers exported from databases. The CSV file loader supports common column types including integer, double, string, and int64, leaving further type transformations to the user.
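To illustrate why an exact 64-bit integer type matters for database identifiers, the following base R sketch shows what happens when such an identifier is stored as an ordinary double. It does not use the package itself; the identifier value and the variable names are hypothetical and chosen only for illustration.

```r
# Doubles have a 53-bit significand, so not every integer above 2^53
# can be represented.
big <- 2^53
same <- (big + 1) == big   # the added 1 is lost to rounding

# A hypothetical 64-bit identifier (2^53 + 1), kept as text so the
# exact digits stay visible.
id.text <- "9007199254740993"

# Converting to double and printing back does not reproduce the digits.
id.double <- as.numeric(id.text)
round.trip <- sprintf("%.0f", id.double)
exact <- identical(round.trip, id.text)

print(same)
print(exact)
```

This is the kind of silent loss that an exact 64-bit integer type, such as the package's int64 class, is meant to avoid when identifiers are read from a file.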


The code was tested on a Linux server using a CSV file with 10 million rows and 94 columns, mostly numeric. The raw file was 3.5GB and was read from local storage. The R version was 3.0.1.

The timing of the read.csv command was 672 seconds with peak memory usage of 16GB and final memory usage of 14.2GB, which fell to 5.8GB after a call to gc().

> system.time(df.r <- read.csv("benchmark10M.csv", stringsAsFactors = FALSE, header = FALSE, sep = ","))
   user  system elapsed
649.573  21.231 672.058 

> dim(df.r)
[1] 10000000       94

The timing of the csvread function was 62 seconds (elapsed), with both peak and final memory usage at 4.7GB.
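The comparison above can be tried in miniature with the sketch below. It is an illustration under stated assumptions, not the benchmark code: the temporary file and its contents are hypothetical, and the csvread call assumes that the function accepts file, coltypes and header arguments with "integer", "double" and "string" as type names. The csvread part runs only if the package is installed.

```r
# Hypothetical small file standing in for the benchmark file.
path <- tempfile(fileext = ".csv")
writeLines(c("1,2.5,abc", "2,3.5,def"), path)

# Standard R loader, with the column types declared up front.
df.r <- read.csv(path, header = FALSE, stringsAsFactors = FALSE,
                 colClasses = c("integer", "numeric", "character"))
print(dim(df.r))

# csvread equivalent, skipped when the package is not available.
# Assumption: csvread() takes file, coltypes and header arguments.
if (requireNamespace("csvread", quietly = TRUE)) {
  df.c <- csvread::csvread(file = path,
                           coltypes = c("integer", "double", "string"),
                           header = FALSE)
  print(dim(df.c))
}

unlink(path)
```

Wrapping either call in system.time(), as in the read.csv run above, gives timings that can be compared on the same file.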


Copyright (C) Collective, Inc. with portions Copyright (C) Jabiru Ventures LLC


Apache License, Version 2.0, available at http://www.apache.org/licenses/LICENSE-2.0


See the csvread package page on CRAN.




Install released version from CRAN

install.packages("csvread")


Install the most recent version from GitHub

library(devtools); devtools::install_github("jabiru/csvread")


Sergei Izrailev. To get in touch, use the email address stored at http://scr.im/izrg