2 Recommended resources

2.1 Resources for handling big data in R

Handling large data sets in R

Notes:

Medium sized datasets (< 2 GB):loaded in R ( within memory limit but processing is cumbersome (typically in the 1-2 GB range )

Large files that cannot be loaded in R due to R / OS limitations

Large files(2 - 10 GB):process locally using some work around solutions

Very Large files( > 10 GB):needs distributed large scale computing.

Five ways to handle Big Data in R

Notes:

Rule of thumb:

Data sets that contain up to one million records can easily processed with standard R.

Data sets with about one million to one billion records can also be processed in R, but need some additional effort.

Data sets that contain more than one billion records need to be analyzed by map reduce algorithms.

rbindlist() is the fastest method and rbind() is the slowest. bind_rows() is half as fast as rbindlist()

2.2 Resources for the data.table package

2.3 Resources for measuring R performance

2.4 Resources for PostGIS database

PostGIS is an optional extension that must be enabled in each database you want to use it in before you can use it.

2.5 Resources for Alteryx software

2.6 Datasets

Scottish Non-domestic Energy Performance Certificates

2.7 Extensions