One nice thing that SAS offers is its libraries. A SAS library is somewhat like an Excel spreadsheet that went to the gym for a few years, and it does a darn good job of filling in as a database.
Now, if you are short on space or forced to save to network drives (slower reads), let's try to minimize the file size so that R can read the file faster.
The main functions are saveRDS() and readRDS().
This will be a very short post but definitely worthwhile.
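Before the larger test, here is a minimal round-trip sketch of the two functions. The small data frame `df` and the temporary file path are made up for illustration; the only assumption about R itself is that saveRDS() uses its default compression setting (gzip).

```r
# Tiny illustrative data frame (hypothetical values, not from the post's data)
df <- data.frame(id = 1:5,
                 code = c("A", "E", "D", "A", "E"),
                 stringsAsFactors = FALSE)

# Write a single R object to a temporary .rds file
# (compress = TRUE, the default, means gzip compression)
path <- tempfile(fileext = ".rds")
saveRDS(df, path)

# readRDS() returns the object, so you pick the name it is assigned to
df2 <- readRDS(path)

# Check that the object survived the round trip unchanged
print(identical(df, df2))
```

Unlike save() and load(), which restore objects under their original names, readRDS() hands the object back as a value, which makes it easy to load a file under any name you like.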
Okay, let's generate some sample data to test out the benefits of the .rds file format.
# Fix the seed so the sample data is reproducible
set.seed(3435)
# Number of rows to generate
X <- 1500000
CustomerAccount <- data.frame(
  Premise      = as.numeric(sample(X)),
  Account      = as.numeric(sample(X)),
  StreetNumber = as.numeric(sample(X)),
  StreetName   = sample(c('Main', 'Landing', 'Highland Park', 'Washington',
                          'Kentucky', 'Apple', 'Rose', 'Parkway', 'Windsor',
                          'Orchard', 'Haven', 'Olive'), X, replace = TRUE),
  StreetSuff   = sample(c('St', 'Ave', 'Ln'), X, replace = TRUE),
  ZipCode      = sample(c(50000:80000, 1), X, replace = TRUE),
  BillCode     = sample(c('A', 'E', 'D'), X, replace = TRUE),
  APP          = sample(c('1', '0'), X, replace = TRUE),
  Email        = sample(c('1', '0'), X, replace = TRUE),
  Zone         = sample(c('1', '0'), X, replace = TRUE),
  Debt         = sample(c('1', '0'), X, replace = TRUE),
  Prizm        = sample(c(1:67, 1), X, replace = TRUE),
  Attr1        = sample(c('1', '0'), X, replace = TRUE),
  Attr2        = sample(c('1', '0'), X, replace = TRUE),
  Attr3        = sample(c('1', '0'), X, replace = TRUE),
  Attr4        = sample(c('1', '0'), X, replace = TRUE),
  ServicePoint = sample(c(1, 2, 3), X, replace = TRUE),
  Day          = sample(c(1:30, 1), X, replace = TRUE),
  Month        = sample(c(1:12, 1), X, replace = TRUE),
  Year         = sample(c(1999:2014, 1), X, replace = TRUE),
  Days         = sample(c(27:35, 1), X, replace = TRUE),
  Usage        = sample(c(1:1299, 1), X, replace = TRUE)
)
Now let's write it as a traditional csv file (148,790 KB).
write.csv(CustomerAccount, 'TestingCompressionCSV.csv', row.names = FALSE)
Now let's check out the awesome compression of the rds format (37,097 KB).
saveRDS(CustomerAccount,'TestingCompressionRDS.rds')
Not too shabby.
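If you want to check the numbers on your own machine, something along these lines should work. This is only a sketch: it uses a much smaller stand-in data frame (`small`, with a made-up row count `n`) and temporary files so it runs quickly, and the sizes and timings you see will differ from the figures quoted above.

```r
# Smaller stand-in for CustomerAccount (hypothetical, for a quick check)
set.seed(1)
n <- 10000
small <- data.frame(
  Account  = as.numeric(sample(n)),
  BillCode = sample(c('A', 'E', 'D'), n, replace = TRUE),
  Usage    = sample(1:1299, n, replace = TRUE),
  stringsAsFactors = FALSE
)

# Write the same data in both formats to temporary files
csv_path <- tempfile(fileext = '.csv')
rds_path <- tempfile(fileext = '.rds')
write.csv(small, csv_path, row.names = FALSE)
saveRDS(small, rds_path)

# file.size() reports bytes; divide by 1024 to get KB
sizes_kb <- file.size(c(csv_path, rds_path)) / 1024
names(sizes_kb) <- c('csv', 'rds')
print(round(sizes_kb))

# Rough read timings in seconds; these vary by machine and disk
csv_time <- system.time(from_csv <- read.csv(csv_path))
rds_time <- system.time(from_rds <- readRDS(rds_path))
print(c(csv = csv_time[['elapsed']], rds = rds_time[['elapsed']]))
```

To run it against the real files, point `csv_path` and `rds_path` at 'TestingCompressionCSV.csv' and 'TestingCompressionRDS.rds' instead of the temporary files.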