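I am trying to edit the column headers and remove some variables (columns) from a large .csv file. The file is just under 4 GB, and R cannot open it because the computer runs out of memory.
I have code to clean the data: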
#label all of the columns;
install.packages("plyr")
library(plyr)
house_all_name <- rename(house_all, c(V1="ID",V2="Price Paid", V3="Date Sold", V4="Post Code",
V5= "Property Type", V6= "New Build?", V7= "Tenure", V8= "House Name/Number (PAON)",
V9= "SAON", V10= "Street", V11= "Locality", V12= "Town/City", V13= "District",
V14="County", V15="PPD Category Type", V16="Record Status"))
#remove the non-useful variables
house_clean <- house_all_name[,c(-1,-8:-16)]
str(house_clean)
I tried to read the file with the following code, but my computer became very slow and ran out of memory.
house_all <- read.table("pp-complete.csv", header=FALSE, sep= ',', fill = TRUE)
So, to get this far, I had to 'practice' on the first 5 rows:
house_all <- read.table("pp-complete.csv", header=FALSE, sep= ',', fill = TRUE, nrows = 5)
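Since the cleaning step drops columns 1 and 8 to 16 anyway, one option that may lower memory use is to skip those columns at read time. The following is only a sketch: it uses a small temporary stand-in file with made-up values rather than pp-complete.csv, and it assumes every row has exactly 16 fields (so `fill = TRUE` is left out). In `colClasses`, the string `"NULL"` tells `read.table` to skip a column.

```r
# Small stand-in file with 16 unnamed columns (hypothetical values, not real data).
tmp <- tempfile(fileext = ".csv")
sample_rows <- data.frame(matrix(as.character(1:32), nrow = 2, ncol = 16))
write.table(sample_rows, tmp, sep = ",", col.names = FALSE, row.names = FALSE)

# Keep columns 2 to 7 as text; "NULL" makes read.table skip a column entirely.
col_classes <- c("NULL", rep("character", 6), rep("NULL", 9))

house_small <- read.table(tmp, header = FALSE, sep = ",",
                          colClasses = col_classes)

# Label the six remaining columns with the same names used in the rename step.
names(house_small) <- c("Price Paid", "Date Sold", "Post Code",
                        "Property Type", "New Build?", "Tenure")
str(house_small)
```

Whether this is enough for the full file depends on how much memory the six kept columns still need.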
From my research I believe it is possible to read the file line by line, but I don't know how!
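A minimal sketch of the line-by-line idea, reading a fixed number of lines at a time from an open connection and appending the cleaned rows to a new file. The function name `process_in_chunks`, the `chunk_size` default, and the temporary demo files are all hypothetical. It also assumes that every row has 16 fields and that no quoted field contains an embedded line break, which has not been checked against the real file.

```r
# Read in_path in chunks, keep columns 2 to 7, and append them to out_path.
process_in_chunks <- function(in_path, out_path, chunk_size = 100000) {
  col_classes <- c("NULL", rep("character", 6), rep("NULL", 9))
  col_names <- c("Price Paid", "Date Sold", "Post Code",
                 "Property Type", "New Build?", "Tenure")
  con <- file(in_path, open = "r")
  on.exit(close(con))
  first <- TRUE
  repeat {
    # readLines continues from where the previous call stopped.
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0) break
    chunk <- read.table(text = lines, header = FALSE, sep = ",",
                        colClasses = col_classes)
    names(chunk) <- col_names
    # Write the header only once, then append the later chunks.
    write.table(chunk, out_path, sep = ",", row.names = FALSE,
                col.names = first, append = !first)
    first <- FALSE
  }
  invisible(out_path)
}

# Demo on a 5-row stand-in file (made-up values), using chunks of 2 lines.
tmp_in <- tempfile(fileext = ".csv")
tmp_out <- tempfile(fileext = ".csv")
demo <- data.frame(matrix(as.character(1:80), nrow = 5, ncol = 16))
write.table(demo, tmp_in, sep = ",", col.names = FALSE, row.names = FALSE)
process_in_chunks(tmp_in, tmp_out, chunk_size = 2)
result <- read.csv(tmp_out, check.names = FALSE)
str(result)
```

Only one chunk is held in memory at a time, so the peak memory use is governed by `chunk_size` rather than by the size of the whole file.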
Regards, Tommy
P.S. The data file can be found at http://prod.publicdata.landregistry.gov.uk.s3-website-eu-west-1.amazonaws.com/pp-complete.csv