I'm having trouble loading a CSV dataset in R. The dataset is available at https://data.baltimorecity.gov/City-Government/Baltimore-City-Employee-Salaries-FY2015/nsfe-bg53
I imported the data with read.csv as shown below, and the dataset was imported correctly.
EmpSal <- read.csv('E:/Data/EmpSalaries.csv')
I then tried reading the data with read.table, and when I viewed the dataset there were many anomalies.
EmpSal1 <- read.table('E:/Data/EmpSalaries.csv',sep=',',header = T,fill = T)
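The code above starts reading the data from row 7. The dataset actually contains ~14K rows, but only 5K rows were imported. When viewing the dataset, in a few places 15-20 rows had been merged into a single row, with the entire row's data appearing in one column.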
I can work with the dataset using read.csv, but I'd like to know why it doesn't work with read.table.
Answer 0 (score: 2)
read.csv is defined as:
function (file, header = TRUE, sep = ",", quote = "\"", dec = ".",
fill = TRUE, comment.char = "", ...)
read.table(file = file, header = header, sep = sep, quote = quote,
dec = dec, fill = fill, comment.char = comment.char, ...)
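You need to add quote="\"". By default, read.table treats both double and single quotes as quote characters (quote = "\"'"), whereas read.csv uses only double quotes (quote = "\"").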
EmpSal <- read.csv('Baltimore_City_Employee_Salaries_FY2015.csv')
EmpSal1 <- read.table('Baltimore_City_Employee_Salaries_FY2015.csv', sep=',', header = TRUE, fill = TRUE, quote="\"")
identical(EmpSal, EmpSal1)
# TRUE
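The effect can be reproduced without the full salary file. The sketch below uses a small made-up sample (the column names and values are hypothetical, not taken from the Baltimore dataset) in which two unquoted fields contain apostrophes. With read.table's default quote characters, the first apostrophe opens a quoted field that only ends at the next apostrophe, so the lines in between are swallowed into a single cell; with quote = "\"" every record comes through.

```r
# Hypothetical sample: apostrophes appear in unquoted fields, similar to
# agency names such as "Mayor's Office" (all values are made up).
csv_text <- c(
  "Name,JobTitle,Agency",
  "Smith,Clerk,Mayor's Office",
  "Jones,Officer,Police Department",
  "Brown,Clerk,Sheriff's Office",
  "Davis,Analyst,Finance"
)

# Default quote = "\"'": the apostrophe in "Mayor's" opens a quoted field
# that runs until the apostrophe in "Sheriff's", merging several lines.
bad <- read.table(text = csv_text, sep = ",", header = TRUE, fill = TRUE)

# Only the double quote is a quote character, as in read.csv.
good <- read.table(text = csv_text, sep = ",", header = TRUE, fill = TRUE,
                   quote = "\"")

nrow(bad)   # fewer rows than the sample has records
nrow(good)  # 4
identical(good, read.csv(text = csv_text))  # TRUE
```

Printing bad[1, "Agency"] should show the text of the swallowed lines inside a single cell, which matches the "15-20 rows merged into one column" symptom described in the question.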
Answer 1 (score: 2)
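As mentioned in the question, the data was imported successfully with read.csv() even though the quote argument was not specified. The default value of the quote argument is "\"" for read.csv but "\"'" for read.table.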
Compare the two signatures:
read.table(file, header = FALSE, sep = "", quote = "\"'",
dec = ".", numerals = c("allow.loss", "warn.loss", "no.loss"),
row.names, col.names, as.is = !stringsAsFactors,
na.strings = "NA", colClasses = NA, nrows = -1,
skip = 0, check.names = TRUE, fill = !blank.lines.skip,
strip.white = FALSE, blank.lines.skip = TRUE,
comment.char = "#",
allowEscapes = FALSE, flush = FALSE,
stringsAsFactors = default.stringsAsFactors(),
fileEncoding = "", encoding = "unknown", text, skipNul = FALSE)
read.csv(file, header = TRUE, sep = ",", quote = "\"",
dec = ".", fill = TRUE, comment.char = "", ...)
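Rather than reading the defaults off the help pages, they can also be inspected directly in an R session with formals(). This minimal sketch prints the defaults of quote and comment.char, two of the arguments whose defaults differ between the two functions.

```r
# Default quote characters: read.table accepts both " and ',
# read.csv accepts only ".
formals(utils::read.table)$quote         # "\"'"
formals(utils::read.csv)$quote           # "\""

# Default comment character: read.table ignores everything after "#"
# on a line, read.csv disables comment handling entirely.
formals(utils::read.table)$comment.char  # "#"
formals(utils::read.csv)$comment.char    # ""
```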
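The data contains many single quotes (apostrophes). With read.table's default quote = "\"'", each apostrophe is treated as the start of a quoted field that only ends at the next apostrophe, which may be many lines later, so the lines in between are merged into a single field. That is why read.table did not work for you.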
Try the following code; it should work for you.
r <- read.table('/home/workspace/Downloads/Baltimore_City_Employee_Salaries_FY2015.csv', sep = ",", quote = "\"", header = TRUE, fill = TRUE)
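Before settling on a quote argument, it can help to check whether a file actually contains apostrophes. The helper below is a hypothetical sketch (find_apostrophes is not part of base R): it takes the lines of a file, for example the result of readLines() on the downloaded CSV, and returns the line number and text of every line containing a single quote. The sample lines are made up.

```r
# Hypothetical helper: list the lines that contain a single quote.
# In practice `lines` would come from readLines() on the CSV file.
find_apostrophes <- function(lines) {
  hits <- grep("'", lines, fixed = TRUE)  # literal match, not a regex
  data.frame(line = hits, text = lines[hits], stringsAsFactors = FALSE)
}

sample_lines <- c(
  "Name,JobTitle,Agency",
  "Smith,Clerk,Mayor's Office",
  "Jones,Officer,Police Department"
)
find_apostrophes(sample_lines)  # one hit, on line 2
```

If this returns any rows, reading the file with quote = "\"" (or simply with read.csv) avoids the merged rows.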