Using read.table()

Time: 2016-05-04 06:43:55

Tags: r

I am having trouble loading a CSV dataset in R. The dataset is available at:

https://data.baltimorecity.gov/City-Government/Baltimore-City-Employee-Salaries-FY2015/nsfe-bg53

I imported the data with read.csv as shown below, and the dataset was imported correctly.

EmpSal <- read.csv('E:/Data/EmpSalaries.csv')

When I tried reading the same file with read.table, I saw many anomalies when viewing the dataset.

EmpSal1 <- read.table('E:/Data/EmpSalaries.csv',sep=',',header = T,fill = T)

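The code above starts reading data from row 7, and although the dataset actually contains ~14K rows, only about 5K rows are imported. When viewing the dataset, I found a few cases where 15-20 rows had been merged into a single row, with that entire row's data appearing in one column.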

I can work with the dataset using read.csv, but I would like to understand why it does not work with read.table.

2 answers:

Answer 0 (score: 2)

read.csv is defined as:

function (file, header = TRUE, sep = ",", quote = "\"", dec = ".", 
    fill = TRUE, comment.char = "", ...) 
read.table(file = file, header = header, sep = sep, quote = quote, 
    dec = dec, fill = fill, comment.char = comment.char, ...)
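Since read.csv() only forwards its arguments to read.table(), writing read.csv's defaults out explicitly in a read.table() call should give the same result. A minimal sketch, assuming a small made-up CSV passed through read.table's text argument instead of the Baltimore file:

```r
# Hypothetical inline CSV (not the Baltimore file); note the apostrophe in O'Neil
csv_text <- "name,salary\nO'Neil,50000\nSmith,60000"

# 1. Plain read.csv() with its defaults
via_csv <- read.csv(text = csv_text)

# 2. read.table() with read.csv's defaults written out explicitly
via_table <- read.table(text = csv_text, header = TRUE, sep = ",", quote = "\"",
                        dec = ".", fill = TRUE, comment.char = "")

# Both calls should produce the same data frame
identical(via_csv, via_table)
```

Note that the wrapper also changes comment.char (read.table's default is "#"), not just quote.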

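You need to add quote="\"" (by default, read.table treats both double quotes and single quotes as quote characters, whereas read.csv treats only double quotes as quote characters):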

EmpSal <- read.csv('Baltimore_City_Employee_Salaries_FY2015.csv')
EmpSal1 <- read.table('Baltimore_City_Employee_Salaries_FY2015.csv', sep=',', header = TRUE, fill = TRUE, quote="\"")
identical(EmpSal, EmpSal1)
# TRUE

Answer 1 (score: 2)

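As mentioned in the question, the data was imported successfully with read.csv(), which was called without a quote argument. The default value of read.csv's quote argument is "\"", while read.table's default is "\"'". Compare the two usages below: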

read.table(file, header = FALSE, sep = "", quote = "\"'",
           dec = ".", numerals = c("allow.loss", "warn.loss", "no.loss"),
           row.names, col.names, as.is = !stringsAsFactors,
           na.strings = "NA", colClasses = NA, nrows = -1,
           skip = 0, check.names = TRUE, fill = !blank.lines.skip,
           strip.white = FALSE, blank.lines.skip = TRUE,
           comment.char = "#",
           allowEscapes = FALSE, flush = FALSE,
           stringsAsFactors = default.stringsAsFactors(),
           fileEncoding = "", encoding = "unknown", text, skipNul = FALSE)

read.csv(file, header = TRUE, sep = ",", quote = "\"",
         dec = ".", fill = TRUE, comment.char = "", ...)
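The two defaults can also be inspected directly from the function signatures with formals(). A small sketch (it only reads the installed utils functions, so the exact signatures shown above may differ slightly across R versions):

```r
# Default value of the quote argument in each function's signature
rt_quote <- formals(read.table)$quote
rc_quote <- formals(read.csv)$quote

# read.table: a double quote followed by a single quote
strsplit(rt_quote, "")[[1]]
# read.csv: a double quote only
strsplit(rc_quote, "")[[1]]
```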

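The data contains many single quotes (apostrophes). With read.table's default quote = "\"'", an apostrophe inside a character field opens a quoted string that runs, across separators and line breaks, until the next single quote, so many lines end up merged into a single field. That is why read.table does not work for you.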
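The effect can be reproduced on a small hypothetical dataset (the names, titles and salaries below are made up for illustration, not taken from the Baltimore file). With read.table's default quote, the text between the apostrophe in O'Neil and the one in D'Angelo is read as one quoted string, so several records collapse into a single name field:

```r
# Hypothetical toy data: two surnames contain apostrophes
csv_lines <- c(
  "name,title,salary",
  "Smith,Clerk,50000",
  "Jones,Police Officer,60000",
  "Brown,Clerk,52000",
  "Lee,Analyst,58000",
  "Kim,Clerk,51000",
  "O'Neil,Clerk,53000",
  "Park,Analyst,57000",
  "D'Angelo,Police Officer,61000"
)

# read.table's default quote = "\"'": the apostrophe in O'Neil opens a quoted
# string that only closes at the apostrophe in D'Angelo
bad <- read.table(text = csv_lines, sep = ",", header = TRUE, fill = TRUE)

# Treat only double quotes as quote characters, as read.csv does
good <- read.table(text = csv_lines, sep = ",", header = TRUE, fill = TRUE,
                   quote = "\"")

nrow(bad)   # fewer rows than the 8 data lines
nrow(good)  # one row per data line
bad$name    # the last element holds the merged text of several lines
```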

Try the following code; it should work for you.

r <- read.table('/home/workspace/Downloads/Baltimore_City_Employee_Salaries_FY2015.csv', sep = ",", quote = "\"", header = TRUE, fill = TRUE)