How to subset a text table in R

Time: 2015-03-19 10:19:19

Tags: r subset import-from-csv

I am using 32-bit R 3.1.3 on Windows and have a comma-separated csv file with 8 columns and 1001 rows, including the header (the full dataset is 24,000+ rows).

My goal is to pull out every row whose "Site" name contains at least one of the words "HOSPITAL", "ROYAL" or "TRUST". Here is what I have tried so far:

> datac <- read.csv("data1c.csv", header = TRUE, colClasses = c("character", "character", "character", "character", "character", "character", "character", "character"))) 
Error: unexpected ')' in "datac <- read.csv("data1c.csv", header = TRUE, colClasses = c("character", "character", "character", "character", "character", "character", "character", "character")))"

> read.csv("data1c.csv", header = TRUE, col.names = "ODS","Site","NGrouping", "Address1", "Address2", "Address3", "Address4", "Postcode")
Error in match.arg(numerals) : 'arg' should be one of “allow.loss”, “warn.loss”, “no.loss”

> subset("data1c.csv", Site=="HOSPITAL")
Error in subset.default("data1c.csv", Site == "HOSPITAL") : object 'Site' not found

> x <- matrix(rnorm(8008, 1), ncol = 8)
> y <- c(1, seq(8))
> x <- cbind(x, y)

Warning message:
In cbind(x, y) :
   number of rows of result is not a multiple of vector length (arg 2)

I am very new to this, so any help would be greatly appreciated.

1 answer:

Answer 0 (score: 0)

For your first error, you have one extra ) at the end of the call (three instead of two).

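For the second, you forgot to wrap the column names in a vector with c(), so read.csv treated the names after the first one as extra arguments. Do this instead: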

read.csv("data1c.csv", header = TRUE, col.names = c("ODS", "Site", "NGrouping", "Address1", "Address2", "Address3", "Address4", "Postcode"))
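Both import attempts can be combined into one call. The following is only a minimal sketch: it writes a small temporary file as a stand-in for data1c.csv (the three data rows are invented for illustration) and relies on the fact that a single colClasses value is recycled across all columns, so "character" does not need to be repeated eight times.

```r
# Stand-in for data1c.csv; the three data rows are invented for illustration.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "c1,c2,c3,c4,c5,c6,c7,c8",
  "X001,NORTHTOWN HOSPITAL,G1,1 Example Road,,Northtown,,ZZ1 1ZZ",
  "X002,ROYAL EXAMPLE INFIRMARY,G2,2 Example Road,,Southtown,,ZZ2 2ZZ",
  "X003,EXAMPLE CLINIC,G1,3 Example Road,,Westtown,,ZZ3 3ZZ"
), tmp)

# Column names from the question, wrapped in a single character vector.
col_names <- c("ODS", "Site", "NGrouping", "Address1",
               "Address2", "Address3", "Address4", "Postcode")

# Assign the result so it can be subset later; one colClasses value
# is recycled for every column.
datac <- read.csv(tmp, header = TRUE,
                  col.names = col_names,
                  colClasses = "character")
str(datac)
```

With the real file, replace tmp with "data1c.csv". Assigning the result to datac matters, because the later subset call needs that object rather than the file name.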

For the third, the first argument to subset must be a data.frame, not a file name:

subset(datac, Site %in% c("HOSPITAL", "ROYAL", "TRUST"))
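Note that %in% only keeps rows whose Site value is exactly one of the listed strings. Since the question asks for names that contain one of the words, a pattern match with grepl may be closer to what is wanted. The sketch below uses a small made-up data frame (the ODS codes and site names are hypothetical) to show the difference:

```r
# Made-up rows for illustration; only the column names come from the question.
datac <- data.frame(
  ODS  = c("X001", "X002", "X003", "X004"),
  Site = c("NORTHTOWN HOSPITAL", "ROYAL EXAMPLE INFIRMARY",
           "HOSPITAL", "EXAMPLE CLINIC"),
  stringsAsFactors = FALSE
)

# Exact match: keeps only rows where Site equals one of the three words.
exact <- subset(datac, Site %in% c("HOSPITAL", "ROYAL", "TRUST"))

# Substring match: keeps rows where Site contains any of the three words.
partial <- subset(datac, grepl("HOSPITAL|ROYAL|TRUST", Site))

exact$ODS
partial$ODS
```

If the names are not consistently upper case, adding ignore.case = TRUE to grepl should make the match case-insensitive.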

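For the warning: x has 1001 rows and 8 columns, while y is a vector of length 9 (1 + length(seq(8))). cbind recycles y down the rows of the new column, and 1001 is not a multiple of 9, which is exactly what the warning is telling you. To get rid of it, give y a length that is either 1001 or an exact divisor of 1001. Dropping a single item from y is not enough, because 1001 is not a multiple of 8 either.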
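The recycling behind this warning can be checked with a short sketch. It assumes the same 1001 x 8 matrix as in the question; y_full and y_short are names made up for this example. cbind fills the new column by repeating the vector down the rows, so a vector recycles cleanly only when its length divides the number of rows.

```r
set.seed(1)                              # only to make the sketch reproducible
x <- matrix(rnorm(8008, 1), ncol = 8)    # 8008 / 8 = 1001 rows
dim(x)

# One value per row: lengths match, so nothing needs recycling.
y_full <- seq_len(nrow(x))
xy <- cbind(x, y_full)
dim(xy)

# Length 7 divides 1001 (7 * 143), so it is recycled 143 times.
y_short <- seq_len(7)
xy2 <- cbind(x, y_short)
head(xy2[, 9], 10)
```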