Subsetting in a for loop

Time: 2014-05-29 15:59:13

Tags: r rbind

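My dataset has 34,000 rows and 353 columns. One column is the location, which has 11,000 unique values. I want to subset the dataset in a for loop. I can do this by creating a new data frame for each subset, but I would like the subsets to end up in a single data frame. I have added a sample dataset below.
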
TESTL <- structure(list(X = structure(c(1L, 1L, 1L, 1L, 3L, 3L, 3L, 2L, 
3L), .Label = c("Car", "DOG", "House"), class = "factor"), Y = c(20L, 
20L, 20L, 20L, 410L, 410L, 410L, 410L, 60L), Z = structure(c(1L, 
3L, 8L, 1L, 7L, 5L, 2L, 4L, 6L), .Label = c("ARGENTINA", "BERLIN GERMANY", 
"BUENOS AIRES ARGENTINA", "DUBLIN IRELAND", "FROM AUSTRIA", "GERMANY", 
"IN TRANSIT FROM GERMANY", "RIVER PLATE ARGENTINA"), class = "factor"), 
K = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "A", class = "factor")),
.Names = c("X", "Y", "Z", "K"), class = "data.frame", row.names = c(NA, -9L))

I can create the separate data frames with the following code

l <- c("ARGENTINA", "IRELAND")
for (i in l) {
  # one data frame per keyword, e.g. newdataARGENTINA
  assign(paste("newdata", i, sep = ""),
         subset(TESTL[which(grepl(i, TESTL$Z) &
                            !grepl("IN TRANSIT", TESTL$Z) & !grepl("FROM", TESTL$Z)), ],
                select = c("X", "Y", "Z")))
}

But I want to create one new data frame that holds all the subsets. I tried the following code

d <- data.frame()
for (i in l) {
  d <- rbind(d, c(
    subset(TESTL[which(grepl(i, TESTL$Z) & !grepl("IN TRANSIT", TESTL$Z)
                       & !grepl("FROM", TESTL$Z)), ],
           select = c("X", "Y", "Z"))))
}

I get the following warnings

Warning messages:
1: In `[<-.factor`(`*tmp*`, ri, value = "DOG") :
invalid factor level, NA generated
2: In `[<-.factor`(`*tmp*`, ri, value = "DUBLIN IRELAND") :
invalid factor level, NA generated

I tried converting the factors to characters, without success. Any help is appreciated.

2 answers:

Answer 0: (score: 0)

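I think you are making your life rather difficult here by using assign and trying to store the subsets in separate data frames. Try something more like this, storing the subsets in a list (the sample data is called dat here):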

l <- c("ARGENTINA","IRELAND")
res <- setNames(vector("list",length(l)),l)

for (i in seq_along(l)){                 
    res[[i]] <- dat[grepl(l[i],dat$Z) & !grepl("IN TRANSIT",dat$Z) & !grepl("FROM",dat$Z),c("X","Y","Z")]
}

> res
$ARGENTINA
    X  Y                      Z
1 Car 20              ARGENTINA
2 Car 20 BUENOS AIRES ARGENTINA
3 Car 20  RIVER PLATE ARGENTINA
4 Car 20              ARGENTINA

$IRELAND
    X   Y              Z
8 DOG 410 DUBLIN IRELAND


> do.call("rbind",res)
              X   Y                      Z
ARGENTINA.1 Car  20              ARGENTINA
ARGENTINA.2 Car  20 BUENOS AIRES ARGENTINA
ARGENTINA.3 Car  20  RIVER PLATE ARGENTINA
ARGENTINA.4 Car  20              ARGENTINA
IRELAND     DOG 410         DUBLIN IRELAND
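
The same list-then-rbind idea can also be written without an explicit loop. The following is only a sketch: the helper name `subset_by_keyword` is made up for illustration, the sample data is rebuilt with `data.frame()` rather than the `dput` output, and the two "IN TRANSIT" / "FROM" filters are merged into one pattern.

```r
# Sample data from the question, rebuilt by hand; stringsAsFactors = TRUE
# mimics the factor columns of the original data frame.
dat <- data.frame(
  X = c("Car", "Car", "Car", "Car", "House", "House", "House", "DOG", "House"),
  Y = c(20L, 20L, 20L, 20L, 410L, 410L, 410L, 410L, 60L),
  Z = c("ARGENTINA", "BUENOS AIRES ARGENTINA", "RIVER PLATE ARGENTINA",
        "ARGENTINA", "IN TRANSIT FROM GERMANY", "FROM AUSTRIA",
        "BERLIN GERMANY", "DUBLIN IRELAND", "GERMANY"),
  stringsAsFactors = TRUE
)

l <- c("ARGENTINA", "IRELAND")

# Hypothetical helper (not from the answer): keep rows whose Z contains `key`
# but neither "IN TRANSIT" nor "FROM".
subset_by_keyword <- function(key, data) {
  keep <- grepl(key, data$Z, fixed = TRUE) & !grepl("IN TRANSIT|FROM", data$Z)
  data[keep, c("X", "Y", "Z")]
}

# Named list with one data frame per keyword, then stacked into one data frame.
res <- lapply(setNames(l, l), subset_by_keyword, data = dat)
combined <- do.call(rbind, res)
```

Because every piece is cut from the same data frame, the factor columns share their levels, so rbind should not need to invent new levels here.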

Answer 1: (score: 0)

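The warnings arise because the first iteration of the loop (ARGENTINA) sets up the factor variables X and Z, while the second iteration (IRELAND) brings in factor levels they do not have. So: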

First, you should change the class of the factor variables in TESTL to character:

for (i in names(TESTL)[grep("factor", sapply(TESTL, class))]) {
  TESTL[[i]] <- as.character(TESTL[[i]])
}

Then it will work with the following code:

d <- data.frame(stringsAsFactors = FALSE)
for (i in l) {
  d <- rbind(d,
             TESTL[grepl(i, TESTL$Z) & !grepl("FROM|IN TRANSIT", TESTL$Z), c("X", "Y", "Z")])
}
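
To see both the mechanism and the fix in one place, here is a minimal end-to-end sketch. It assumes the sample data is rebuilt by hand with `data.frame()` instead of the `dput` output, and it uses `vapply(TESTL, is.factor, logical(1))` as a variant of the class check above; the names `f`, `f_bad` and `is_fac` are only for illustration.

```r
# Mechanism behind the warning: assigning a value that is not an existing
# level of a factor yields NA ("invalid factor level, NA generated").
f <- factor(c("Car", "Car"))
f_bad <- suppressWarnings(replace(f, 2, "DOG"))  # second element becomes NA

# Sample data from the question, with factor columns as in the original.
TESTL <- data.frame(
  X = c("Car", "Car", "Car", "Car", "House", "House", "House", "DOG", "House"),
  Y = c(20L, 20L, 20L, 20L, 410L, 410L, 410L, 410L, 60L),
  Z = c("ARGENTINA", "BUENOS AIRES ARGENTINA", "RIVER PLATE ARGENTINA",
        "ARGENTINA", "IN TRANSIT FROM GERMANY", "FROM AUSTRIA",
        "BERLIN GERMANY", "DUBLIN IRELAND", "GERMANY"),
  K = "A",
  stringsAsFactors = TRUE
)

# Convert every factor column to character (variant of the loop above).
is_fac <- vapply(TESTL, is.factor, logical(1))
TESTL[is_fac] <- lapply(TESTL[is_fac], as.character)

# Accumulate the subsets in a single data frame.
l <- c("ARGENTINA", "IRELAND")
d <- data.frame()
for (i in l) {
  d <- rbind(d,
             TESTL[grepl(i, TESTL$Z) & !grepl("FROM|IN TRANSIT", TESTL$Z),
                   c("X", "Y", "Z")])
}
```

With character columns there are no levels to violate, so the rows for IRELAND can be appended to those for ARGENTINA without producing NA values.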