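My dataset has 34,000 rows and 353 columns. One column is a location, and it has 11,000 unique values. I want to subset the dataset in a for loop. I can do this by creating a new data frame for each subset, but I would like the subsets to end up in a single data frame. I have
added a sample dataset below:
TESTL <- structure(list(X = structure(c(1L, 1L, 1L, 1L, 3L, 3L, 3L, 2L,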
3L), .Label = c("Car", "DOG", "House"), class = "factor"), Y = c(20L,
20L, 20L, 20L, 410L, 410L, 410L, 410L, 60L), Z = structure(c(1L,
3L, 8L, 1L, 7L, 5L, 2L, 4L, 6L), .Label = c("ARGENTINA", "BERLIN GERMANY",
"BUENOS AIRES ARGENTINA", "DUBLIN IRELAND", "FROM AUSTRIA", "GERMANY",
"IN TRANSIT FROM GERMANY", "RIVER PLATE ARGENTINA"), class = "factor"),
K = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "A", class = "factor")),
.Names = c("X", "Y", "Z", "K"), class = "data.frame", row.names = c(NA, -9L))
I can create the new data frames with the following code:
l=c("ARGENTINA","IRELAND")
for(i in l){
assign(paste("newdata",i,sep=""),
subset(TESTL[which(grepl(i,TESTL$Z)&
!grepl("IN TRANSIT",TESTL$Z)&!grepl("FROM",TESTL$Z)),],
select=c("X","Y","Z")))}
But I want to create one new data frame that holds all the subsets. I tried the following code:
d<-data.frame()
for(i in l){d<-rbind(d,c(
subset(TESTL[which(grepl(i,TESTL$Z) & !grepl("IN TRANSIT",TESTL$Z)
& !grepl("FROM",TESTL$Z)),],
select=c("X","Y","Z")))}
I get the following warnings:
Warning messages:
1: In `[<-.factor`(`*tmp*`, ri, value = "DOG") :
invalid factor level, NA generated
2: In `[<-.factor`(`*tmp*`, ri, value = "DUBLIN IRELAND") :
invalid factor level, NA generated
I tried converting the factors to character, but without success. Any help is appreciated.
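Answer 0 (score: 0)
I think you are making your life rather difficult by using assign and trying to store the subsets in separate data frames. Try something more like this (here dat is the sample data frame from the question):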
l <- c("ARGENTINA","IRELAND")
res <- setNames(vector("list",length(l)),l)
for (i in seq_along(l)){
res[[i]] <- dat[grepl(l[i],dat$Z) & !grepl("IN TRANSIT",dat$Z) & !grepl("FROM",dat$Z),c("X","Y","Z")]
}
> res
$ARGENTINA
    X  Y                      Z
1 Car 20              ARGENTINA
2 Car 20 BUENOS AIRES ARGENTINA
3 Car 20  RIVER PLATE ARGENTINA
4 Car 20              ARGENTINA

$IRELAND
    X   Y              Z
8 DOG 410 DUBLIN IRELAND

> do.call("rbind",res)
              X   Y                      Z
ARGENTINA.1 Car  20              ARGENTINA
ARGENTINA.2 Car  20 BUENOS AIRES ARGENTINA
ARGENTINA.3 Car  20  RIVER PLATE ARGENTINA
ARGENTINA.4 Car  20              ARGENTINA
IRELAND     DOG 410         DUBLIN IRELAND
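The same list-then-rbind idea can also be written without an explicit loop. What follows is only a minimal sketch: it assumes the sample data is stored in `dat` with character columns, and the helper name `subset_location` is hypothetical, not something from the answer above.

```r
# Sample data from the question, rebuilt with plain character columns
dat <- data.frame(
  X = c("Car", "Car", "Car", "Car", "House", "House", "House", "DOG", "House"),
  Y = c(20L, 20L, 20L, 20L, 410L, 410L, 410L, 410L, 60L),
  Z = c("ARGENTINA", "BUENOS AIRES ARGENTINA", "RIVER PLATE ARGENTINA",
        "ARGENTINA", "IN TRANSIT FROM GERMANY", "FROM AUSTRIA",
        "BERLIN GERMANY", "DUBLIN IRELAND", "GERMANY"),
  K = "A",
  stringsAsFactors = FALSE
)

l <- c("ARGENTINA", "IRELAND")

# Hypothetical helper: rows whose Z mentions `loc` but neither
# "FROM" nor "IN TRANSIT"
subset_location <- function(loc, data) {
  keep <- grepl(loc, data$Z, fixed = TRUE) &
    !grepl("FROM|IN TRANSIT", data$Z)
  data[keep, c("X", "Y", "Z")]
}

# One data frame per location, collected in a named list
res <- lapply(setNames(l, l), subset_location, data = dat)

# Stack the pieces into a single data frame
combined <- do.call(rbind, res)
```

Keeping the pieces in a named list first means each subset can still be inspected on its own (for example `res$IRELAND`) before everything is combined.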
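Answer 1 (score: 0)
The warnings arise because the first iteration of the loop (ARGENTINA) introduces X and Z as factor variables, and the second iteration (IRELAND) then brings in values that are not among the existing factor levels. So:
First, you should change the class of the variables in TESTL: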
for (i in names(TESTL) [grep ("factor", sapply (TESTL, class))]) {
TESTL[[i]] <- as.character (TESTL[[i]])
}
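The same conversion can be sketched without a loop. This assumes a data frame named `TESTL` that contains factor columns; the small data frame built here is only a stand-in for illustration.

```r
# Stand-in for TESTL: two factor columns and one integer column
TESTL <- data.frame(
  X = factor(c("Car", "DOG", "House")),
  Y = c(20L, 410L, 60L),
  Z = factor(c("ARGENTINA", "DUBLIN IRELAND", "GERMANY"))
)

# Flag the factor columns, then convert only those to character
is_fac <- vapply(TESTL, is.factor, logical(1))
TESTL[is_fac] <- lapply(TESTL[is_fac], as.character)

# Inspect the column classes after the conversion
sapply(TESTL, class)
```

Testing with `is.factor` instead of matching the class name also covers ordered factors, whose class vector has two entries.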
Then it will work with the following code:
d <- data.frame(stringsAsFactors=FALSE)
for(i in l){d <- rbind(d,
TESTL[grepl(i,TESTL$Z) & !grepl("FROM|IN TRANSIT", TESTL$Z), c("X", "Y", "Z")])}
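To see where the warning in the question comes from, here is a minimal sketch using made-up vectors rather than the full dataset. It assigns a value that is not among a factor's levels, then repeats the same step with a character vector.

```r
# A factor only accepts values that are among its levels
f <- factor(c("Car", "Car"))

# Assigning an unknown level stores NA and raises the warning
# "invalid factor level, NA generated"
f[3] <- "DOG"
is.na(f[3])   # TRUE

# A character vector has no fixed set of levels, so the value is kept
s <- c("Car", "Car")
s[3] <- "DOG"
s[3]          # "DOG"
```

This is why converting the factor columns to character before calling rbind in a loop avoids the NA values.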