for loop with which in R

Time: 2018-07-17 10:01:16

Tags: r

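I have a very large dataset; for convenience, I am creating a fake dataset that mimics it. It has 4 states, 5 years, 2 types per state, and a value column. I want to get the sum of the values for each state, year, and type.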

When I run a loop that combines for and which, I do not get the values I need. Does anyone know a solution?

StateName<-c("a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","b","b","b","b","b","b","b","b","b","b","b","b","b","b","b","b","b","b","b","b","c","c","c","c","c","c","c","c","c","c","c","c","c","c","c","c","c","c","c","c","d","d","d","d","d","d","d","d","d","d","d","d","d","d","d","d","d","d","d","d")
Year<- rep(1966:1970, times=16)
Type<-c("Y", "Y", "Y", "Y","Y","Y", "Y", "Y", "Y","Y", "Z", "Z", "Z","Z","Z","Z", "Z", "Z","Z","Z","Y", "Y", "Y", "Y","Y","Y", "Y", "Y", "Y","Y", "Z", "Z", "Z","Z","Z","Z", "Z", "Z","Z","Z","Y", "Y", "Y", "Y","Y","Y", "Y", "Y", "Y","Y", "Z", "Z", "Z","Z","Z","Z", "Z", "Z","Z","Z","Y", "Y", "Y", "Y","Y","Y", "Y", "Y", "Y","Y", "Z", "Z", "Z","Z","Z","Z", "Z", "Z","Z","Z")  
Value<-rep(1:4, times=20)
Test_Data<-cbind(StateName, Year, Type, Value)
Test_Data<-data.frame(Test_Data)

New_Table<-cbind(unique(StateName), 1966:1967, NA, NA)
New_Table<-data.frame(New_Table)
colnames(New_Table)<-c("State", "Year", "AA_Sum", "BB_Sum")


for(A in 1:nrow(Test_Data)){
  temp_index = which(as.character(Test_Data$StateName[A]) %in% as.character(New_Table$State) &
                     Test_Data$Year[A] %in% New_Table$Year &
                     Test_Data$Value[A] == "AA"  )
  New_Table$AA_Sum<- sum(Test_Data$Value[temp_index])
}

Currently I get the error "Error in Summary.factor(integer(0), na.rm = TRUE) : 'sum' not meaningful for factors".

I would like to know how to fill New_Table with the sum of Y for each state and year and, similarly, the sum of Z for each state and year.

1 answer:

Answer 0: (score: 1)

As Richard correctly pointed out, you can solve this with plyr/dplyr:

library(dplyr)
Test_Data %>% group_by(StateName, Year) %>% summarise(AA_Sum = sum(Value))
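The pipeline above groups by state and year only, while the question also asks for a split by type. As a minimal sketch of that three-way grouping, the block below uses base R's aggregate in place of dplyr so that no package is needed. The data frame is rebuilt here with rep() on the assumption that it has the same layout as the question's Test_Data; the name sums_long is made up for illustration.

```r
# Rebuild the example data with proper column types
# (assumption: same layout as the question's Test_Data).
Test_Data <- data.frame(
  StateName = rep(c("a", "b", "c", "d"), each = 20),
  Year      = rep(1966:1970, times = 16),
  Type      = rep(rep(c("Y", "Z"), each = 10), times = 4),
  Value     = rep(1:4, times = 20)
)

# Sum Value within every StateName/Year/Type combination.
# The result has one row per combination (long format).
sums_long <- aggregate(Value ~ StateName + Year + Type,
                       data = Test_Data, FUN = sum)

head(sums_long)
```

Each row of sums_long holds one state, one year, one type, and the corresponding total, so the Y and Z sums for a given state and year sit in two separate rows.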

The error you get occurs because Test_Data$Value is a factor. Why? Look at how you build the data.frame:

Test_Data<-cbind(StateName, Year, Type, Value)

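This binds the four vectors into a matrix. Every element of a matrix has the same data type. Because some of the vectors you bind are character, all values are coerced to character and the result is a character matrix. Observe: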

> str(cbind(StateName, Year, Type, Value))
 chr [1:4, 1:4] "a" "b" "c" "d" "1966" "1967" "1966" "1967" NA NA NA NA NA NA NA NA
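The same coercion can be reproduced on a tiny example. This is only a sketch; the column names letters_col and number_col are made up for illustration.

```r
# Binding a character vector and an integer vector yields a character matrix.
m <- cbind(letters_col = c("a", "b"), number_col = 1:2)

typeof(m)            # "character"
m[, "number_col"]    # the numbers are now the strings "1" and "2"
```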

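When you convert that matrix to a data.frame, the default behaviour (in R versions before 4.0.0) is to turn character vectors into factors. That is annoying. You can avoid it with the argument stringsAsFactors=FALSE. (Also, check out the function str; it is really helpful for inspecting objects.)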
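If a numeric column has already become a factor, calling as.numeric on it directly returns the internal level codes rather than the original numbers. A small sketch of the usual workaround, converting through character first (the vector f is a made-up example):

```r
f <- factor(c("10", "20", "10"))

codes  <- as.numeric(f)                 # level codes: 1 2 1
values <- as.numeric(as.character(f))   # original numbers: 10 20 10

sum(values)                             # 40
```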

You can build the data.frame you expect in a single line:

Test_Data <- data.frame(StateName=StateName, Year=rep(1966:1970, times=16), Type=Type, Value=rep(1:4, times=20))    
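To confirm that a data frame built directly with data.frame() keeps its numeric columns numeric, you can inspect the class of each column. The sketch below uses a small stand-in data frame (called demo, a made-up name) rather than the full example.

```r
# Small stand-in for the example data (assumption: two states, two years).
demo <- data.frame(
  StateName = rep(c("a", "b"), each = 2),
  Year      = rep(1966:1967, times = 2),
  Value     = 1:4,
  stringsAsFactors = FALSE
)

# One class per column; Year and Value stay integer.
col_classes <- sapply(demo, class)
col_classes

# sum() now works because Value is not a factor.
total <- sum(demo$Value)
```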

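Finally, your for loop does not do what you expect. a) temp_index will at most be the integer 1, but most of the time it is just a zero-length vector, which explains the integer(0) part of the error. b) You iterate over all rows of Test_Data, but you are trying to summarise into the rows found in New_Table. The last line of the loop, New_Table$AA_Sum<- ..., simply overwrites the entire column with the current sum.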
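Both points can be seen in isolation. The following sketch uses made-up objects (values, tbl) and is not the question's data: it shows what which returns for a length-one condition, and how assigning to a whole column differs from assigning to one element.

```r
# Inside the loop a single row is tested, so the condition has length one.
one_hit <- which(TRUE)     # 1
no_hit  <- which(FALSE)    # integer(0)

# Indexing with an empty result selects nothing, so the sum is 0.
values <- c(5, 6, 7)
empty_sum <- sum(values[no_hit])

# Assigning one number to a whole column recycles it into every row.
tbl <- data.frame(State = c("a", "b"), AA_Sum = NA)
tbl$AA_Sum <- 3       # both rows become 3
tbl$AA_Sum[2] <- 9    # only row 2 changes
```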

What you probably want to do (if you disregard the other answers) is:

for (i in 1:nrow(New_Table)) {
  tempindex <- which(Test_Data$StateName == New_Table$State[i] & ...)
  New_Table$AA_Sum[i] <- sum(Test_Data$Value[tempindex])
}

I have left part of the code out as an exercise. Check the value of tempindex for each i, and extend the expression as needed.
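For readers who want to compare their attempt, here is one possible way the loop could be completed. It is a sketch under stated assumptions, not the answer author's code: the result table is built with expand.grid (one row per state and year) instead of the question's New_Table, the columns are named Y_Sum and Z_Sum, and a logical mask is used in place of which.

```r
# Example data with proper column types (assumption: same layout as the question).
Test_Data <- data.frame(
  StateName = rep(c("a", "b", "c", "d"), each = 20),
  Year      = rep(1966:1970, times = 16),
  Type      = rep(rep(c("Y", "Z"), each = 10), times = 4),
  Value     = rep(1:4, times = 20),
  stringsAsFactors = FALSE
)

# Hypothetical result table: one row per State/Year combination.
New_Table <- expand.grid(State = unique(Test_Data$StateName),
                         Year  = unique(Test_Data$Year),
                         stringsAsFactors = FALSE)
New_Table$Y_Sum <- NA
New_Table$Z_Sum <- NA

# Loop over the rows of the result table, not over the raw data.
for (i in seq_len(nrow(New_Table))) {
  # TRUE for the raw rows that belong to this state and year.
  same_group <- Test_Data$StateName == New_Table$State[i] &
                Test_Data$Year == New_Table$Year[i]
  # Assign to element i only, so earlier rows are not overwritten.
  New_Table$Y_Sum[i] <- sum(Test_Data$Value[same_group & Test_Data$Type == "Y"])
  New_Table$Z_Sum[i] <- sum(Test_Data$Value[same_group & Test_Data$Type == "Z"])
}
```

On a very large dataset a grouped summary is likely to be much faster than this loop, which rescans the whole data frame once per result row.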