R: writing to a data frame from inside a for loop

Date: 2017-03-30 22:58:44

Tags: r

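I'm brand new to R programming, so please forgive me if I use the wrong terminology. I am trying to insert/append values into a data frame from inside a for loop. If I just print() them I get the correct values, but when I try to put them into a data frame I get mostly NAs. If I run this code, it prints the values I want.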

output <- data.frame()
for (i in seq_along(Reasons)){
  assign(paste(Reasons[i]), sum(ER$Reason == paste(Reasons[i])))
  Tot <- get(paste(Reasons[i]))
  assign(paste(Reasons[i],'ER',sep="_"), sum(grepl("ER|Er", ER$Disposition) & ER$Reason == paste(Reasons[i])))
  Er <- get(paste(Reasons[i],'ER',sep="_"))
  assign(paste(Reasons[i],'adm',sep="_"), sum(grepl("Admi|admi|ADMI|ADmi", ER$Disposition) & ER$Reason == paste(Reasons[i])))
  Adm <- get(paste(Reasons[i],'adm',sep="_"))
  assign(paste(Reasons[i],'admrate',sep="_"), sprintf("%.0f%%", (sum(grepl("Admi|admi|ADMI|ADmi", ER$Disposition) & ER$Reason == paste(Reasons[i])))/(sum(ER$Reason == paste(Reasons[i])))*100))
  Rate <- get(paste(Reasons[i],'admrate',sep="_"))
  print(c(Er,Adm,Tot,Rate))
   #clear variables just created
  rm(list=ls(pattern=Reasons[i]))
  rm(Tot,Er,Adm,Rate)
}
[1] "7"   "13"  "20"  "65%"
[1] "4"   "8"   "12"  "67%"
[1] "12"  "12"  "24"  "50%"
[1] "23"  "7"   "30"  "23%"
[1] "7"   "1"   "8"   "12%"
[1] "3"   "1"   "4"   "25%"
[1] "3"  "0"  "3"  "0%"
[1] "6"   "5"   "11"  "45%"
[1] "2"   "9"   "11"  "82%"
[1] "2"   "4"   "6"   "67%"
[1] "10"  "4"   "14"  "29%"
[1] "5"  "0"  "5"  "0%"
[1] "10"  "4"   "14"  "29%"
[1] "0"    "3"    "3"    "100%"
[1] "7"   "3"   "10"  "30%"
[1] "0"    "4"    "4"    "100%"

But when I use

output <- rbind(output, c(Er, Adm, Tot, Rate))

instead of

print(c(Er,Adm,Tot,Rate))

I get the values for the first row (7, 13, 20, 65%), and then all NAs except for a "7" in rows 5 and 15... What am I doing wrong? Thanks in advance.

1 answer:

Answer 0 (score: 1)

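Since I don't know what your data looks like, I can't reproduce your error. If I understand correctly, for each value in Reasons you want to find (a) the total number of observations, (b) the number of observations with the string "Er" in the variable Disposition, (c) the number of observations with the string "Admi" in the variable Disposition, and (d) the percentage of observations with the string "Admi" in the variable Disposition. If that is the case, you don't need assign and get to do this.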

Here is a simpler way to do it (although it is not the best way, see below):

## Here I just generated some data that might look like the data 
## you are dealing with:
Reasons <- LETTERS[1:10]
ER <- data.frame(Reason = LETTERS[sample.int(10,100, replace = TRUE)],
    Disposition = c("ER", "Admi", "SomethingElse")[sample.int(3,100, replace = TRUE)])

output <- data.frame()
for (i in seq(along = Reasons)){
    Tot <- sum(ER$Reason ==Reasons[i])
    Er <- sum(grepl("ER|Er", ER$Disposition) & (ER$Reason ==Reasons[i]))
    Adm <- sum(grepl("Admi|admi|ADMI|ADmi", ER$Disposition) & (ER$Reason ==Reasons[i]))
    Rate <- paste(round(Adm/Tot*100), "%")
    output <- rbind(output, c(Er, Adm, Tot, Rate))
}

> output
   X.4. X.3. X.10. X.30...
1     4    3    10    30 %
2     2    3     6    50 %
3     2    1     6    17 %
4     5    2    14    14 %
5     3    5    11    45 %
6     2    4    11    36 %
7     3    6    14    43 %
8     2    2     5    40 %
9     1    7    11    64 %
10    4    4    12    33 %
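
A note on the output above: the column names (X.4., X.3., ...) are derived by rbind() from the first vector appended, and c(Er, Adm, Tot, Rate) turns every value into a string because Rate is a string. On R versions before 4.0.0, where strings in data frames became factors by default, one plausible cause of the NAs in the question is that later rows contain values that are not existing factor levels of the first row; this cannot be confirmed without the original data. The sketch below avoids both issues by building one named, typed one-row data frame per reason and binding them once at the end. It assumes simulated data like the above; the set.seed() call and the helper name is_r are additions for illustration.

```r
## Simulated data as in the answer (the seed is an added assumption
## so that the example is reproducible)
set.seed(1)
Reasons <- LETTERS[1:10]
ER <- data.frame(
    Reason = sample(Reasons, 100, replace = TRUE),
    Disposition = sample(c("ER", "Admi", "SomethingElse"), 100, replace = TRUE),
    stringsAsFactors = FALSE
)

## One named one-row data frame per reason; counts stay numeric
rows <- lapply(Reasons, function(r) {
    is_r <- ER$Reason == r
    Tot <- sum(is_r)
    Adm <- sum(grepl("Admi|admi|ADMI|ADmi", ER$Disposition) & is_r)
    data.frame(
        Reason = r,
        Er = sum(grepl("ER|Er", ER$Disposition) & is_r),
        Adm = Adm,
        Tot = Tot,
        Rate = sprintf("%.0f%%", Adm / Tot * 100),
        stringsAsFactors = FALSE
    )
})

## Bind all rows in a single call
output <- do.call(rbind, rows)
```

Because every row carries its own column names and stringsAsFactors = FALSE, the result has the columns Reason, Er, Adm, Tot and Rate regardless of the R version's default.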

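Dynamically appending rows to a data frame or matrix is usually not a good idea, because the object is copied on every append, which costs a lot of memory. If you know the dimensions of your matrix beforehand (as you do here), you should initialize it with the correct dimensions and then fill in the entries inside the loop: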

## Initialize data:
output <- matrix(nrow = length(Reasons), ncol = 4)
for (i in seq(along = Reasons)){
    Tot <- sum(ER$Reason ==Reasons[i])
    Er <- sum(grepl("ER|Er", ER$Disposition) & (ER$Reason ==Reasons[i]))
    Adm <- sum(grepl("Admi|admi|ADMI|ADmi", ER$Disposition) & (ER$Reason ==Reasons[i]))
    Rate <- paste(round(Adm/Tot*100), "%")
    output[i,] <- c(Er, Adm, Tot, Rate)
}
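
Keep in mind that a matrix holds a single type, so assigning c(Er, Adm, Tot, Rate) stores the counts as strings. If numeric counts are needed later, one option is to preallocate a data frame with typed columns instead. This is a minimal sketch under the same simulated-data assumption; the seed and the helper name is_r are additions for illustration.

```r
## Simulated data as in the answer (seed added for reproducibility)
set.seed(1)
Reasons <- LETTERS[1:10]
ER <- data.frame(
    Reason = sample(Reasons, 100, replace = TRUE),
    Disposition = sample(c("ER", "Admi", "SomethingElse"), 100, replace = TRUE),
    stringsAsFactors = FALSE
)

## Preallocate one row per reason, with a fixed type per column
n <- length(Reasons)
output <- data.frame(
    Reason = Reasons,
    Er = integer(n),
    Adm = integer(n),
    Tot = integer(n),
    Rate = character(n),
    stringsAsFactors = FALSE
)

## Fill the preallocated rows in place
for (i in seq_along(Reasons)) {
    is_r <- ER$Reason == Reasons[i]
    output$Tot[i] <- sum(is_r)
    output$Er[i] <- sum(grepl("ER|Er", ER$Disposition) & is_r)
    output$Adm[i] <- sum(grepl("Admi|admi|ADMI|ADmi", ER$Disposition) & is_r)
    output$Rate[i] <- paste(round(output$Adm[i] / output$Tot[i] * 100), "%")
}
```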
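However, there are simpler ways to do this kind of evaluation. You could, for example, use the dplyr package, where you can group the data by a variable (in your case the different values of ER$Reason) and evaluate the values you need: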

## Load the package 'dplyr'
library(dplyr)
## Group the variable and evaluate:
output <- ER %>% group_by(Reason) %>%
    dplyr::summarise(Er = sum(grepl("ER|Er", Disposition)),
            Adm = sum(grepl("Admi|admi|ADMI|ADmi", Disposition)),
            Tot = n(),
            Rate = paste(round(Adm/Tot*100), "%"))

> output
# A tibble: 10 × 5
   Reason    Er   Adm   Tot  Rate
    <chr> <int> <int> <int> <chr>
1       A     4     3    10  30 %
2       B     2     3     6  50 %
3       C     2     1     6  17 %
4       D     5     2    14  14 %
5       E     3     5    11  45 %
6       F     2     4    11  36 %
7       G     3     6    14  43 %
8       H     2     2     5  40 %
9       I     1     7    11  64 %
10      J     4     4    12  33 %
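
If adding a package is not an option, the same grouped counts can be sketched in base R with tapply(). This is an illustrative alternative, not part of the original answer; it assumes the same simulated data (with an added seed) and formats Rate with sprintf() as in the question.

```r
## Simulated data as in the answer (seed added for reproducibility)
set.seed(1)
Reasons <- LETTERS[1:10]
ER <- data.frame(
    Reason = sample(Reasons, 100, replace = TRUE),
    Disposition = sample(c("ER", "Admi", "SomethingElse"), 100, replace = TRUE),
    stringsAsFactors = FALSE
)

## Count matches per reason; summing a logical vector counts the TRUEs
Er  <- tapply(grepl("ER|Er", ER$Disposition), ER$Reason, sum)
Adm <- tapply(grepl("Admi|admi|ADMI|ADmi", ER$Disposition), ER$Reason, sum)
Tot <- tapply(ER$Reason, ER$Reason, length)

## Assemble the summary; groups come back sorted by reason
output <- data.frame(
    Reason = names(Tot),
    Er = as.vector(Er),
    Adm = as.vector(Adm),
    Tot = as.vector(Tot),
    Rate = sprintf("%.0f%%", as.vector(Adm / Tot) * 100),
    stringsAsFactors = FALSE
)
```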