Using a for loop to filter a data frame by group in R

Time: 2017-05-06 08:50:00

Tags: r for-loop dataframe filter

If I have the data frame:

d <- data.frame(
  name = c("n1", "n2", "n3", "n4", "n5", "n6", "n7", "n8", "n9", "n10"),
  color = c("blue", "blue", "red", "blue", "red", "blue", "blue", "red", "green", "green"),
  weight = c(53, 34, 63, 25, 45, 24, 66, 12, 45, 8),
  gender = c(1, 0, 0, 0, 1, 1, 1, 0, 1, 0))

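How can I use a for loop in R to filter the values of 'weight' relative to the mean of each 'color'? As output I want all the rows whose weight is above the mean + 10 of their own color, with each color treated separately.

I know that d[d$weight > mean(d$weight) + (10 + sd(d$weight)), ] gives me the rows that meet the criterion for the whole sample, but I am trying to find the values for each color separately, because each color has a different sd value.

I am trying to understand for loops in R.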

3 answers:

Answer 0: (score: 3)

Using a for loop for this is a terrible idea, but since you asked...

d <- data.frame(
  name = c("n1", "n2", "n3", "n4", "n5", "n6", "n7", "n8", "n9", "n10"),
  color = c("blue", "blue", "red", "blue", "red", "blue", "blue", "red", "green", "green"),
  weight = c(53, 34, 63, 25, 45, 24, 66, 12, 45, 8),
  gender = c(1, 0, 0, 0, 1 ,1 ,1 , 0, 1, 0)) 

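# Whole sample: rows whose weight exceeds 10 + sd(weight)
d[d$weight > (10 + sd(d$weight)), ]

# Same test per color: subset one color at a time and use that group's sd.
# Add mean(subd$weight) to the threshold to match the criterion in the question.
for (color in unique(d$color)) {
  subd <- d[d$color == color, ]
  print(subd[subd$weight > (10 + sd(subd$weight)), ])
}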
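If you want one data frame back instead of printed pieces, the loop can store each group's rows in a list and bind them once at the end. This is only a minimal sketch, assuming the criterion is "more than 10 above the mean of its own color"; the names offset, kept, threshold and result are introduced here for illustration. Adding sd(subd$weight) to the threshold gives the stricter variant from the question, which no row in this sample satisfies.

```r
d <- data.frame(
  name = c("n1", "n2", "n3", "n4", "n5", "n6", "n7", "n8", "n9", "n10"),
  color = c("blue", "blue", "red", "blue", "red", "blue", "blue", "red", "green", "green"),
  weight = c(53, 34, 63, 25, 45, 24, 66, 12, 45, 8),
  gender = c(1, 0, 0, 0, 1, 1, 1, 0, 1, 0),
  stringsAsFactors = FALSE)

offset <- 10    # assumed margin above the group mean
kept <- list()  # one filtered data frame per color

for (col in unique(d$color)) {
  subd <- d[d$color == col, ]               # rows of the current color only
  threshold <- mean(subd$weight) + offset   # group-specific cutoff
  kept[[col]] <- subd[subd$weight > threshold, ]
}

# Bind the per-color pieces into a single data frame
result <- do.call(rbind, kept)
print(result)
```

Collecting into a list and calling rbind once avoids growing a data frame inside the loop, which gets slow with many groups.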

Answer 1: (score: 1)

I agree with @cj-yetman that doing this with a for loop is not ideal. A better approach is to use dplyr's grouping functionality. Something like:

library(dplyr)
d2 <- d %>% 
  group_by(color) %>% 
  mutate(avg_w = mean(weight, na.rm = TRUE)) %>% 
  filter(abs(weight - avg_w) <= 10)

> d2
Source: local data frame [2 x 5]
Groups: color [2]

    name  color weight gender avg_w
  <fctr> <fctr>  <dbl>  <dbl> <dbl>
1     n2   blue     34      0  40.4
2     n5    red     45      1  40.0
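For comparison, the same grouped filter (rows within 10 of their color's mean) can be sketched in base R with ave(), which returns the group mean aligned to each row. This is an illustration rather than part of the answer above; d2_base is a name made up here.

```r
d <- data.frame(
  name = c("n1", "n2", "n3", "n4", "n5", "n6", "n7", "n8", "n9", "n10"),
  color = c("blue", "blue", "red", "blue", "red", "blue", "blue", "red", "green", "green"),
  weight = c(53, 34, 63, 25, 45, 24, 66, 12, 45, 8),
  gender = c(1, 0, 0, 0, 1, 1, 1, 0, 1, 0),
  stringsAsFactors = FALSE)

# ave() computes mean(weight) within each color and repeats it per row,
# so avg_w has the same length and order as d
avg_w <- ave(d$weight, d$color, FUN = mean)

# Keep rows whose weight lies within 10 of their group's mean
d2_base <- d[abs(d$weight - avg_w) <= 10, ]
print(d2_base)
```

Because avg_w lines up row by row with d, the filter is a single vectorised comparison and no loop is needed.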

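Answer 2: (score: 0)

Consider the base R by function, which does exactly what you need: it runs the same operation on subsets of a data frame split by the levels of a factor, here the color values. The return value is a list of data frames, which you can pass to do.call(rbind, ...) to get a single final data frame.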

mean_subsetsdflist <- by(d, d$color, function(i) 
     i[i$weight > (mean(i$weight) + (10 + sd(i$weight))), ])

mean_subsetdf <- do.call(rbind, mean_subsetsdflist)
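One thing to be aware of with by() is that the pieces come back ordered by the sorted levels of the grouping variable (blue, green, red), not by order of appearance as in a for loop over unique(). A small sketch, assuming a looser criterion of "more than 10 above the group mean" so that some rows are returned; pieces and out are illustrative names.

```r
d <- data.frame(
  name = c("n1", "n2", "n3", "n4", "n5", "n6", "n7", "n8", "n9", "n10"),
  color = c("blue", "blue", "red", "blue", "red", "blue", "blue", "red", "green", "green"),
  weight = c(53, 34, 63, 25, 45, 24, 66, 12, 45, 8),
  gender = c(1, 0, 0, 0, 1, 1, 1, 0, 1, 0),
  stringsAsFactors = FALSE)

# by() hands each color's rows to the function as the data frame `i`
pieces <- by(d, d$color, function(i)
  i[i$weight > mean(i$weight) + 10, ])  # assumed looser criterion

# Combine the per-color data frames into one
out <- do.call(rbind, pieces)
print(out)
```

If the original row order matters, you can reorder the combined result afterwards, for example by matching on name.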