Replace missing values with the highest value from a subset

Date: 2019-01-21 18:12:30

Tags: r replace

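I want to write a function that goes through my df and replaces every cell of the rectangle column that holds the unknown value "UNK" with the ID of the rectangle that has the largest total weight among the rectangles in the same area as the "UNK" row.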

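For example, with the data below I would like the "UNK" in the first row to be replaced with "37D5", because 37D5 has the largest total weight in area 4.a (200 + 517 + 610 + 1759 = 3086).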

area <- c("4.a","4.a","4.a","6.a","4.a","4.a","6.a","6.a","4.a","4.a","4.b","4.a","4.a","4.b","4.b")
rectangle <- c("UNK","37D5","39E1","42E7","37D5","37D5","37D5","38D6","43E8","45F2","40F2","47F0","37D5","49E8","50F0")
weight <- c(1800,200,595,219,517,610,2140,1248,120,492,1085,1278,1759,1902,1862)
trip <- c(1:15)

df1 <- data.frame(area,rectangle,weight,trip)

1 answer:

Answer (score: 1)

Let's first compute a separate table, grouped by area, that holds the rectangle with the largest total weight in each area:

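library(dplyr)

# Sum the weights per area/rectangle pair, then keep the heaviest rectangle in each area
weights <- df1 %>% group_by(area, rectangle) %>% 
  summarize(weight = sum(weight)) %>% 
  filter(weight == max(weight)) %>% 
  select(-weight)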

# A tibble: 3 x 2
# Groups:   area [3]
  area  rectangle
  <fct> <fct>    
1 4.a   37D5     
2 4.b   49E8     
3 6.a   37D5
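Two edge cases are worth guarding against. filter(weight == max(weight)) keeps every rectangle that ties for the maximum, and a tie would make the following left_join() duplicate rows of df1. The sum also counts the "UNK" rows themselves, so an area whose unknown weight outweighed every known rectangle would get "UNK" back as its "replacement". A minimal sketch that handles both, assuming dplyr 1.0.0 or later (for slice_max() and the .groups argument of summarize()), could look like this:

```r
library(dplyr)

# Data from the question; stringsAsFactors = FALSE keeps the IDs as plain strings
area <- c("4.a","4.a","4.a","6.a","4.a","4.a","6.a","6.a","4.a","4.a","4.b","4.a","4.a","4.b","4.b")
rectangle <- c("UNK","37D5","39E1","42E7","37D5","37D5","37D5","38D6","43E8","45F2","40F2","47F0","37D5","49E8","50F0")
weight <- c(1800,200,595,219,517,610,2140,1248,120,492,1085,1278,1759,1902,1862)
df1 <- data.frame(area, rectangle, weight, trip = 1:15, stringsAsFactors = FALSE)

weights <- df1 %>%
  filter(rectangle != "UNK") %>%                              # never pick "UNK" as the replacement
  group_by(area, rectangle) %>%
  summarize(weight = sum(weight), .groups = "drop_last") %>%  # result stays grouped by area
  slice_max(weight, n = 1, with_ties = FALSE) %>%             # exactly one rectangle per area
  ungroup() %>%
  select(-weight)
```

With this version of weights, the left_join() step that follows works unchanged and cannot add rows. If two rectangles tie, only one of them is kept, so decide separately whether that choice matters for your data.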

Then we left_join the new table onto df1 and replace the UNK values:

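df1 %>% 
  # attach each area's heaviest rectangle as rectangle.y
  left_join(., weights, by = c("area")) %>% 
  # use rectangle.y only where the original value is "UNK"
  mutate(rectangle.x = if_else(rectangle.x == "UNK", rectangle.y, rectangle.x)) %>% 
  select(-rectangle.y) %>% 
  rename(rectangle = rectangle.x)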

   area rectangle weight trip
1   4.a      37D5   1800    1
2   4.a      37D5    200    2
3   4.a      39E1    595    3
4   6.a      42E7    219    4
5   4.a      37D5    517    5
6   4.a      37D5    610    6
7   6.a      37D5   2140    7
8   6.a      38D6   1248    8
9   4.a      43E8    120    9
10  4.a      45F2    492   10
11  4.b      40F2   1085   11
12  4.a      47F0   1278   12
13  4.a      37D5   1759   13
14  4.b      49E8   1902   14
15  4.b      50F0   1862   15
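Since the question asks for a function, the same idea can also be wrapped up without dplyr. The sketch below is a base-R illustration, not part of the answer above: replace_unk() and its unk argument are hypothetical names, and it assumes the columns are called area, rectangle and weight as in df1. It ignores the "UNK" rows when summing, and on a tie it takes the first rectangle ID in sorted order (the order tapply() groups by).

```r
# Hypothetical helper: replace `unk` in df$rectangle with the known rectangle
# that has the largest summed weight within the same area
replace_unk <- function(df, unk = "UNK") {
  area <- as.character(df$area)
  rect <- as.character(df$rectangle)
  known <- rect != unk
  # For each area, sum weights per known rectangle and keep the top ID
  best <- sapply(split(which(known), area[known]), function(i) {
    totals <- tapply(df$weight[i], rect[i], sum)
    names(totals)[which.max(totals)]
  })
  # Look up each unknown row's area; areas with no known rectangle give NA
  idx <- which(!known)
  rect[idx] <- best[area[idx]]
  df$rectangle <- rect
  df
}

area <- c("4.a","4.a","4.a","6.a","4.a","4.a","6.a","6.a","4.a","4.a","4.b","4.a","4.a","4.b","4.b")
rectangle <- c("UNK","37D5","39E1","42E7","37D5","37D5","37D5","38D6","43E8","45F2","40F2","47F0","37D5","49E8","50F0")
weight <- c(1800,200,595,219,517,610,2140,1248,120,492,1085,1278,1759,1902,1862)
df1 <- data.frame(area, rectangle, weight, trip = 1:15, stringsAsFactors = FALSE)

out <- replace_unk(df1)
out$rectangle[1]  # "37D5"
```

Unlike the join, this touches only the "UNK" rows and always returns exactly as many rows as it was given.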