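I want to write a function that goes through my df and replaces every cell in the rectangle column that holds the unknown value "UNK" with the ID of the rectangle that has the largest total weight within the same area as the "UNK" rectangle.
For example, with the data below I would like the "UNK" rectangle cell in the first row to be replaced with "37D5":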
area <- c("4.a","4.a","4.a","6.a","4.a","4.a","6.a","6.a","4.a","4.a","4.b","4.a","4.a","4.b","4.b")
rectangle <- c("UNK","37D5","39E1","42E7","37D5","37D5","37D5","38D6","43E8","45F2","40F2","47F0","37D5","49E8","50F0")
weight <- c(1800,200,595,219,517,610,2140,1248,120,492,1085,1278,1759,1902,1862)
trip <- c(1:15)
df1 <- data.frame(area,rectangle,weight,trip)
Answer 0 (score: 1)
Let's first compute a separate table that holds, for each area group, the rectangle with the largest total weight:
library(dplyr)

weights <- df1 %>%
  filter(rectangle != "UNK") %>%  # never pick "UNK" itself as the replacement
  group_by(area, rectangle) %>%
  summarize(weight = sum(weight)) %>%
  filter(weight == max(weight)) %>%
  select(-weight)
# A tibble: 3 x 2
# Groups:   area [3]
  area  rectangle
  <fct> <fct>
1 4.a   37D5
2 4.b   49E8
3 6.a   37D5
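As a quick sanity check on the 4.a row, the per-rectangle totals for that area can be listed directly. This is only a sketch built from the question's data; the name `totals_4a` is made up for illustration.

```r
library(dplyr)

# Data copied from the question.
area <- c("4.a","4.a","4.a","6.a","4.a","4.a","6.a","6.a","4.a","4.a","4.b","4.a","4.a","4.b","4.b")
rectangle <- c("UNK","37D5","39E1","42E7","37D5","37D5","37D5","38D6","43E8","45F2","40F2","47F0","37D5","49E8","50F0")
weight <- c(1800,200,595,219,517,610,2140,1248,120,492,1085,1278,1759,1902,1862)
trip <- c(1:15)
df1 <- data.frame(area, rectangle, weight, trip)

# Total weight per rectangle within area 4.a, heaviest first.
# (totals_4a is an illustrative name, not part of the original answer.)
totals_4a <- df1 %>%
  filter(area == "4.a") %>%
  group_by(rectangle) %>%
  summarize(total = sum(weight)) %>%
  arrange(desc(total))

print(totals_4a)
```

In area 4.a, 37D5 adds up to 200 + 517 + 610 + 1759 = 3086, ahead of the single "UNK" row at 1800, which is why it is the value that gets filled in.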
Then we left_join the new table and replace the UNK values:
df1 %>%
left_join(., weights, by = c("area")) %>%
mutate(rectangle.x = if_else(rectangle.x == "UNK", rectangle.y, rectangle.x)) %>%
select(-rectangle.y) %>%
rename(rectangle = rectangle.x)
   area rectangle weight trip
1   4.a      37D5   1800    1
2   4.a      37D5    200    2
3   4.a      39E1    595    3
4   6.a      42E7    219    4
5   4.a      37D5    517    5
6   4.a      37D5    610    6
7   6.a      37D5   2140    7
8   6.a      38D6   1248    8
9   4.a      43E8    120    9
10  4.a      45F2    492   10
11  4.b      40F2   1085   11
12  4.a      47F0   1278   12
13  4.a      37D5   1759   13
14  4.b      49E8   1902   14
15  4.b      50F0   1862   15
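Since the question asks for a function, the two steps could be wrapped up roughly as follows. This is a sketch rather than the answer's own code: the name `fill_unknown_rectangles`, its `unknown` argument and the helper column `top_rectangle` are made up for illustration, and it assumes dplyr 1.0.0 or later for `slice_max()` and the `.groups` argument. It also differs from the pipeline above in two ways: it leaves "UNK" rows out of the totals, and it keeps a single rectangle per area when totals tie, so the join cannot duplicate rows.

```r
library(dplyr)

# Hypothetical helper: name and arguments are illustrative, not from the answer.
fill_unknown_rectangles <- function(df, unknown = "UNK") {
  # Use character columns so the comparison and if_else() behave the same
  # whether or not data.frame() turned the strings into factors.
  df <- df %>%
    mutate(area = as.character(area), rectangle = as.character(rectangle))

  # Heaviest known rectangle per area; with_ties = FALSE keeps one row per area.
  heaviest <- df %>%
    filter(rectangle != unknown) %>%
    group_by(area, rectangle) %>%
    summarize(total = sum(weight), .groups = "drop") %>%
    group_by(area) %>%
    slice_max(total, n = 1, with_ties = FALSE) %>%
    ungroup() %>%
    select(area, top_rectangle = rectangle)

  # Fill in the unknown cells; areas with no known rectangle keep "UNK".
  df %>%
    left_join(heaviest, by = "area") %>%
    mutate(rectangle = if_else(rectangle == unknown & !is.na(top_rectangle),
                               top_rectangle, rectangle)) %>%
    select(-top_rectangle)
}

# Usage with the data from the question.
area <- c("4.a","4.a","4.a","6.a","4.a","4.a","6.a","6.a","4.a","4.a","4.b","4.a","4.a","4.b","4.b")
rectangle <- c("UNK","37D5","39E1","42E7","37D5","37D5","37D5","38D6","43E8","45F2","40F2","47F0","37D5","49E8","50F0")
weight <- c(1800,200,595,219,517,610,2140,1248,120,492,1085,1278,1759,1902,1862)
trip <- c(1:15)
df1 <- data.frame(area, rectangle, weight, trip)

result <- fill_unknown_rectangles(df1)
print(head(result, 3))
```

Converting to character up front is a design choice: it sidesteps factor-level mismatches in `if_else()`, at the cost of returning character columns even if the input had factors.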