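I'm new to R and am looking for an efficient way to update values in cells based on certain conditions. I assume this needs a for loop or some other function.
Here is the dataset: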
project_ID <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
sector <- c("Energy", "None", "None", "Water", "None", "None", "Solar", "Solar", "None")
percentage_approval <- c(100, 50, 50, 100, 25, 25, 100, 30, 40)
type <- c("Program", "Sub-Project", "Sub-Project", "Program", "Sub-Project", "Sub-Project", "Program", "Sub-Project", "Sub-Project")
funding <- c(5, 2.5, 2.5, 16, 4, 4, 10, 3, 4)
cofinancing <- c(100000, 750000, 80000, 4000000, 6660000, 11000, 12000, 1111111, 1111999)
df <- data.frame(project_ID, sector, percentage_approval, type, funding, cofinancing)
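A for loop probably isn't needed here. Base R's grouped functions compute per-project values directly. As a minimal sketch, using only the columns it needs, `aggregate()` sums `percentage_approval` over the Sub-Project rows of each `project_ID`. This is the "does it equal 100" check described below. With this data, project 1 sums to 100, project 2 to 50 and project 3 to 70.

```r
# Rebuild only the columns this check needs
df <- data.frame(
  project_ID = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  percentage_approval = c(100, 50, 50, 100, 25, 25, 100, 30, 40),
  type = c("Program", "Sub-Project", "Sub-Project", "Program", "Sub-Project",
           "Sub-Project", "Program", "Sub-Project", "Sub-Project"),
  stringsAsFactors = FALSE
)

# Keep only the Sub-Project rows, then sum within each project_ID
subs <- df[df$type == "Sub-Project", ]
sums <- aggregate(percentage_approval ~ project_ID, data = subs, FUN = sum)

# Flag the projects whose sub-project approvals add up to exactly 100
sums$equals_100 <- sums$percentage_approval == 100
print(sums)
```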
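What I want to do:
Group the data by project ID.
Then check whether the sum of "percentage_approval" over the Sub-Project rows equals 100. If it does, the "Program" row with the same project ID should be deleted.
If the sum of the sub-projects' "percentage_approval" does not equal 100, the following adjustments are needed:
Finally, I want to update every sector that is "None" so that it takes the sector value of its project ID.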
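The following is a minimal base-R sketch of the sum check, the Program-row removal and the sector fill. It does not cover the adjustment for sums other than 100, so `funding` and `cofinancing` are left untouched. It fills the "None" sectors *before* dropping any Program row because, for project 1, the only real sector ("Energy") sits on the Program row that gets removed. The names `sector_lookup`, `ids_100` and `result` are illustrative, not from the question.

```r
# Example data from the question (stringsAsFactors = FALSE keeps sector
# a character column on R versions before 4.0)
df <- data.frame(
  project_ID = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  sector = c("Energy", "None", "None", "Water", "None", "None", "Solar", "Solar", "None"),
  percentage_approval = c(100, 50, 50, 100, 25, 25, 100, 30, 40),
  type = c("Program", "Sub-Project", "Sub-Project", "Program", "Sub-Project",
           "Sub-Project", "Program", "Sub-Project", "Sub-Project"),
  funding = c(5, 2.5, 2.5, 16, 4, 4, 10, 3, 4),
  cofinancing = c(100000, 750000, 80000, 4000000, 6660000, 11000, 12000, 1111111, 1111999),
  stringsAsFactors = FALSE
)

# Fill "None" sectors first. sector_lookup (hypothetical helper) holds the
# first non-"None" sector of each project_ID, taken from the original rows.
known <- df[df$sector != "None", ]
sector_lookup <- tapply(known$sector, known$project_ID, function(s) s[1])
none <- df$sector == "None"
df$sector[none] <- as.vector(sector_lookup[as.character(df$project_ID[none])])

# Sum percentage_approval over the Sub-Project rows of each project_ID
is_sub <- df$type == "Sub-Project"
sub_sum <- tapply(df$percentage_approval[is_sub], df$project_ID[is_sub], sum)
ids_100 <- as.numeric(names(sub_sum)[sub_sum == 100])

# Drop the Program row of every project whose sub-projects sum to 100
result <- df[!(df$project_ID %in% ids_100 & df$type == "Program"), ]
rownames(result) <- NULL
print(result)
```

Because the sketch fills every "None" row, the last project 3 row becomes "Solar", whereas the desired table keeps "None" there. If that row is meant to stay unchanged, the fill needs an extra condition that the question doesn't state.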
So this is the final table I want to end up with:
project_ID_2 <- c(1, 1, 2, 2, 2, 3, 3, 3)
sector_2 <- c("Energy", "Energy", "Water", "Water", "Water", "Solar", "Solar", "None")
percentage_approval_2 <- c(50, 50, 100, 25, 25, 100, 30, 40)
type_2 <- c("Sub-Project", "Sub-Project", "Program", "Sub-Project", "Sub-Project", "Program", "Sub-Project", "Sub-Project")
funding_2 <- c(2.5, 2.5, 8, 4, 4, 3, 3, 4)
cofinancing_2 <- c(750000, 80000, 677000, 666000, 11000, 1285714.29, 1000000, 2000000)
df.fixed <- data.frame(project_ID_2, sector_2, percentage_approval_2, type_2, funding_2, cofinancing_2)
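The question doesn't spell out the adjustment for sums other than 100. However, the `funding` column of the desired table is consistent with one reading: each remaining Program row keeps its original funding minus the total funding of its sub-projects (16 - 4 - 4 = 8 for project 2, 10 - 3 - 4 = 3 for project 3). The sketch below only checks that inference. `cofinancing` does not follow the same rule (4,000,000 - 6,660,000 - 11,000 is negative, yet the table shows 677,000), so that column can't be reconstructed this way.

```r
# Only the columns needed to check the funding arithmetic
df <- data.frame(
  project_ID = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  type = c("Program", "Sub-Project", "Sub-Project", "Program", "Sub-Project",
           "Sub-Project", "Program", "Sub-Project", "Sub-Project"),
  funding = c(5, 2.5, 2.5, 16, 4, 4, 10, 3, 4),
  stringsAsFactors = FALSE
)

# Total Sub-Project funding per project_ID
is_sub <- df$type == "Sub-Project"
sub_total <- tapply(df$funding[is_sub], df$project_ID[is_sub], sum)

# Assumed rule: Program funding minus its sub-projects' funding
prog <- df[df$type == "Program", ]
prog$remaining <- prog$funding - as.vector(sub_total[as.character(prog$project_ID)])
print(prog[, c("project_ID", "funding", "remaining")])
```

Project 1 comes out at 0, but its Program row is deleted anyway because its sub-projects sum to 100.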
Answer (score: 0)
Partial answer:
This handles the case where the sum equals 100. I'm still working out the rest.
library(dplyr)

# First, sum the sub-project approvals and get the project IDs where sum = 100
df_temp <- df %>%
  filter(type != "Program") %>%
  group_by(project_ID) %>%
  summarise(advance = sum(percentage_approval))

# pull() returns a plain vector of IDs, which is what %in% needs below
Get100 <- df_temp %>%
  filter(advance == 100) %>%
  pull(project_ID)

# Remove the Program lines of projects whose sub-projects sum to 100
df <- df %>%
  filter(!(project_ID %in% Get100 & type == "Program"))
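`%in%` expects a vector on its right-hand side. If the IDs are kept as a one-column data frame, as `select(project_ID)` returns, R compares against the whole column converted to text rather than against its values. In this data only project 1 sums to 100, so such a filter can appear to work: a one-element column converts to that same value. With two or more IDs it matches nothing. The small base-R illustration below uses made-up IDs 1 and 3.

```r
# One-column data frame of IDs, like the result of select(project_ID)
ids_df <- data.frame(project_ID = c(1, 3))
# The same IDs as a plain vector, like pull(project_ID) or ids_df$project_ID
ids_vec <- ids_df$project_ID

x <- c(1, 2, 3)
# Compared against the column as a whole (text "c(1, 3)"), so nothing matches
print(x %in% ids_df)
# Compared against the values themselves
print(x %in% ids_vec)
```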