My data looks like this:
Week Total Amount Person
1 $5 A
1 $5 B
1 $4 C
1 $2 D
1 $1 E
2 $5 A
2 $1 B
2 $1 H
2 $3 G
2 $5 C
2 $5 F
How can I make it so that each week shows the top three, with all the other amounts added into "Others"? I want it to show:
Week Total Amount Person
1 $5 A
1 $5 B
1 $4 C
1 $3 Others
2 $5 A
2 $5 C
2 $5 F
2 $5 Others
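Note that the amounts outside the top three are summed into a new total, and that each week takes up an arbitrary number of rows (for example, week 1 has 5 people in total, but week 2 has 6, week 3 might have 8 or 10, and week 4 might have 1). I want the formula to work for every row.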
Answer 0 (score: 2)
This is easy with the tidyverse. Say the data is in a data frame called df:
library(tidyverse)
df.new <- df %>%
  group_by(Week) %>%
  # sort largest first so that rows 1-3 within each week are the top three
  arrange(desc(`Total Amount`)) %>%
  mutate(Person = ifelse(row_number() > 3, "Others", Person)) %>%
  group_by(Week, Person) %>%
  summarize(`Total Amount` = sum(`Total Amount`))
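If there is a "$" in the column (that is, it is a string column), you first need to convert it to a number before the summarize line will work. You can use a function such as parse_number() to do this.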
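As a minimal sketch of that conversion step, assuming the amounts are stored as strings such as "$5" (the vector below is made up for illustration), readr's parse_number() drops the currency symbol and returns plain numbers:

```r
library(readr)

# Hypothetical amounts stored as text, as in the question's printed data
amounts <- c("$5", "$5", "$4", "$2", "$1")

# parse_number() drops the non-numeric characters around each number
amounts_num <- parse_number(amounts)

is.numeric(amounts_num)  # TRUE, so the values can now be passed to sum()
sum(amounts_num)         # 17
```

Applied to the data frame, this would be something like df$`Total Amount` <- parse_number(df$`Total Amount`) before running the pipeline.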
Answer 1 (score: 1)
Base R
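# Rank the amounts within each week (largest first); anything ranked below third becomes "Others"
df$Person[ave(df$`Total Amount`, df$Week, FUN = function(x)
    rank(-x, ties.method = "first")) > 3] = "Others"
# Sum the amounts per week and person, then sort for display
df2 = aggregate(df["Total Amount"], df[c("Week", "Person")], sum)
df2[order(df2$Week, df2$Person),]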
# Week Person Total Amount
#1 1 A 5
#3 1 B 5
#4 1 C 4
#7 1 Others 3
#2 2 A 5
#5 2 C 5
#6 2 F 5
#8 2 Others 5
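One subtlety in the per-week `> 3` test: it needs each row's rank within its week. order() returns the positions that would sort the vector, which is not the same thing as what rank() returns. A small sketch on a made-up vector (not the question's data) shows the difference:

```r
# Made-up amounts for one week: the smallest value comes first
x <- c(1, 5, 5, 5)

# order() gives the positions that would sort x, largest first
ord <- order(x, decreasing = TRUE)

# rank() gives each element's standing; ties are broken by position
rnk <- rank(-x, ties.method = "first")

# Only the rank-based test flags the smallest amount as outside the top three
x[rnk > 3]  # 1
x[ord > 3]  # 5
```

On the sample data in the question both happen to flag the same rows, so the distinction only shows up with other inputs.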
Data
df = structure(list(Week = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
2L), `Total Amount` = c(5L, 5L, 4L, 2L, 1L, 5L, 1L, 1L, 3L, 5L,
5L), Person = c("A", "B", "C", "D", "E", "A", "B", "H", "G",
"C", "F")), .Names = c("Week", "Total Amount", "Person"), class = "data.frame",
row.names = c(NA, -11L))
Answer 2 (score: 0)
Here is one way you could do it:
library(tidyverse)
df <- df %>%
group_by(Week) %>%
arrange(desc(Total_Amount), .by_group = TRUE) %>%
mutate(id = row_number()) %>%
mutate(Person = case_when(id > 3 ~ "Others",
TRUE ~ as.character(Person)))
Then remove the $ sign so that we can sum Total_Amount:
df$Total_Amount <- as.numeric(gsub("\\$", "", df$Total_Amount))
Finally, sum Total_Amount by group and add the $ sign back to restore the original format:
df %>%
group_by(Week, Person) %>%
summarise(Total_Amount = sum(Total_Amount)) %>%
mutate(Total_Amount = paste0("$", Total_Amount)) %>%
select(Week, Total_Amount, Person)
Returns:
# A tibble: 8 x 3
# Groups: Week [2]
Week Total_Amount Person
<int> <chr> <chr>
1 1 $5 A
2 1 $5 B
3 1 $4 C
4 1 $3 Others
5 2 $5 A
6 2 $5 C
7 2 $5 F
8 2 $5 Others
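One caveat with this approach: the sorting step runs while Total_Amount is still text, and text is compared character by character, so "$10" would sort below "$5". That does not matter for the single-digit sample data, but a sketch with made-up amounts suggests converting to numbers before ranking:

```r
# Made-up amounts that include a two-digit value
amounts <- c("$5", "$10", "$3")

# Sorting the raw strings compares them character by character,
# so "$10" ends up last even though it is the largest amount
sort(amounts, decreasing = TRUE)

# Stripping "$" first gives a true numeric sort
amounts_num <- as.numeric(gsub("\\$", "", amounts))
sort(amounts_num, decreasing = TRUE)  # 10 5 3
```

In the pipeline above, that would mean moving the gsub() line ahead of the arrange() step.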