Grouping rows into a new row in R

Time: 2017-11-16 21:59:09

Tags: r dataframe grouping

So my data looks like this:

 Week        Total Amount        Person
   1            $5                 A
   1            $5                 B
   1            $4                 C
   1            $2                 D
   1            $1                 E
   2            $5                 A
   2            $1                 B
   2            $1                 H
   2            $3                 G
   2            $5                 C
   2            $5                 F

How can I make it show the top three for each week, with all other amounts summed into "Others"? I want it to show:

 Week        Total Amount        Person
   1            $5                 A
   1            $5                 B
   1            $4                 C
   1            $3                 Others
   2            $5                 A
   2            $5                 C
   2            $5                 F
   2            $5                 Others

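Note that the amounts outside the top three are summed into a new total amount, and that each week has an arbitrary number of rows (e.g., week 1 has 5 people in total, week 2 has 6, week 3 might have 8 or 10, week 4 might have 1, but I want the formula to apply to every row).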

3 Answers:

Answer 0: (score: 2)

This is easy with the tidyverse. Say the data is in a data frame called df.

library(tidyverse)

df.new <- df %>%
  group_by(Week) %>%
  # Sort largest first within each week so that rows 1-3 are the top three
  arrange(desc(`Total Amount`), .by_group = TRUE) %>%
  mutate(Person = ifelse(row_number() > 3, "Others", Person)) %>%
  group_by(Week, Person) %>%
  summarize(`Total Amount` = sum(`Total Amount`))

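If the column contains a "$" (i.e. it is a string column), you need to convert it first before the summarize line will work. You can do this with a function such as parse_number().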
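
As a minimal sketch of that conversion, assuming the amounts are stored as strings such as "$5" (the `amounts` vector below is made up for illustration), parse_number() from readr drops the currency symbol and returns numbers that can be summed:

```r
library(readr)

# Hypothetical amounts stored as text, as in the question's table
amounts <- c("$5", "$4", "$10")

# parse_number() ignores the non-numeric characters around each number
parsed <- parse_number(amounts)
parsed       # 5 4 10
sum(parsed)  # 19
```

In the pipeline above, this would presumably be a mutate() step on `Total Amount` placed before the sorting step.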

Answer 1: (score: 1)

Base R

# rank() (not order()) gives each row's position within its week, largest first
df$Person[ave(df$`Total Amount`, df$Week, FUN = function(x)
    rank(-x, ties.method = "first")) > 3] = "Others"
df2 = aggregate(df["Total Amount"], df[c("Week", "Person")], sum)
df2[order(df2$Week, df2$Person),]
#  Week Person Total Amount
#1    1      A            5
#3    1      B            5
#4    1      C            4
#7    1 Others            3
#2    2      A            5
#5    2      C            5
#6    2      F            5
#8    2 Others            5

Data

df = structure(list(Week = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 
2L), `Total Amount` = c(5L, 5L, 4L, 2L, 1L, 5L, 1L, 1L, 3L, 5L, 
5L), Person = c("A", "B", "C", "D", "E", "A", "B", "H", "G", 
"C", "F")), .Names = c("Week", "Total Amount", "Person"), class = "data.frame",
row.names = c(NA, -11L))
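
The per-week ranking step in this answer can be looked at in isolation. The following is a small sketch on a made-up vector `x` (not the question's data) showing why the position of each element in sorted order comes from rank(), whereas order() returns the permutation that sorts the vector. The two coincide only for some inputs.

```r
# Hypothetical amounts for a single week
x <- c(3, 10, 7, 1)

# Position of each element when sorted largest first
pos <- rank(-x, ties.method = "first")
pos             # 3 1 2 4

# Indices that would sort x largest first (a different thing)
idx <- order(x, decreasing = TRUE)
idx             # 2 3 1 4

# Keep the top two: rank() selects the right elements
x[pos <= 2]     # 10 7

# Using order() the same way picks the wrong ones
x[idx <= 2]     # 3 7
```

With ties.method = "first", tied amounts get distinct ranks in their original row order, so exactly three rows per week stay named.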

Answer 2: (score: 0)

Here is one way you could do it:

library(tidyverse)

df <- df %>% 
  group_by(Week) %>% 
  arrange(desc(Total_Amount), .by_group = TRUE) %>% 
  mutate(id = row_number()) %>% 
  mutate(Person = case_when(id > 3 ~ "Others",
                            TRUE ~ as.character(Person)))

Then remove the $ sign so that we can sum Total_Amount:

df$Total_Amount <- as.numeric(gsub("\\$", "", df$Total_Amount))
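
One caveat worth sketching, on made-up values: while the column is still text, sorting compares it character by character, so "$10" sorts below "$5". The question's data only has single-digit amounts, so the result there is unaffected, but with larger amounts the conversion would need to happen before the sorting step. The `amounts` vector below is hypothetical.

```r
# Hypothetical amounts stored as text
amounts <- c("$5", "$10", "$9")

# Strip the dollar sign and convert to numeric
values <- as.numeric(gsub("\\$", "", amounts))
values                                   # 5 10 9

# Numeric sort puts 10 first
sorted_num <- sort(values, decreasing = TRUE)
sorted_num                               # 10 9 5

# Text sort (method = "radix" uses C-locale ordering) puts "$10" last
sorted_chr <- sort(amounts, decreasing = TRUE, method = "radix")
sorted_chr                               # "$9" "$5" "$10"
```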

Finally, sum Total_Amount by group and add the $ sign back to restore everything:

df %>% 
  group_by(Week, Person) %>% 
  summarise(Total_Amount = sum(Total_Amount)) %>% 
  mutate(Total_Amount = paste0("$", Total_Amount)) %>% 
  select(Week, Total_Amount, Person)

This returns:

# A tibble: 8 x 3
# Groups:   Week [2]
   Week Total_Amount Person
  <int>        <chr>  <chr>
1     1           $5      A
2     1           $5      B
3     1           $4      C
4     1           $3 Others
5     2           $5      A
6     2           $5      C
7     2           $5      F
8     2           $5 Others