Question

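I have a data frame with a location column and a county column, holding data for various locations across a set of counties. I am grouping by the county column for another calculation, but I would like to keep a way to see which locations are included in each county. Is this possible?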

Here is a sample of the original data:

location   county   x   y
hend       hender   2   10
alam       alam     0   5
alex       alam     4   3
alleg      allegy   6   1
ann        hender   9   0

This is what I have transformed it into:

df <- df %>%
  group_by(county) %>%
  summarise(total = sum(x + y))

county   total
hender   21
alam     12
allegy   7

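Again, I'm not sure whether this is feasible, but I would like a third column (let's call it allloc) that lists the locations in each county, comma-separated if possible. Like this: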

county   total   allloc
hender   21      hend, ann
alam     12      alam, alex
allegy   7       alleg

I have tried summarise with paste, mutate with paste, and coalesce, but none of them worked.

df <- df %>%
  group_by(county) %>%
  summarise(allloc = paste(location))

df <- df %>%
  group_by(county) %>%
  mutate(allloc = paste(location))

df <- df %>%
  group_by(county) %>%
  mutate(allloc = coalesce(df$location))

Any ideas?

(Last but not least, here is some reproducible code):

df <- data.frame(location = c("hend", "alam", "alex", "alleg", "ann"), county = c("hender", "alam", "alam", "allegy", "hender"), x = c(2, 0 , 4, 6, 9), y = c(10, 5, 3, 1, 0))

Answer 1

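While toString is useful for summary information, if you need to re-extract the individual locations I find it better (well, more general) to store the data as a list column rather than as a string. (A string has to be re-parsed to split it apart, which is easy with strsplit until the real data contains embedded commas or whatever delimiter you split on.)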

results <- df %>%
  group_by(county) %>%
  summarise(total = sum(x + y), allloc = list(location) )
results
# # A tibble: 3 x 3
#   county total allloc   
#   <chr>  <int> <list>   
# 1 alam      12 <chr [2]>
# 2 allegy     7 <chr [1]>
# 3 hender    21 <chr [2]>

You can see what is going on behind the scenes with str:

str(results)
# tibble [3 x 3] (S3: tbl_df/tbl/data.frame)
#  $ county: chr [1:3] "alam" "allegy" "hender"
#  $ total : int [1:3] 12 7 21
#  $ allloc:List of 3
#   ..$ : chr [1:2] "alam" "alex"
#   ..$ : chr "alleg"
#   ..$ : chr [1:2] "hend" "ann"

This shows that allloc is a list of length 3 whose elements are character vectors of varying length.
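
As a minimal sketch of working with such a list column, assuming the df from the question and the dplyr package: each element is an ordinary character vector, so it can be indexed or summarised without any string parsing. The column names n_loc and allloc_chr are made up for illustration.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  location = c("hend", "alam", "alex", "alleg", "ann"),
  county   = c("hender", "alam", "alam", "allegy", "hender"),
  x = c(2, 0, 4, 6, 9),
  y = c(10, 5, 3, 1, 0)
)

results <- df %>%
  group_by(county) %>%
  summarise(total = sum(x + y), allloc = list(location))

# Pull out the locations for a single county as a plain character vector
hender_locs <- results$allloc[[which(results$county == "hender")]]
hender_locs

# Derive per-county values from the list column
results <- results %>%
  mutate(
    n_loc      = lengths(allloc),                        # number of locations per county
    allloc_chr = vapply(allloc, toString, character(1))  # display string, built only when needed
  )
results
```

The display string is derived from the list column here, so both forms are available and nothing has to be split back apart later.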

Times when this is useful/reasonable:

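You have follow-up processing to do on the location data and want to keep it as a vector of some length (for each county); or
You need to reshape/lengthen the data, which is really just a special case of the previous bullet.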
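
For the reshape/lengthen case, one possible sketch (assuming the tidyr package is available in addition to dplyr; the answer itself does not use it) is to expand the list column back to one row per location with unnest. The name long is illustrative.

```r
library(dplyr)
library(tidyr)

# Sample data from the question
df <- data.frame(
  location = c("hend", "alam", "alex", "alleg", "ann"),
  county   = c("hender", "alam", "alam", "allegy", "hender"),
  x = c(2, 0, 4, 6, 9),
  y = c(10, 5, 3, 1, 0)
)

results <- df %>%
  group_by(county) %>%
  summarise(total = sum(x + y), allloc = list(location))

# One row per (county, location); the county total is repeated on each row
long <- results %>%
  unnest(cols = allloc)
long
```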

Times when it is not needed:

When your only reason for creating allloc is visual reporting; and
The raw data never has embedded commas.
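
To illustrate the embedded-comma caveat, here is a small base R sketch. The location names are hypothetical and chosen only so that one of them contains a comma.

```r
# Hypothetical location names; the first one contains a comma
locs <- c("Springfield, East", "Shelby")

joined <- toString(locs)   # collapse into one display string
joined

# Splitting on ", " can no longer tell the separator from the embedded comma
pieces <- strsplit(joined, ", ", fixed = TRUE)[[1]]
pieces
length(pieces)   # three pieces, although there were only two locations

# A list column keeps the original vector intact
kept <- list(locs)
length(kept[[1]])
```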

Answer 2

You can create multiple columns in summarise:

library(dplyr)

result <- df %>%
           group_by(county) %>%
           summarise(total = sum(x + y), 
                     allloc = toString(location))
                     # Same as:
                     # allloc = paste(location, collapse = ", "))
result

#  county total allloc    
#  <chr>  <dbl> <chr>     
#1 alam      12 alam, alex
#2 allegy     7 alleg     
#3 hender    21 hend, ann
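
As a brief base R sketch of why the paste attempts in the question did not give one string per county: without a collapse argument, paste returns one string per input element, so summarise does not receive a single value per group. The vector loc below just stands in for the locations of one county.

```r
loc <- c("hend", "ann")   # the two locations in county "hender"

not_collapsed <- paste(loc)                   # still a vector of two strings
collapsed     <- paste(loc, collapse = ", ")  # a single string
via_tostring  <- toString(loc)                # same single string

length(not_collapsed)
collapsed
identical(collapsed, via_tostring)
```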
