Question

I have a data frame containing numeric and string values, for example:

library(dplyr)

mydf <- data.frame(id = c(1, 2, 1, 2, 3, 4),
                   value = c(32, 12, 43, 6, 50, 20),
                   text = c('A', 'B', 'A', 'B', 'C', 'D'))

Each value of id always corresponds to the same value of text; for example, id == 1 always has text == 'A'.

Now I want to summarize this data frame by id (or by text, which amounts to the same thing):

mydf %>%
  group_by(id) %>%
  summarize(mean_value = mean(value))

This works fine, but I also need the text variable, because I now want to run a text analysis.

However, when I add text to the dplyr pipeline:

mydf %>%
  group_by(id) %>%
  summarize(mean_value = mean(value),
            text = text)

I get the following error:

Error: expecting a single value

Since text is always the same for a given id, is it possible to append it to the summarized data frame?

Answer 1

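summarize needs a function that reduces each group's input to a single value. So we can either leave text out of summarize and put it in group_by alongside id, or use the first function inside summarize: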

# text should be in group_by to show up in result
mydf %>%
  group_by(id, text) %>%
  summarize(mean_value = mean(value))

# or within summarise use first function, to take the first value when grouped
mydf %>%
  group_by(id) %>%
  summarize(mean_value = mean(value),
            text = first(text))
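
Both variants rely on text being constant within each id, as the question states. A minimal sketch of how that assumption could be checked first, using dplyr's n_distinct (the names check and n_text are made up for illustration):

```r
library(dplyr)

mydf <- data.frame(id = c(1, 2, 1, 2, 3, 4),
                   value = c(32, 12, 43, 6, 50, 20),
                   text = c('A', 'B', 'A', 'B', 'C', 'D'))

# Count the distinct text values within each id
check <- mydf %>%
  group_by(id) %>%
  summarize(n_text = n_distinct(text))

# TRUE when every id maps to exactly one text value
all(check$n_text == 1)
```

If this returned FALSE, first(text) would silently keep only the first text of each group, while group_by(id, text) would produce one row per id/text combination.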

Answer 2

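Instead of summarize, which reduces your df to a data frame with only two columns, use mutate so that you keep the other variables. Note that mutate keeps every original row rather than returning one row per id.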

mydf %>%
  group_by(id) %>%
  mutate(mean_value = mean(value))
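
If one row per id is still wanted after mutate, the repeated rows can be collapsed afterwards. A minimal sketch, assuming dplyr's distinct is acceptable for this step (the name result is made up for illustration):

```r
library(dplyr)

mydf <- data.frame(id = c(1, 2, 1, 2, 3, 4),
                   value = c(32, 12, 43, 6, 50, 20),
                   text = c('A', 'B', 'A', 'B', 'C', 'D'))

result <- mydf %>%
  group_by(id) %>%
  mutate(mean_value = mean(value)) %>%  # group mean repeated on every row
  ungroup() %>%
  distinct(id, text, mean_value)        # keep one row per id/text/mean combination

result
```

This drops the value column, so it ends up with the same information as the summarize variants; the mutate step on its own is the better fit when the per-row values are still needed.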

