Question

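I have 8 age categories, each in its own column (i.e., residents_under_5, residents_6_to_12, etc.). Each column holds a value between 0 and 3: the number of people in that household who fall into that age category. I want a new column that I can use to plot the overall age distribution of the population as a histogram. The column I have in mind would contain 66 rows of residents_under_5, 32 rows of residents_6_to_12, and so on.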

My data look like this:

What I want is the column e shown here:

e
a
a
a
a
b
b
b
b
b
c
c
c
d
d
d

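That is, each value repeated once per occurrence counted in the other columns.

I tried declaring the new column with sum(residents_under_5), but that gives me a single row containing 66 (the total for that category). I can't plot a histogram from a column like that. I hope someone can figure this out!

Here is the dput() of the relevant columns: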

residents_under_5 = c(0, 0, 0, 1, 1, 2), 
residents_6_to_12 = c(0, 0, 0, 0, 0, 0), 
residents_13_to_18 = c(0, 0, 0, 0, 0, 0), 
residents_19_to_24 = c(0, 0, 0, 0, 0, 0), 
residents_25_to_34 = c(0, 1, 2, 0, 1, 0), 
residents_35_to_49 = c(0, 0, 0, 2, 1, 2), 
residents_50_to_64 = c(0, 1, 0, 0, 0, 0), 
residents_65_and_older = c(2, 0, 0, 0, 1, 0)

Answer 1

You can compute the total of each column with colSums and then use rep to repeat letters that many times, one letter per column.

rep(letters[seq_len(ncol(df))], colSums(df))

Data

df <- data.frame(residents_under_5 = c(0, 0, 0, 1, 1, 2), 
                 residents_6_to_12 = c(0, 0, 0, 0, 0, 0), 
                 residents_13_to_18 = c(0, 0, 0, 0, 0, 0), 
                 residents_19_to_24 = c(0, 0, 0, 0, 0, 0), 
                 residents_25_to_34 = c(0, 1, 2, 0, 1, 0), 
                 residents_35_to_49 = c(0, 0, 0, 2, 1, 2), 
                 residents_50_to_64 = c(0, 1, 0, 0, 0, 0), 
                 residents_65_and_older = c(2, 0, 0, 0, 1, 0))
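
For plotting, it may be handier to repeat the column names instead of letters, so that the axis carries the age labels. The following is a minimal sketch under that assumption, using the same df as above; the variable names totals and age_group, the factor conversion, and the barplot call are illustrative additions and not part of the original answer.

```r
df <- data.frame(residents_under_5 = c(0, 0, 0, 1, 1, 2),
                 residents_6_to_12 = c(0, 0, 0, 0, 0, 0),
                 residents_13_to_18 = c(0, 0, 0, 0, 0, 0),
                 residents_19_to_24 = c(0, 0, 0, 0, 0, 0),
                 residents_25_to_34 = c(0, 1, 2, 0, 1, 0),
                 residents_35_to_49 = c(0, 0, 0, 2, 1, 2),
                 residents_50_to_64 = c(0, 1, 0, 0, 0, 0),
                 residents_65_and_older = c(2, 0, 0, 0, 1, 0))

# Total number of residents in each age category
totals <- colSums(df)

# One element per resident, labelled with the name of its age category
age_group <- rep(names(df), times = totals)

# Convert to a factor so that empty categories still appear, in column order
age_group <- factor(age_group, levels = names(df))

# Counts per category, drawn as a bar chart (las = 2 rotates the axis labels)
counts <- table(age_group)
barplot(counts, las = 2)
```

Because the age categories are discrete labels rather than a continuous variable, a bar chart of the counts plays the role of the histogram here.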

Answer 2

One option in the tidyverse is to get the sum of every column with summarise_all, gather the result into 'long' format, and then uncount on the 'value' column.

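library(tidyverse)
df1 %>% 
   # one row holding the total of each column
   summarise_all(sum) %>%
   # reshape to long format: columns 'key' and 'value'
   gather %>% 
   # repeat each row 'value' times
   uncount(value)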

Data

df1 <- structure(list(a = 0:3, b = c(3L, 3L, 0L, 1L), c = c(2L, 2L, 
2L, 0L), d = c(1L, 1L, 1L, 0L)), class = "data.frame", row.names = c(NA, 
  -4L))
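
In recent tidyverse releases, summarise_all and gather are marked as superseded in favour of across and pivot_longer. The sketch below shows what an equivalent pipeline might look like with those functions, using the same df1; the column names "key" and "value" are chosen here to mirror the defaults of gather, and the result name long is only illustrative.

```r
library(dplyr)
library(tidyr)

df1 <- data.frame(a = 0:3,
                  b = c(3L, 3L, 0L, 1L),
                  c = c(2L, 2L, 2L, 0L),
                  d = c(1L, 1L, 1L, 0L))

long <- df1 %>%
  # one row holding the total of each column
  summarise(across(everything(), sum)) %>%
  # reshape to long format: one row per original column
  pivot_longer(everything(), names_to = "key", values_to = "value") %>%
  # repeat each row 'value' times; uncount drops the 'value' column
  uncount(value)

# Number of repeated rows per original column
table(long$key)
```

The resulting key column can be passed directly to a plotting function, for example ggplot2's geom_bar, to draw the distribution.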
