Question

I have a table in R that looks like this (the one below is just a sample):

|       | 15 | 17 | 18 | 22 | 25 | 26 | 27 | 29 | 
|-------|----|----|----|----|----|----|----|----|
| 10000 | 1  | 2  | 1  | 2  | 4  | 3  | 5  | 2  |
| 20000 | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  |
| 30000 | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  |
| 40000 | 0  | 0  | 0  | 1  | 2  | 3  | 6  | 3  |
| 50000 | 0  | 0  | 0  | 0  | 0  | 0  | 1  | 1  |
| 60000 | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  |

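The rows are income levels and the columns are age levels. I created this table so that I can run a chi-squared test to determine whether age is associated with income. The numbers in the table are counts of occurrences; for example, 2 people in my dataset are aged 17 and have an income of 10000.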

Age and income level are both of type "num" in R, i.e. continuous.

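I basically want to combine the age columns so that I get a table showing how many people with an income of 10k are aged 15-25, how many are aged 25-35, and so on. I would then end up with far fewer columns.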

Also note that colnames(tbl) = "15", "17", "18", not "Age". I have not defined an overall name for my columns and rows.

I noticed that this answer does something similar, but I am not sure how to apply it, because my columns have no name such as "MPG" (as in the linked case).

Any ideas?

Answer 1

I am making my own matrix here, but this should work for a data frame as well.

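# 100 x 85 matrix of random counts; one column per age from 15 to 99
mat <- matrix(sample(1:10, 8500, replace = TRUE), ncol = 85)
colnames(mat) <- 15:99
# assign each age column to a 10-year bin: [15,25), [25,35), ...
levs <- cut(as.numeric(colnames(mat)), seq(15, 105, 10), right = FALSE)
# sum the columns within each bin; drop = FALSE keeps a single-column bin as a matrix
res <- sapply(as.character(unique(levs)), function(x) rowSums(mat[, levs == x, drop = FALSE]))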

Edit: if you want the same colnames as mat, but holding the per-category counts, additionally do:

res <- res[, as.character(levs)] # expands res to one category-count column per original column in mat (indexed by bin label, not factor code)
colnames(res) <- colnames(mat) # renames the columns to match the input matrix mat
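
For reference, here is a minimal sketch of the same idea applied to the sample table from the question. The names tbl, ages, age_bins and binned are only illustrative, and the breaks 15/25/35 are an assumption based on the 15-25 and 25-35 ranges mentioned in the question. It uses base R's rowsum() on the transposed table as an alternative to the sapply() call above; either should give the same per-bin counts.

```r
# Sample table from the question: rows are income levels, columns are ages.
tbl <- matrix(
  c(1, 2, 1, 2, 4, 3, 5, 2,
    0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 1, 2, 3, 6, 3,
    0, 0, 0, 0, 0, 0, 1, 1,
    0, 0, 0, 0, 0, 0, 0, 0),
  nrow = 6, byrow = TRUE,
  dimnames = list(
    c("10000", "20000", "30000", "40000", "50000", "60000"),
    c("15", "17", "18", "22", "25", "26", "27", "29")
  )
)

# The column names are character strings, so convert them to numbers first.
ages <- as.numeric(colnames(tbl))

# Assumed bins: [15,25) and [25,35). Change the breaks to suit your data.
age_bins <- cut(ages, breaks = c(15, 25, 35), right = FALSE)

# rowsum() adds up rows by group, so transpose, group by bin, transpose back.
binned <- t(rowsum(t(tbl), group = age_bins))
binned
```

With these breaks, the 10000 row collapses to 6 in the [15,25) column and 14 in the [25,35) column. Note that no overall column name such as "Age" is needed, because the bins are derived directly from colnames(tbl).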
