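I am looking at census data for Ontario, Canada, and some of the columns share the same column name (they have the same name because they represent different subdivisions of a census area). I want to sum, row by row, any columns that have the same column name, but I am having trouble. In my sample data each name is only duplicated, but in the actual data several columns can share the same name. Is there a vectorized way to do this in R?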
TORONTO HALTON PEEL YORK BRANT HALDIMAND-NORFOLK HAMILTON MUSKOKA NIAGARA
20855 4011 11178 8138 996 739 3835 305 2923
23281 3997 11770 8417 961 684 4095 343 2970
24130 3900 11810 8306 972 732 4168 334 2985
TORONTO HALTON PEEL YORK BRANT HALDIMAND-NORFOLK HAMILTON MUSKOKA NIAGARA
39924 7863 21415 15714 1947 1428 7320 646 5675
44357 7820 22340 16261 1861 1369 7755 697 5775
46016 7679 22577 16260 1971 1447 7883 717 5868
I tried to do this with an ifelse statement, but had no luck. Something like this pseudocode:
# where i is the column name
for every column with name i(sum rows of each column with name == i)
Any help is greatly appreciated!
Answer 0 (score: 6)
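We can split the dataset by its names into a list of datasets that share the same name, then apply rowSums to each element of that list and bind the results together: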
do.call(cbind, lapply(split.default(dfN, names(dfN)), rowSums, na.rm = TRUE))
# BRANT HALDIMAND.NORFOLK HALTON HAMILTON MUSKOKA NIAGARA PEEL TORONTO YORK
#[1,] 2943 2167 11874 11155 951 8598 32593 60779 23852
#[2,] 2822 2053 11817 11850 1040 8745 34110 67638 24678
#[3,] 2943 2179 11579 12051 1051 8853 34387 70146 24566
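To make the grouping step easier to follow, here is a minimal sketch on a made-up data frame (`toy` and its values are hypothetical, not part of the census data). It assumes only base R: `split.default` treats the data frame as a list of columns and groups them by name, and `rowSums` then collapses each group into a single vector.

```r
# Hypothetical toy data: two columns named "A" and one named "B".
# check.names = FALSE keeps the duplicated name instead of renaming it to "A.1".
toy <- data.frame(A = 1:3, B = 4:6, A = 7:9, check.names = FALSE)

# Group the columns by name; each list element is a data frame
# holding all columns that shared that name.
groups <- split.default(toy, names(toy))
n_cols <- sapply(groups, ncol)   # number of columns in each group

# Sum each group row by row, then bind the resulting vectors as columns.
result <- do.call(cbind, lapply(groups, rowSums, na.rm = TRUE))
print(result)
```

The columns of the result come back in sorted name order, because `split.default` converts the names to a factor; this is also why the output above starts with BRANT rather than TORONTO.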
Or, as @thelatemail mentioned, if we need a data.frame as output, wrap the list output in data.frame:
data.frame(lapply(split.default(dfN, names(dfN)), rowSums, na.rm = TRUE))
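The question notes that the real data can have more than two columns with the same name. As a quick check under that assumption, the sketch below runs the same expression on a hypothetical data frame (`toy`, invented for illustration) where one name appears three times and one value is missing.

```r
# Hypothetical toy data: "A" appears three times, "B" once, with one NA.
toy <- data.frame(A = c(1, 2), A = c(10, NA), B = c(5, 6), A = c(100, 200),
                  check.names = FALSE)

# Same expression as above: group the columns by name, sum each group by row,
# and wrap the resulting list in data.frame().
summed <- data.frame(lapply(split.default(toy, names(toy)), rowSums, na.rm = TRUE))

print(class(summed))  # a data.frame rather than a matrix
print(summed)         # one column per distinct name
```

Because `na.rm = TRUE` is passed through to `rowSums`, the missing value is simply left out of that row's total instead of turning the sum into `NA`.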
Or use tidyverse:
library(tidyverse)
dfN %>%
split.default(names(.)) %>%
map_df(reduce, `+`)
# A tibble: 3 x 9
# BRANT HALDIMAND.NORFOLK HALTON HAMILTON MUSKOKA NIAGARA PEEL TORONTO YORK
# <int> <int> <int> <int> <int> <int> <int> <int> <int>
#1 2943 2167 11874 11155 951 8598 32593 60779 23852
#2 2822 2053 11817 11850 1040 8745 34110 67638 24678
#3 2943 2179 11579 12051 1051 8853 34387 70146 24566
Data used in the examples above:
dfN <- structure(list(TORONTO = c(20855L, 23281L, 24130L), HALTON = c(4011L,
3997L, 3900L), PEEL = c(11178L, 11770L, 11810L), YORK = c(8138L,
8417L, 8306L), BRANT = c(996L, 961L, 972L), HALDIMAND.NORFOLK = c(739L,
684L, 732L), HAMILTON = c(3835L, 4095L, 4168L), MUSKOKA = c(305L,
343L, 334L), NIAGARA = c(2923L, 2970L, 2985L), TORONTO = c(39924L,
44357L, 46016L), HALTON = c(7863L, 7820L, 7679L), PEEL = c(21415L,
22340L, 22577L), YORK = c(15714L, 16261L, 16260L), BRANT = c(1947L,
1861L, 1971L), HALDIMAND.NORFOLK = c(1428L, 1369L, 1447L), HAMILTON = c(7320L,
7755L, 7883L), MUSKOKA = c(646L, 697L, 717L), NIAGARA = c(5675L,
5775L, 5868L)), class = "data.frame", row.names = c(NA, -3L))