I have the following data frame in R:
Engine General Ladder.winch engine.phe subm.gear.box aux.engine pipeline.maintain pipeline pipe.line engine.mpd
     1      12           22          2             4          2                 4        5         6          7
and so on, for more than 10,000 rows.
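Now I want to combine columns and add up their values, to reduce the columns to broader categories. For example, Engine, engine.phe, aux.engine and engine.mpd should be merged into an Engine category, with all of their values summed. Likewise, pipeline.maintain, pipeline and pipe.line should be merged into Pipeline, and the remaining columns should be summed under the General category.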
The desired data frame would be:
Engine Pipeline General
    12       15      38
How can I do this in R?
Answer 0 (score: 2)
There are several ways to do this; here is a fairly direct one:
# Example data.frame
dtf <- structure(list(Engine = c(1, 0, 1),
General = c(12, 3, 15), Ladder.winch = c(22, 28, 26),
engine.phe = c(2, 1, 0), subm.gear.box = c(4, 4, 10),
aux.engine = c(2, 3, 1), pipeline.maintain = c(4, 5, 1),
pipeline = c(5, 5, 2), pipe.line = c(6, 8, 2), engine.mpd = c(7, 8, 19)),
.Names = c("Engine", "General", "Ladder.winch", "engine.phe",
"subm.gear.box", "aux.engine", "pipeline.maintain",
"pipeline", "pipe.line", "engine.mpd"),
row.names = c(NA, -3L), class = "data.frame")
with(dtf, data.frame(Engine=Engine+engine.phe+aux.engine+engine.mpd,
Pipeline=pipeline.maintain+pipeline+pipe.line,
General=General+Ladder.winch+subm.gear.box))
# Engine Pipeline General
# 1 12 15 38
# 2 12 18 35
# 3 21 5 51
# a more generalized and 'greppy' solution
cnames <- tolower(colnames(dtf))
data.frame(Engine=rowSums(dtf[, grep("eng", cnames)]),
Pipeline=rowSums(dtf[, grep("pip", cnames)]),
General=rowSums(dtf[, !grepl("eng|pip", cnames)]))
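One caveat with the grep-based version above: if a pattern happens to match only one column, `dtf[, idx]` collapses to a plain vector and `rowSums()` fails. A minimal sketch of a guarded variant follows. The helper `sum_matching()` is a hypothetical name introduced here, not part of the original answer, and the data frame is rebuilt with `data.frame()` only for brevity.

```r
# Example data from the answer above, rebuilt with data.frame() for brevity
dtf <- data.frame(Engine = c(1, 0, 1), General = c(12, 3, 15),
                  Ladder.winch = c(22, 28, 26), engine.phe = c(2, 1, 0),
                  subm.gear.box = c(4, 4, 10), aux.engine = c(2, 3, 1),
                  pipeline.maintain = c(4, 5, 1), pipeline = c(5, 5, 2),
                  pipe.line = c(6, 8, 2), engine.mpd = c(7, 8, 19))

# Hypothetical helper: row-wise sum of the columns flagged by a logical vector.
# drop = FALSE keeps a data frame even when only one column is selected.
sum_matching <- function(data, keep) rowSums(data[, keep, drop = FALSE])

cnames <- tolower(colnames(dtf))
is_eng <- grepl("eng", cnames)   # columns whose name contains "eng"
is_pip <- grepl("pip", cnames)   # columns whose name contains "pip"

res <- data.frame(Engine   = sum_matching(dtf, is_eng),
                  Pipeline = sum_matching(dtf, is_pip),
                  General  = sum_matching(dtf, !(is_eng | is_pip)))
res
```

Working with logical vectors from `grepl()` rather than index vectors from `grep()` also makes the "everything else" group a simple negation.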
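Answer 1 (score: 1)
It is usually better to store data in long format. My suggestion would therefore solve your problem as follows:
1 - Get the data into long format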
library(reshape2)
dfl <- melt(df)  # df is the question's data frame; with no id variables, every column is melted
2 - Create the 'engine' and 'pipeline' vectors
e_vec <- c("Engine","engine.phe","aux.engine","engine.mpd")
p_vec <- c("pipeline.maintain","pipeline","pipe.line")
3 - Create a category column
dfl$newcat <- c("general","engine","pipeline")[1 + dfl$variable %in% e_vec + 2*(dfl$variable %in% p_vec)]
Result:
> dfl
variable value newcat
1 Engine 1 engine
2 General 12 general
3 Ladder.winch 22 general
4 engine.phe 2 engine
5 subm.gear.box 4 general
6 aux.engine 2 engine
7 pipeline.maintain 4 pipeline
8 pipeline 5 pipeline
9 pipe.line 6 pipeline
10 engine.mpd 7 engine
Now you can use aggregate to get the final result:
> aggregate(value ~ newcat, dfl, sum)
newcat value
1 engine 12
2 general 38
3 pipeline 15
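Note that the aggregate call above sums over everything in dfl, which is fine for the single-row example but would collapse all rows of a larger data frame into one total per category. A rough sketch of one way to keep per-row totals is shown below. It builds the long format by hand in base R instead of using reshape2, and the names `long`, `row_id`, `per_row` and `wide` are assumptions introduced for illustration.

```r
# Three-row example data (same values as the dtf example in Answer 0)
dtf <- data.frame(Engine = c(1, 0, 1), General = c(12, 3, 15),
                  Ladder.winch = c(22, 28, 26), engine.phe = c(2, 1, 0),
                  subm.gear.box = c(4, 4, 10), aux.engine = c(2, 3, 1),
                  pipeline.maintain = c(4, 5, 1), pipeline = c(5, 5, 2),
                  pipe.line = c(6, 8, 2), engine.mpd = c(7, 8, 19))

e_vec <- c("Engine", "engine.phe", "aux.engine", "engine.mpd")
p_vec <- c("pipeline.maintain", "pipeline", "pipe.line")

# Long format with an explicit row id; unlist() walks the data frame
# column by column, so row ids repeat per column and names repeat per row.
long <- data.frame(row_id   = rep(seq_len(nrow(dtf)), times = ncol(dtf)),
                   variable = rep(names(dtf), each = nrow(dtf)),
                   value    = unlist(dtf, use.names = FALSE))

long$newcat <- ifelse(long$variable %in% e_vec, "engine",
               ifelse(long$variable %in% p_vec, "pipeline", "general"))

# One sum per original row and category
per_row <- aggregate(value ~ row_id + newcat, data = long, FUN = sum)

# Back to one column per category (a 3 x 3 contingency-style table)
wide <- xtabs(value ~ row_id + newcat, data = per_row)
wide
```

With reshape2, the same idea would be to add an id column before melting and pass it as an id variable, so that the row identity survives the reshape.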
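Answer 2 (score: 1)
One option is to extract the relevant words from the column names and then use tapply to get the sum. str_extract_all returns a list ('lst'). Replace the zero-length elements with 'GENERAL'. Then apply the group-by function, tapply, to the unlisted dataset, using as grouping variables 'lst' replicated over the columns and the row index of 'df1', to get the sum.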
library(stringr)
lst <- str_extract_all(toupper(sub("(pipe)\\.", "\\1", names(df1))),
"ENGINE|PIPELINE|GENERAL")
lst[lengths(lst)==0] <- "GENERAL"
t(tapply(unlist(df1), list(unlist(lst)[col(df1)], row(df1)), FUN = sum))
# ENGINE GENERAL PIPELINE
#1 12 38 15
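The grouping step above depends on `col()` and `row()` returning, for every cell, its column and row index in the same column-major order that `unlist()` uses. The sketch below illustrates this on a small toy data frame with hand-written group labels. The toy columns and the names `grp` and `cell_group` are assumptions made for the example, and stringr is left out so that only the tapply step is shown.

```r
# Toy data: two rows, four columns (a subset of the question's column names)
df1 <- data.frame(Engine = c(1, 0), engine.phe = c(2, 1),
                  pipeline = c(5, 5), General = c(12, 3))

# One group label per column, written by hand for this illustration
grp <- c("ENGINE", "ENGINE", "PIPELINE", "GENERAL")

col(df1)   # column index of every cell
row(df1)   # row index of every cell

# Label every cell with the group of its column (column-major order,
# matching the order in which unlist(df1) returns the values)
cell_group <- grp[col(df1)]

# Sum by (group, row), then transpose so each original row is a result row
res <- t(tapply(unlist(df1), list(cell_group, row(df1)), FUN = sum))
res
```

Each result row corresponds to one row of `df1`, and the columns come out in alphabetical order of the group labels.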
Answer 3 (score: 1)
myfactors = ifelse(grepl("engine", names(df), ignore.case = TRUE), "Engine",
ifelse(grepl("pipe|pipeline", names(df), ignore.case = TRUE), "Pipeline",
"General"))
data.frame(lapply(split.default(df, myfactors), rowSums))
# Engine General Pipeline
#1 12 38 15
#2 12 35 18
#3 21 51 5
Here df is the example data from this answer (the output above matches the dtf example defined in Answer 0).
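For readers unfamiliar with `split.default()`: applied to a data frame, it splits by columns (the list elements) rather than by rows, which is what makes the one-liner in this answer work. A small sketch on toy data follows; the toy columns, the hand-written labels and the name `pieces` are assumptions for illustration only.

```r
# Toy data: two rows, four columns
df <- data.frame(Engine = c(1, 0), engine.phe = c(2, 1),
                 pipeline = c(5, 5), General = c(12, 3))

# One label per column, written by hand instead of derived with grepl()
myfactors <- c("Engine", "Engine", "Pipeline", "General")

# A list of data frames, one per label, each holding that label's columns
pieces <- split.default(df, myfactors)
names(pieces)         # labels in alphabetical order
sapply(pieces, ncol)  # number of columns that fell into each group

# Row-wise sums within each piece, reassembled into one data frame
res <- data.frame(lapply(pieces, rowSums))
res
```

Because every piece stays a data frame, `rowSums()` also works for a group that contains only a single column.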