I am trying to create a new row in my dataset that sums the values of several rows. My original dataset is a bit unwieldy and looks like this:
TranID PT VegType Int1 Int2 Int3 Int4 Int5 Int6 Int7 Int8 Int9 Int10
1 1M Shrub 0 0 0 0 5 7 0 0 0 0
1 1M Sapling 1 0 2 1 0 0 0 0 5 0
1 1M Vine 0 0 0 0 1 2 0 0 0 0
1 1M Grass 1 1 1 0 0 0 0 0 0 0
1 1M Forb 0 1 0 0 0 0 0 0 0 0
1 2M Shrub 0 0 0 0 5 7 0 0 0 0
1 2M Sapling 1 0 2 1 0 0 0 0 5 0
1 2M Vine 0 0 0 0 1 2 0 0 0 0
1 2M Grass 1 1 1 0 0 0 0 0 0 0
1 2M Forb 0 1 0 0 0 0 0 0 0 0
1 3M Shrub 0 0 0 0 5 7 0 0 0 0
1 3M Sapling 1 0 2 1 0 0 0 0 5 0
1 3M Vine 0 0 0 0 1 2 0 0 0 0
1 3M Grass 1 1 1 0 0 0 0 0 0 0
1 3M Forb 0 1 0 0 0 0 0 0 0 0
1 4M Shrub 0 0 0 0 5 7 0 0 0 0
1 4M Sapling 1 0 2 1 0 0 0 0 5 0
1 4M Vine 0 0 0 0 1 2 0 0 0 0
1 4M Grass 1 1 1 0 0 0 0 0 0 0
1 4M Forb 0 1 0 0 0 0 0 0 0 0
1 5M Shrub 0 0 0 0 5 7 0 0 0 0
1 5M Sapling 1 0 2 1 0 0 0 0 5 0
1 5M Vine 0 0 0 0 1 2 0 0 0 0
1 5M Grass 1 1 1 0 0 0 0 0 0 0
1 5M Forb 0 1 0 0 0 0 0 0 0 0
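The first column is the transect ID. Along each transect there are 5 points at 1-meter intervals, and at each point we recorded the number of stems within 10 height intervals (the Int columns). I want to collapse the first 3 vegetation types (Shrub, Sapling, and Vine) by summing their values into a single row named "WoodyVeg". There are several hundred transects, and I would like to create this new row for every PT within each transect: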
A 1M WoodyVeg 1 0 2 1 6 9 0 0 5 0
A 1M Grass 1 1 1 0 0 0 0 0 0 0
A 1M Forb 0 1 0 0 0 0 0 0 0 0
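I tried using the aggregate function but could not produce the correct result. I think I am getting thrown off by having two different factors (TranID and PT). Is there a way to do this with aggregate or another function/R package?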
Answer 0: (score: 1)
You can use a data.table approach:
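library(data.table)
setDT(df)  # convert df to a data.table by reference
# Sum the woody types (everything except Grass and Forb) within each TranID/PT.
# Wrapping the label in list() keeps the Int columns numeric; c() on a
# character and a numeric vector would coerce every value to character.
dt1 = df[, c(list(VegType = 'WoodyVeg'),
             as.list(colSums(.SD[!VegType %in% c('Grass','Forb'), -1, with = FALSE]))),
         .(TranID, PT)]
# Keep the Grass and Forb rows unchanged.
dt2 = df[, .SD[VegType %in% c('Grass','Forb')], .(TranID, PT)]
rbindlist(list(dt1, dt2))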
# TranID PT VegType Int1 Int2 Int3 Int4 Int5 Int6 Int7 Int8 Int9 Int10
# 1: 1 1M WoodyVeg 1 0 2 1 6 9 0 0 5 0
# 2: 1 2M WoodyVeg 1 0 2 1 6 9 0 0 5 0
# 3: 1 3M WoodyVeg 1 0 2 1 6 9 0 0 5 0
# 4: 1 4M WoodyVeg 1 0 2 1 6 9 0 0 5 0
# 5: 1 5M WoodyVeg 1 0 2 1 6 9 0 0 5 0
# 6: 1 1M Grass 1 1 1 0 0 0 0 0 0 0
# 7: 1 1M Forb 0 1 0 0 0 0 0 0 0 0
# 8: 1 2M Grass 1 1 1 0 0 0 0 0 0 0
# 9: 1 2M Forb 0 1 0 0 0 0 0 0 0 0
#10: 1 3M Grass 1 1 1 0 0 0 0 0 0 0
#11: 1 3M Forb 0 1 0 0 0 0 0 0 0 0
#12: 1 4M Grass 1 1 1 0 0 0 0 0 0 0
#13: 1 4M Forb 0 1 0 0 0 0 0 0 0 0
#14: 1 5M Grass 1 1 1 0 0 0 0 0 0 0
#15: 1 5M Forb 0 1 0 0 0 0 0 0 0 0
Answer 1: (score: 1)
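library(dplyr)
data %>%
  # Relabel the three woody types in place, so that no character column is
  # left over for sum() to fail on; as.character() guards against factor input.
  mutate(VegType = ifelse(VegType %in% c("Shrub", "Sapling", "Vine"),
                          "WoodyVeg",
                          as.character(VegType))) %>%
  group_by(TranID, PT, VegType) %>%
  # summarise_each() and funs() are deprecated; summarise_all() sums each Int column.
  summarise_all(sum)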
Answer 2: (score: 0)
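My answer uses aggregate() and does not require any additional packages.
Replace df with the name of your data frame.
# Merge the three woody types into a single factor level.
df$VegType <- factor(df$VegType)
levels(df$VegType) <- list(WoodyVeg = c("Shrub", "Sapling", "Vine"),
                           Forb = c("Forb"), Grass = c("Grass"))
# Sum the Int columns (4 to 13) by transect, point, and merged vegetation type.
df1 <- aggregate(df[, 4:13], by = list(df$TranID, df$PT, df$VegType), FUN = sum)
names(df1) <- names(df)   # restore the original column names
df1[order(df1$PT), ]      # sort by point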
TranID PT VegType Int1 Int2 Int3 Int4 Int5 Int6 Int7 Int8 Int9 Int10
1 1M WoodyVeg 1 0 2 1 6 9 0 0 5 0
1 1M Forb 0 1 0 0 0 0 0 0 0 0
1 1M Grass 1 1 1 0 0 0 0 0 0 0
1 2M WoodyVeg 1 0 2 1 6 9 0 0 5 0
1 2M Forb 0 1 0 0 0 0 0 0 0 0
1 2M Grass 1 1 1 0 0 0 0 0 0 0
1 3M WoodyVeg 1 0 2 1 6 9 0 0 5 0
1 3M Forb 0 1 0 0 0 0 0 0 0 0
1 3M Grass 1 1 1 0 0 0 0 0 0 0
1 4M WoodyVeg 1 0 2 1 6 9 0 0 5 0
1 4M Forb 0 1 0 0 0 0 0 0 0 0
1 4M Grass 1 1 1 0 0 0 0 0 0 0
1 5M WoodyVeg 1 0 2 1 6 9 0 0 5 0
1 5M Forb 0 1 0 0 0 0 0 0 0 0
1 5M Grass 1 1 1 0 0 0 0 0 0 0
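As a side note, aggregate() also has a formula interface that groups by several columns at once, which seems to be where the question got stuck (two grouping factors). Below is a minimal sketch on a toy data frame; the two-column data and the names woody and res are made up for illustration and are not part of the original data.

```r
# Hypothetical toy data: one transect, one point, two Int columns for brevity.
df <- data.frame(
  TranID  = 1,
  PT      = "1M",
  VegType = c("Shrub", "Sapling", "Vine", "Grass", "Forb"),
  Int1    = c(0, 1, 0, 1, 0),
  Int2    = c(0, 0, 0, 1, 1),
  stringsAsFactors = FALSE
)

# Relabel the three woody types, then sum every remaining column by group.
woody <- c("Shrub", "Sapling", "Vine")
df$VegType <- ifelse(df$VegType %in% woody, "WoodyVeg", df$VegType)

# "." on the left-hand side means "all columns not used for grouping".
res <- aggregate(. ~ TranID + PT + VegType, data = df, FUN = sum)
print(res)
```

With the full dataset the same call should apply unchanged, as long as every column other than TranID, PT, and VegType is numeric.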
Answer 3: (score: 0)
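Mimicking @bramtayl's dplyr approach...
library(data.table)
# Assumes DT is a data.table and that the first three rows of each
# TranID/PT group are Shrub, Sapling, and Vine, as in the sample data.
DT[, copy(.SD)[1:3, VegType := "WoodyVeg"][, lapply(.SD,sum), by=VegType], by=.(TranID,PT)]
which gives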
TranID PT VegType Int1 Int2 Int3 Int4 Int5 Int6 Int7 Int8 Int9 Int10
1: 1 1M WoodyVeg 1 0 2 1 6 9 0 0 5 0
2: 1 1M Grass 1 1 1 0 0 0 0 0 0 0
3: 1 1M Forb 0 1 0 0 0 0 0 0 0 0
4: 1 2M WoodyVeg 1 0 2 1 6 9 0 0 5 0
5: 1 2M Grass 1 1 1 0 0 0 0 0 0 0
6: 1 2M Forb 0 1 0 0 0 0 0 0 0 0
7: 1 3M WoodyVeg 1 0 2 1 6 9 0 0 5 0
8: 1 3M Grass 1 1 1 0 0 0 0 0 0 0
9: 1 3M Forb 0 1 0 0 0 0 0 0 0 0
10: 1 4M WoodyVeg 1 0 2 1 6 9 0 0 5 0
11: 1 4M Grass 1 1 1 0 0 0 0 0 0 0
12: 1 4M Forb 0 1 0 0 0 0 0 0 0 0
13: 1 5M WoodyVeg 1 0 2 1 6 9 0 0 5 0
14: 1 5M Grass 1 1 1 0 0 0 0 0 0 0
15: 1 5M Forb 0 1 0 0 0 0 0 0 0 0
Or, to redo the Colonel's data.table answer without the discouraged as.list and colSums:
DT[, rbind(
.SD[1:3, c( list(VegType="WoodyVeg"), lapply(.SD,sum) ), .SDcols=!"VegType"],
.SD[-(1:3)]
), by=.(TranID,PT)]