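I've looked through several related questions, but haven't found one that matches my situation.
I have a column that determines which columns get summed into a new column: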
If DEPENDENCY == INDEP, then NET_AGI = IND_AGI
If DEPENDENCY == DEP, then NET_AGI = PRO_AGI + IND_AGI
Otherwise, NET_AGI = PRO_AGI
DEPENDENCY IND_AGI PRO_AGI NET_AGI <- NET_AGI will be the summed column
INDEP 0049995 - 0049995
DEP 0000500 0090500 0091000
DEP 0009000 0121095 0130950
DEP - 0375001 0375001
INDEP 0123456 - 0123456
DEP 0012070 1023030 1035100
...
What is the best way to do this?
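The three rules above map directly onto nested ifelse() calls. Here is a minimal base-R sketch. It uses hypothetical toy vectors rather than the real data, and it assumes the '-' entries have already been read as NA. The "OTHER" value is made up to exercise the fallback rule.

```r
# Hypothetical toy vectors mirroring the question's columns;
# NA stands in for the '-' entries, "OTHER" is an invented category.
DEPENDENCY <- c("INDEP", "DEP", "DEP", "OTHER")
IND_AGI <- c(49995, 500, NA, 10)
PRO_AGI <- c(NA, 90500, 375001, 20)

# Rule 1: INDEP -> IND_AGI
# Rule 2: DEP   -> PRO_AGI + IND_AGI (a missing part counts as 0)
# Rule 3: any other value -> PRO_AGI
NET_AGI <- ifelse(DEPENDENCY == "INDEP", IND_AGI,
                  ifelse(DEPENDENCY == "DEP",
                         rowSums(cbind(PRO_AGI, IND_AGI), na.rm = TRUE),
                         PRO_AGI))
print(NET_AGI)
```

ifelse() evaluates every branch over the full vector and then picks element-wise. That is cheap here, but it means the rowSums() call runs on all rows, not only the DEP ones.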
Answer 0 (score: 1)
Probably the fastest (and one of the simplest) ways is:
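# Default for any other DEPENDENCY value: NET_AGI = PRO_AGI
df$NET_AGI = df$PRO_AGI
# INDEP rows: NET_AGI = IND_AGI
df[df$DEPENDENCY == 'INDEP', 'NET_AGI'] = df[df$DEPENDENCY == 'INDEP', 'IND_AGI']
# DEP rows: NET_AGI = PRO_AGI + IND_AGI, counting missing values (NA) as 0
df[df$DEPENDENCY == 'DEP', 'NET_AGI'] = rowSums(df[df$DEPENDENCY == 'DEP', c('PRO_AGI', 'IND_AGI')], na.rm = TRUE)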
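If you want to read in the dataset exactly as posted and run the code above on it, use the following. Note that this assumes you don't need the seven-character, zero-padded format: read.table() parses the values as numbers, so the leading zeros are dropped, and '-' is read as NA.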
df <- read.table(text="DEPENDENCY IND_AGI PRO_AGI NET_AGI
INDEP 0049995 - 0049995
DEP 0000500 0090500 0091000
DEP 0009000 0121095 0130950
DEP - 0375001 0375001
INDEP 0123456 - 0123456
DEP 0012070 1023030 1035100",
stringsAsFactors = FALSE, header = TRUE, na.strings = c('NA', '-'))
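For reference, here is a minimal end-to-end sketch that combines the data import with the three assignment steps. If the seven-character format turns out to be needed after all, it then re-pads the result with sprintf("%07d", ...). The helper names is_indep and is_dep and the NET_AGI_PADDED column are hypothetical additions, not part of the original answer.

```r
# Sample data from the question; '-' is read as NA
df <- read.table(text = "DEPENDENCY IND_AGI PRO_AGI NET_AGI
INDEP 0049995 - 0049995
DEP 0000500 0090500 0091000
DEP 0009000 0121095 0130950
DEP - 0375001 0375001
INDEP 0123456 - 0123456
DEP 0012070 1023030 1035100",
                 stringsAsFactors = FALSE, header = TRUE,
                 na.strings = c("NA", "-"))

# Hypothetical helper masks for the two named categories
is_indep <- df$DEPENDENCY == "INDEP"
is_dep <- df$DEPENDENCY == "DEP"

# Same three steps as the answer: default first, then overwrite by category
df$NET_AGI <- df$PRO_AGI
df[is_indep, "NET_AGI"] <- df[is_indep, "IND_AGI"]
df[is_dep, "NET_AGI"] <- rowSums(df[is_dep, c("PRO_AGI", "IND_AGI")], na.rm = TRUE)

# Optional: restore the seven-character, zero-padded format as text
# (%d accepts doubles that hold whole numbers)
df$NET_AGI_PADDED <- sprintf("%07d", df$NET_AGI)
print(df)
```

The order of the assignments matters. The PRO_AGI default must come first so that the INDEP and DEP rows can overwrite it.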
Answer 1 (score: 1)
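library(dplyr)
df1 %>%
  mutate(NET_AGI_2 = case_when(
    # DEP: sum both columns; as.numeric() turns '-' into NA (with a warning),
    # na.rm = TRUE counts it as 0, and sprintf() zero-pads to seven characters
    DEPENDENCY == "DEP" ~ sprintf('%07d', rowSums(cbind(as.numeric(IND_AGI),
                                                        as.numeric(PRO_AGI)),
                                                  na.rm = TRUE)),
    # INDEP: keep IND_AGI as-is
    DEPENDENCY == "INDEP" ~ IND_AGI,
    # Any other value: keep PRO_AGI as-is
    TRUE ~ PRO_AGI))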
#> DEPENDENCY IND_AGI PRO_AGI NET_AGI NET_AGI_2
#> 1 INDEP 0049995 - 49995 0049995
#> 2 DEP 0000500 0090500 91000 0091000
#> 3 DEP 0009000 0121095 130950 0130095
#> 4 DEP - 0375001 375001 0375001
#> 5 INDEP 0123456 - 123456 0123456
#> 6 DEP 0012070 1023030 1035100 1035100
Data (run this first; it creates df1):
read.table(text="DEPENDENCY IND_AGI PRO_AGI NET_AGI
INDEP 0049995 - 0049995
DEP 0000500 0090500 0091000
DEP 0009000 0121095 0130950
DEP - 0375001 0375001
INDEP 0123456 - 0123456
DEP 0012070 1023030 1035100", stringsAsFactors = FALSE, header = TRUE) -> df1
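The DEP branch of the case_when() above chains three steps. First, as.numeric() turns '-' into NA. Next, rowSums(..., na.rm = TRUE) counts that NA as 0. Finally, sprintf('%07d', ...) pads the sum back to seven characters. Here is a minimal sketch of just that branch, run on two of the sample rows outside dplyr. The vector names ind and pro are hypothetical. suppressWarnings() is added only to silence the "NAs introduced by coercion" warning.

```r
# Two sample rows kept as character, as read.table() gives them without na.strings
ind <- c("0000500", "-")
pro <- c("0090500", "0375001")

# '-' becomes NA; suppressWarnings() hides the coercion warning
total <- rowSums(cbind(suppressWarnings(as.numeric(ind)),
                       suppressWarnings(as.numeric(pro))),
                 na.rm = TRUE)

# Zero-pad back to the seven-character format
padded <- sprintf("%07d", total)
print(padded)
```

Keeping every case_when() branch as character (IND_AGI and PRO_AGI are character columns here) is what lets the padded DEP results sit in the same column as the untouched INDEP and fallback values.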