I have a dataset that looks like this:
bankname bankid year totass cash bond loans
Bank A 1 1881 244789 7250 20218 29513
Bank B 2 1881 195755 10243 185151 2800
Bank C 3 1881 107736 13357 177612 NA
Bank D 4 1881 170600 35000 20000 5000
Bank E 5 1881 32000000 351266 314012 NA
I want to compute some ratios from the banks' balance sheets. I want the dataset to look like this:
bankname bankid year totass cash bond loans CashtoAsset BondtoAsset LoanstoAsset
Bank A 1 1881 244789 7250 20218 29513 0.030 0.083 0.121
Bank B 2 1881 195755 10243 185151 2800 0.052 0.946 0.014
Bank C 3 1881 107736 13357 177612 NA 0.124 1.649 NA
Bank D 4 1881 170600 35000 20000 5000 0.205 0.117 0.029
Bank E 5 1881 32000000 351266 314012 NA 0.011 0.010 NA
Here is the code to reproduce the data:
bankname <- c("Bank A","Bank B","Bank C","Bank D","Bank E")
bankid <- c(1, 2, 3, 4, 5)
year <- c(1881, 1881, 1881, 1881, 1881)
totass <- c(244789, 195755, 107736, 170600, 32000000)
cash <- c(7250, 10243, 13357, 35000, 351266)
bond <- c(20218, 185151, 177612, 20000, 314012)
loans <- c(29513, 2800, NA, 5000, NA)
bankdata <- data.frame(bankname, bankid, year, totass, cash, bond, loans)
First, I replaced the NAs in the balance-sheet columns with 0:
cols <- c("totass", "cash", "bond", "loans")
bankdata[cols][is.na(bankdata[cols])] <- 0
Then I compute the ratios:
library(dplyr)
bankdata<-mutate(bankdata,CashtoAsset = cash/totass)
bankdata<-mutate(bankdata,BondtoAsset = bond/totass)
bankdata<-mutate(bankdata,loanstoAsset =loans/totass)
However, instead of computing these ratios one line at a time, I would like to create them all at once. In Stata, I would do:
foreach x of varlist cash bond loans {
by bankid: gen `x'toAsset = `x'/ totass
}
How can I do this in R?
Answer 0 (score: 36)
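Update: things have changed. We used to pass funs() to .funs, as in funs(name = f(.)). Since dplyr 0.8.0, funs() is deprecated, and we use list() with a formula instead, as in list(name = ~f(.)). See the new examples below.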
bankdata %>%
mutate_at(.funs = list(toAsset = ~./totass), .vars = vars(cash:loans))
bankdata %>%
mutate_at(.funs = list(toAsset = ~./totass), .vars = c("cash", "bond", "loans"))
bankdata %>%
mutate_at(.funs = list(toAsset = ~./totass), .vars = 5:7)
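In dplyr 1.0.0 and later, mutate_at() and the other scoped verbs are themselves superseded by across() used inside mutate(). A minimal sketch of the same calculation with across(), assuming dplyr 1.0.0 or newer is installed; the data frame is rebuilt from the question so the snippet runs on its own, and the .names template produces cashtoAsset-style names directly, so no renaming step is needed:

```r
library(dplyr)

# Data from the question
bankdata <- data.frame(
  bankname = c("Bank A", "Bank B", "Bank C", "Bank D", "Bank E"),
  bankid   = c(1, 2, 3, 4, 5),
  year     = c(1881, 1881, 1881, 1881, 1881),
  totass   = c(244789, 195755, 107736, 170600, 32000000),
  cash     = c(7250, 10243, 13357, 35000, 351266),
  bond     = c(20218, 185151, 177612, 20000, 314012),
  loans    = c(29513, 2800, NA, 5000, NA)
)

# across() applies the formula to each selected column: .x is the current
# column and totass is looked up in the data. "{.col}" in .names is replaced
# by the name of the input column, giving cashtoAsset, bondtoAsset, ...
out <- bankdata %>%
  mutate(across(c(cash, bond, loans), ~ .x / totass, .names = "{.col}toAsset"))

print(out)
```

The original columns are kept and the three ratio columns are appended; NA loans stay NA in the ratio.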
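Since I answered this question, I have noticed that some SO users keep coming back to it, and the dplyr package has changed in the meantime. So I am leaving the following update. I hope it helps some R users learn how to use mutate_at().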
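mutate_each() is now deprecated; use mutate_at() instead. In .vars you specify the columns to which the function is applied. One way is to use vars(). Another is to give a character vector with the names of the columns to which you want to apply the custom function in .funs. A third is to specify the columns by position (here, 5:7). Note that if you use group_by(), you need to adjust the column positions. See this question.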
bankdata %>%
mutate_at(.funs = funs(toAsset = ./totass), .vars = vars(cash:loans))
bankdata %>%
mutate_at(.funs = funs(toAsset = ./totass), .vars = c("cash", "bond", "loans"))
bankdata %>%
mutate_at(.funs = funs(toAsset = ./totass), .vars = 5:7)
# bankname bankid year totass cash bond loans cash_toAsset bond_toAsset loans_toAsset
#1 Bank A 1 1881 244789 7250 20218 29513 0.02961734 0.082593581 0.12056506
#2 Bank B 2 1881 195755 10243 185151 2800 0.05232561 0.945830247 0.01430359
#3 Bank C 3 1881 107736 13357 177612 NA 0.12397899 1.648585431 NA
#4 Bank D 4 1881 170600 35000 20000 5000 0.20515826 0.117233294 0.02930832
#5 Bank E 5 1881 32000000 351266 314012 NA 0.01097706 0.009812875 NA
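I deliberately gave the custom function in .funs the name toAsset, because this helps me arrange the new column names. Previously I used rename(), but with the current approach I think it is much easier to clean up the column names with gsub(). If the result above is saved as out, run the following code to remove the _ from the column names.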
names(out) <- gsub(names(out), pattern = "_", replacement = "")
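A small base-R illustration of the renaming step, using the column names that the funs(toAsset = ./totass) call produces (typed in by hand here). gsub() removes every underscore in every name, which is fine for this data but would also change any original column whose name contains _; anchoring the pattern to the suffix with sub() touches only the generated names:

```r
# Names as produced by mutate_at(.funs = funs(toAsset = ./totass), ...)
out_names <- c("bankname", "bankid", "year", "totass", "cash", "bond", "loans",
               "cash_toAsset", "bond_toAsset", "loans_toAsset")

# Remove every underscore; fixed = TRUE matches "_" literally, not as a regex
all_removed <- gsub("_", "", out_names, fixed = TRUE)

# Rewrite only a trailing "_toAsset"; all other names are left untouched
suffix_only <- sub("_toAsset$", "toAsset", out_names)

print(all_removed)
print(suffix_only)
```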
I think you can save some typing with dplyr this way. The downside is that you overwrite cash, bond, and loans.
bankdata %>%
group_by(bankname) %>%
mutate_each(funs(whatever = ./totass), cash:loans)
# bankname bankid year totass cash bond loans
#1 Bank A 1 1881 244789 0.02961734 0.082593581 0.12056506
#2 Bank B 2 1881 195755 0.05232561 0.945830247 0.01430359
#3 Bank C 3 1881 107736 0.12397899 1.648585431 NA
#4 Bank D 4 1881 170600 0.20515826 0.117233294 0.02930832
#5 Bank E 5 1881 32000000 0.01097706 0.009812875 NA
If you prefer the expected result, I think more typing is unavoidable. The renaming step seems to be something you have to do.
bankdata %>%
group_by(bankname) %>%
summarise_each(funs(whatever = ./totass), cash:loans) %>%
rename(cashtoAsset = cash, bondtoAsset = bond, loanstoAsset = loans) -> ana;
ana %>%
merge(bankdata,., by = "bankname")
# bankname bankid year totass cash bond loans cashtoAsset bondtoAsset loanstoAsset
#1 Bank A 1 1881 244789 7250 20218 29513 0.02961734 0.082593581 0.12056506
#2 Bank B 2 1881 195755 10243 185151 2800 0.05232561 0.945830247 0.01430359
#3 Bank C 3 1881 107736 13357 177612 NA 0.12397899 1.648585431 NA
#4 Bank D 4 1881 170600 35000 20000 5000 0.20515826 0.117233294 0.02930832
#5 Bank E 5 1881 32000000 351266 314012 NA 0.01097706 0.009812875 NA
Answer 1 (score: 3)
Using apply and cbind:
bankdata <- cbind(bankdata, apply(bankdata[, 5:7], 2, function(x) x / bankdata$totass))
names(bankdata)[8:10] <- paste0(names(bankdata)[5:7], "toAsset")
> bankdata
bankname bankid year totass cash bond loans cashtoAsset bondtoAsset loanstoAsset
1 Bank A 1 1881 244789 7250 20218 29513 0.02961734 0.082593581 0.12056506
2 Bank B 2 1881 195755 10243 185151 2800 0.05232561 0.945830247 0.01430359
3 Bank C 3 1881 107736 13357 177612 NA 0.12397899 1.648585431 NA
4 Bank D 4 1881 170600 35000 20000 5000 0.20515826 0.117233294 0.02930832
5 Bank E 5 1881 32000000 351266 314012 NA 0.01097706 0.009812875 NA
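apply() is not strictly needed here: dividing a data frame by a vector whose length equals the number of rows recycles that vector down each column, so all three ratios can be computed and named in one assignment. A base-R sketch on the question's data (cols is just a convenience name):

```r
# Data from the question
bankdata <- data.frame(
  bankname = c("Bank A", "Bank B", "Bank C", "Bank D", "Bank E"),
  bankid   = c(1, 2, 3, 4, 5),
  year     = c(1881, 1881, 1881, 1881, 1881),
  totass   = c(244789, 195755, 107736, 170600, 32000000),
  cash     = c(7250, 10243, 13357, 35000, 351266),
  bond     = c(20218, 185151, 177612, 20000, 314012),
  loans    = c(29513, 2800, NA, 5000, NA)
)

cols <- c("cash", "bond", "loans")

# bankdata[cols] / bankdata$totass divides every selected column element-wise
# by totass; assigning to new names appends the ratio columns by name,
# so the result does not depend on column positions
bankdata[paste0(cols, "toAsset")] <- bankdata[cols] / bankdata$totass

print(bankdata)
```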
Answer 2 (score: 2)
Here is a data.table solution.
library(data.table)
setDT(bankdata)
bankdata[, paste0(names(bankdata)[5:7], "toAsset") :=
lapply(.SD, function(x) x/totass), .SDcols=5:7]
bankdata
# bankname bankid year totass cash bond loans cashtoAsset bondtoAsset loanstoAsset
# 1: Bank A 1 1881 244789 7250 20218 29513 0.02961734 0.082593581 0.12056506
# 2: Bank B 2 1881 195755 10243 185151 2800 0.05232561 0.945830247 0.01430359
# 3: Bank C 3 1881 107736 13357 177612 0 0.12397899 1.648585431 0.00000000
# 4: Bank D 4 1881 170600 35000 20000 5000 0.20515826 0.117233294 0.02930832
# 5: Bank E 5 1881 32000000 351266 314012 0 0.01097706 0.009812875 0.00000000
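Answer 3 (score: 1)
This is a major drawback of dplyr: as far as I know, there is no straightforward way to use it programmatically rather than interactively without some kind of "hack", such as the dreaded eval(parse(text = foo)) idiom.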
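The simplest approach is the same as the Stata one, although string manipulation is more verbose in R than in Stata (or most other scripting languages).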
for (x in c("cash", "bond", "loans")) {
bankdata[sprintf("%stoAsset", x)] <- bankdata[x] / bankdata$totass # or, equivalently, bankdata["totass"] for a consistent "look"
## `sprintf("%stoAsset", x)` can also be written `paste0(x, "toAsset")` or `paste(x, "toAsset", sep = "")`, whichever makes more sense to you.
}
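The name-building variants agree as long as paste() is given sep = "": collapse joins the elements of a vector into a single string and does not replace the separator between arguments. All of them are vectorized, so the new names for every column can be built in a single call:

```r
x <- "cash"

# Three equivalent ways to build "cashtoAsset"
name_sprintf <- sprintf("%stoAsset", x)
name_paste0  <- paste0(x, "toAsset")
name_paste   <- paste(x, "toAsset", sep = "")

# collapse does not replace sep: the default sep = " " still applies here
wrong <- paste(x, "toAsset", collapse = "")

# Vectorized: one call builds all three new names
all_names <- paste0(c("cash", "bond", "loans"), "toAsset")

print(c(name_sprintf, name_paste0, name_paste, wrong))
print(all_names)
```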
To make the whole thing more Stata-like, you can wrap it in within():
bankdata <- within(bankdata, {
  for (x in c("cash", "bond", "loans")) {
    assign(paste0(x, "toAsset"), get(x) / totass)
  }
  rm(x)  # otherwise within() also adds the loop variable x as a column
})
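But this relies on some hackery with the get() and assign() functions, which are generally not safe to use, although in your case it is probably not a big problem. In particular, I would not recommend trying a similar trick with dplyr, because dplyr abuses R's non-standard evaluation, and it is likely more trouble than it is worth. For a faster and possibly more sophisticated solution, look at the data.table package, which (I think) lets you use Stata-like loop syntax with dplyr-like speed. See the package vignettes on CRAN.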
Also, are you really, really sure you want to reassign the NA entries to 0?
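Why the question matters: replacing NA with 0 turns "loans unknown" into "no loans", and that silently changes any summary computed afterwards. A base-R sketch with the question's totass and loans values:

```r
totass <- c(244789, 195755, 107736, 170600, 32000000)
loans  <- c(29513, 2800, NA, 5000, NA)

# Keep NA: the ratio is NA where loans are unknown (Bank C and Bank E)
ratio_na <- loans / totass

# Replace NA with 0 first: Bank C and Bank E now appear to hold no loans
ratio_zero <- replace(loans, is.na(loans), 0) / totass

# Mean over the three banks with known loans vs. mean over all five banks
mean_known  <- mean(ratio_na, na.rm = TRUE)
mean_zeroed <- mean(ratio_zero)

print(c(mean_known = mean_known, mean_zeroed = mean_zeroed))
```

The zero-filled mean is pulled down by the two banks whose loans were never observed.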
Answer 4 (score: 0)
You may be making this harder than necessary. Try this and see whether it does what you need.
bankdata$CashtoAsset <- bankdata$cash / bankdata$totass
bankdata$BondtoAsset <- bankdata$bond / bankdata$totass
bankdata$loantoAsset <- bankdata$loans / bankdata$totass
bankdata
This produces:
bankname bankid year totass cash bond loans CashtoAsset BondtoAsset loantoAsset
1 Bank A 1 1881 244789 7250 20218 29513 0.02961734 0.082593581 0.12056506
2 Bank B 2 1881 195755 10243 185151 2800 0.05232561 0.945830247 0.01430359
3 Bank C 3 1881 107736 13357 177612 0 0.12397899 1.648585431 0.00000000
4 Bank D 4 1881 170600 35000 20000 5000 0.20515826 0.117233294 0.02930832
5 Bank E 5 1881 32000000 351266 314012 0 0.01097706 0.009812875 0.00000000
This should get you started in the right direction.
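If the same ratios are needed for several data sets or denominators, the line-by-line assignments can be wrapped in a small function. A sketch; add_ratios is a hypothetical helper name, not part of base R or any package:

```r
# Hypothetical helper: divide each column in `cols` by the column `denom`
# and store the result in a new column named <col><suffix>
add_ratios <- function(df, cols, denom = "totass", suffix = "toAsset") {
  for (col in cols) {
    df[[paste0(col, suffix)]] <- df[[col]] / df[[denom]]
  }
  df
}

# Data from the question
bankdata <- data.frame(
  bankname = c("Bank A", "Bank B", "Bank C", "Bank D", "Bank E"),
  bankid   = c(1, 2, 3, 4, 5),
  year     = c(1881, 1881, 1881, 1881, 1881),
  totass   = c(244789, 195755, 107736, 170600, 32000000),
  cash     = c(7250, 10243, 13357, 35000, 351266),
  bond     = c(20218, 185151, 177612, 20000, 314012),
  loans    = c(29513, 2800, NA, 5000, NA)
)

bankdata <- add_ratios(bankdata, c("cash", "bond", "loans"))
print(bankdata)
```

Looking columns up by name with [[ ]] keeps the function independent of column order.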
Answer 5 (score: 0)
Try:
for(i in 5:7){
bankdata[,(i+3)] = bankdata[,i]/bankdata[,4]
}
names(bankdata)[(5:7) + 3] = paste0(names(bankdata)[5:7], 'toAsset')
Output:
bankdata
bankname bankid year totass cash bond loans cashtoAsset bondtoAsset loanstoAsset
1 Bank A 1 1881 244789 7250 20218 29513 0.02961734 0.082593581 0.12056506
2 Bank B 2 1881 195755 10243 185151 2800 0.05232561 0.945830247 0.01430359
3 Bank C 3 1881 107736 13357 177612 0 0.12397899 1.648585431 0.00000000
4 Bank D 4 1881 170600 35000 20000 5000 0.20515826 0.117233294 0.02930832
5 Bank E 5 1881 32000000 351266 314012 0 0.01097706 0.009812875 0.00000000