Using the GermanCredit dataset from the caret library.
library("caret")
data(GermanCredit)
Filtered a bit:
credit.all <- GermanCredit[,c(10, 1:9, 11:13, 16:19)]
attach(credit.all)
names(credit.all)
We have these names:
[1] "Class" "Duration"
[3] "Amount" "InstallmentRatePercentage"
[5] "ResidenceDuration" "Age"
[7] "NumberExistingCredits" "NumberPeopleMaintenance"
[9] "Telephone" "ForeignWorker"
[11] "CheckingAccountStatus.lt.0" "CheckingAccountStatus.0.to.200"
[13] "CheckingAccountStatus.gt.200" "CreditHistory.ThisBank.AllPaid"
[15] "CreditHistory.PaidDuly" "CreditHistory.Delay"
[17] "CreditHistory.Critical"
What I need to do is summarize two of the columns. I know how to do this in SQL:
SELECT
Class
, SUM(CASE WHEN `CreditHistory.Critical` = 1 THEN 1 ELSE 0 END) AS Critical
, SUM(CASE WHEN `CreditHistory.Critical` = 0 THEN 1 ELSE 0 END) AS NotCritical
, SUM(CASE WHEN `CreditHistory.Critical` = 1 THEN 1 ELSE 0 END) / COUNT(*) AS PctCritical
FROM `credit.all`
GROUP BY
Class
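However, I'm trying to find my feet in R, and from books and Google it seems I should be using reshape2's melt and dcast to achieve this. What I have tried is basically variants of this: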
library(reshape2)
credit.melted <- melt(credit.all[,c(1,17)], ID=c("name", "Class"))
dcast(credit.melted, Class~CreditHistory.Critical, nrow, fill=0)
But all of my attempts with these functions have produced errors too cryptic and too generic for me to work out what I'm doing wrong:
Error in vapply(indices, fun, .default) : values must be length 1,
but FUN(X[[1]]) result is length 0
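Sometimes my random permutations of the function calls produce slightly different error output, but nothing that points me in the right direction.
Question: How do I perform a pivoted summary similar to the SQL result using R?
Answer 0 (score: 2)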
I don't think this is a pivot; you aren't using a pivot command in your SQL either. You can use dplyr to take exactly the same approach as the SQL:
library(dplyr)
credit.all %>%
  group_by(Class) %>%
  summarize(Critical = sum(CreditHistory.Critical == 1),
            NotCritical = sum(CreditHistory.Critical == 0),
            PctCritical = mean(CreditHistory.Critical == 1))
# # A tibble: 2 x 4
#   Class Critical NotCritical PctCritical
#   <fct>    <int>       <int>       <dbl>
# 1 Bad         50         250       0.167
# 2 Good       243         457       0.347
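The == 1 isn't necessary because it's a binary column, but I left it in because (a) it is more similar to your SQL code, and (b) if the column held other values and you only wanted to count the 1s, this would be the way to do it. However, you can get the same result more simply:
credit.all %>%
  group_by(Class) %>%
  summarize(Critical = sum(CreditHistory.Critical),
            NotCritical = n() - Critical,
            PctCritical = Critical / n())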
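To see why the == 1 comparison can be dropped for a 0/1 column, here is a small check in base R. The vector x is made up for illustration and is not taken from GermanCredit; the variable names are hypothetical.

```r
# Made-up 0/1 indicator standing in for a column like CreditHistory.Critical
x <- c(1, 0, 1, 1, 0)

# Counting the ones is the same as summing the column
same_count <- sum(x == 1) == sum(x)

# Counting the zeros is the same as total rows minus the sum
same_zero_count <- sum(x == 0) == length(x) - sum(x)

# The share of ones is the same as the column mean
same_share <- isTRUE(all.equal(mean(x == 1), mean(x)))

c(same_count, same_zero_count, same_share)
```

These shortcuts only hold while the column contains nothing but 0 and 1; with any other value present, the explicit comparison is the safer choice.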
If you really want a pivot, we can go that route, though it looks less straightforward. Your data is already in long format, so we don't need melt; we can cast directly:
pivot = dcast(Class ~ CreditHistory.Critical, data = credit.all)
pivot
# Using CreditHistory.Critical as value column: use value.var to override.
# Aggregation function missing: defaulting to length
#   Class   0   1
# 1   Bad 250  50
# 2  Good 457 243
Then you can rename the columns and calculate the percentage:
names(pivot)[2:3] = c("NotCritical", "Critical")
pivot$PctCritical = with(pivot, Critical / (Critical + NotCritical))
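For comparison, the same count-by-group table can be sketched in base R with table(), without reshape2. This is only an illustration under stated assumptions: the toy data frame below is made up, and it reuses the column names from the question purely for readability.

```r
# Made-up data with the same column names as the question (not the real dataset)
toy <- data.frame(
  Class = c("Bad", "Bad", "Good", "Good", "Good"),
  CreditHistory.Critical = c(1, 0, 1, 1, 0)
)

# Cross-tabulate: one row per Class, one column per indicator value ("0", "1")
counts <- table(toy$Class, toy$CreditHistory.Critical)

# Turn the contingency table into a data frame with descriptive column names
pivot <- data.frame(
  Class = rownames(counts),
  NotCritical = as.vector(counts[, "0"]),
  Critical = as.vector(counts[, "1"])
)

# Same percentage calculation as in the dcast version
pivot$PctCritical <- with(pivot, Critical / (Critical + NotCritical))
pivot
```

Selecting the columns by name ("0" and "1") rather than by position avoids mixing up the two counts if the column order ever changes.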