How do I calculate the proportion of a data subset?

Time: 2014-09-22 19:47:50

Tags: r statistics rstudio data-analysis

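I have some code below that I need help with. For my data science class I was asked: "Is Americans' financial satisfaction affected by the previous year's S&P 500 gains/losses?" I am trying to plot a graph with the share of observations that are "Satisfied" or "More Or Less" satisfied, relative to the whole population (a proportion), on the y-axis and "PercentChange" on the x-axis. I have posted the full code further down in case it is needed to understand what I am trying to do. All of these observations are in the same table, "finalResults", listed under a categorical variable column named "FinancialSatisfaction". I am not sure where to start, but one big problem I have is how to calculate the proportion for each "PercentChange" value in the finalResults table. Directly below is what I have tried, but it does not work. I need to break the satisfaction proportion down by year, because the x-axis will be each year's percent change. Any help is much appreciated; I do not know enough R to solve this.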

satisfied <- subset(finalResults, FinancialSatisfaction == "Satisfied")
moreorless <- subset(finalResults, FinancialSatisfaction == "More Or Less")
notatall <- subset(finalResults, FinancialSatisfaction == "Not At All")

myProportion = (satisfied + moreorless) / 29205

Full code:

require(Quandl)
require(lubridate)
require(zoo)
require(xts)

myGSS <- load(url("http://bit.ly/dasi_gss_data"))

year <- gss$year
finSat <- gss$satfin

relativeTable <- data.frame(year, finSat)
relativeTable <- subset(relativeTable, year > "1988" & !is.na(finSat))


spReturns <- Quandl("SANDP/ANNRETS", trim_start="1970-01-11", 
                    trim_end="2012-12-31", authcode="nwy3a_Gmd7TSS9fVirxT", 
                    collapse="annual")

percentChange <- spReturns$"Total Return Change"

spReturns$"Year Ending" <- format((spReturns$"Year Ending"), "%Y")
spReturns$"Year Ending" <- as.numeric(spReturns$"Year Ending")
spReturns$"Year Ending" <- spReturns[,1] + 1 #the following year

combined <- merge(relativeTable, spReturns, by.x = "year", by.y = "Year Ending")
names(combined)[6] <- "percentChange"

finalResults <- data.frame(combined$year, combined$finSat, combined$percentChange)
names(finalResults)[1] <- "Year"
names(finalResults)[2] <- "FinancialSatisfaction"
names(finalResults)[3] <- "PercentChange"
finalResults$PercentChange <- finalResults$PercentChange * 100

satisfied <- subset(finalResults, FinancialSatisfaction == "Satisfied")
moreorless <- subset(finalResults, FinancialSatisfaction == "More Or Less")
notatall <- subset(finalResults, FinancialSatisfaction == "Not At All")

myProportion <- (satisfied + moreorless) / 29205

2 answers:

Answer 0 (score: 0)

In your code,

myProportion = (satisfied + moreorless) / 29205

satisfied and moreorless are data.frames, so your result is a data.frame as well; you probably want something like

myProportion <- mean(finalResults$FinancialSatisfaction == "Satisfied" | finalResults$FinancialSatisfaction == "More Or Less")
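
The line above gives one overall proportion. Since the question asks for a proportion per year, here is a minimal sketch of one way to get that with base R's aggregate(). The toy data frame stands in for finalResults and its values are invented for illustration; the names toy, AtLeastSomewhat, propByYear and Proportion are assumptions, not part of the original code.

```r
# Toy stand-in for finalResults; the values are invented for illustration only.
toy <- data.frame(
  Year = c(1989, 1989, 1989, 1989, 1990, 1990, 1990, 1990),
  FinancialSatisfaction = c("Satisfied", "More Or Less", "Not At All", "Not At All",
                            "Satisfied", "Not At All", "Not At All", "Not At All"),
  PercentChange = c(16.6, 16.6, 16.6, 16.6, 31.7, 31.7, 31.7, 31.7)
)

# TRUE when the respondent answered "Satisfied" or "More Or Less".
toy$AtLeastSomewhat <- toy$FinancialSatisfaction %in% c("Satisfied", "More Or Less")

# The mean of a logical vector is the share of TRUE values, so averaging
# within each Year/PercentChange pair gives one proportion per year.
propByYear <- aggregate(AtLeastSomewhat ~ Year + PercentChange, data = toy, FUN = mean)
names(propByYear)[3] <- "Proportion"
print(propByYear)
```

Because every row of a given year carries the same PercentChange, grouping on both columns keeps the x-axis value next to each proportion without a second merge.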

Answer 1 (score: 0)

This solution may help:

satisfied <- subset(finalResults, FinancialSatisfaction == "Satisfied")
moreorless <- subset(finalResults, FinancialSatisfaction == "More Or Less")
notatall <- subset(finalResults, FinancialSatisfaction == "Not At All")

myProportion = (nrow(satisfied) + nrow(moreorless)) / nrow(finalResults)
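
The row counts above produce a single proportion for the whole table. For the per-year plot described in the question, one possible sketch uses table() and prop.table() to get the shares within each year and then plots them against PercentChange. The toy data frame and its values are made up for illustration, and the names toy, counts, shares, proportion and change are assumptions.

```r
# Toy stand-in for finalResults; the values are invented for illustration only.
toy <- data.frame(
  Year = rep(c(1989, 1990), each = 4),
  FinancialSatisfaction = c("Satisfied", "More Or Less", "Not At All", "Not At All",
                            "Satisfied", "Not At All", "Not At All", "Not At All"),
  PercentChange = rep(c(16.6, 31.7), each = 4)
)

# Count respondents per year and answer, then divide each row by its total.
counts <- table(toy$Year, toy$FinancialSatisfaction)
shares <- prop.table(counts, margin = 1)

# Share of respondents per year who answered "Satisfied" or "More Or Less".
proportion <- shares[, "Satisfied"] + shares[, "More Or Less"]

# One PercentChange value per year, in the same sorted year order as the table.
change <- tapply(toy$PercentChange, toy$Year, unique)

# Scatter plot: one point per year.
plot(as.numeric(change), as.numeric(proportion),
     xlab = "PercentChange", ylab = "Proportion satisfied or more or less")
```

With margin = 1 each row of shares sums to 1, so the same table also gives the "Not At All" share per year if that is needed later.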