"The top 20% of people earn 80% of the money"... how to compute this kind of result in R

Date: 2013-09-07 11:49:38

Tags: r

Let me give an example:

person | salary
----------------
1      | 30'000
2      | 10'000
3      | 15'000
4      | 25'000
5      | 80'000
6      | 56'000
...    | ...

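The steps to get this result are: sort by salary, then build a new table that gives, for each row, the share of rows/people from the top down to that row, and the share of the total salary accumulated from the top down to that row.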

Then we pick the row where the share of people is closest to 20%, and read off how much of the total money those people earn.
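A minimal base-R sketch of the steps just described, using only the six sample rows shown in the table (the elided rows are unknown, so the numbers are purely illustrative). The column names `share_people` and `share_salary` are my own choice, not from the question.

```r
# Six sample rows from the table above; the remaining rows ("...") are not known.
salaries <- data.frame(person = 1:6,
                       salary = c(30000, 10000, 15000, 25000, 80000, 56000))

# 1. Sort by salary, highest earners first
s <- salaries[order(salaries$salary, decreasing = TRUE), ]

# 2. Cumulative share of people and of total salary, from the top down to each row
s$share_people <- seq_len(nrow(s)) / nrow(s)
s$share_salary <- cumsum(s$salary) / sum(s$salary)

# 3. Row whose cumulative share of people is closest to 20%
i <- which.min(abs(s$share_people - 0.2))
s$share_salary[i]   # share of total salary earned by that top group
# [1] 0.3703704
```

With only six people the closest row to 20% is the single top earner (1/6 of the people), so the result is coarse; on a real dataset the same three steps give a much finer reading.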

This is a very standard problem, but since I don't know what it is called, I can't Google it.

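So I would be grateful if someone could tell me what this is called, and how to compute and plot it in R as simply as possible, i.e. without loops and the like. My gut tells me there are at least 5 packages and 10 functions that solve this. Maybe something like summary() with fixed quantiles.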
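For the "plot" part: a curve of cumulative income share against cumulative share of people carries the same information as a Lorenz curve (which is conventionally drawn from the poorest up; here it runs from the top earner down, matching the question). A minimal base-R sketch, assuming a plain numeric vector of salaries; the helper name `cumulative_shares` is hypothetical.

```r
# Illustrative helper (name is hypothetical): cumulative shares, top earners first,
# starting from the origin (0, 0) so the curve can be drawn as a line.
cumulative_shares <- function(x) {
  x <- sort(x, decreasing = TRUE)
  data.frame(people = c(0, seq_along(x) / length(x)),
             income = c(0, cumsum(x) / sum(x)))
}

# The six sample rows from the table above
salary <- c(30000, 10000, 15000, 25000, 80000, 56000)
cs <- cumulative_shares(salary)

plot(cs$people, cs$income, type = "l",
     xlab = "Cumulative share of people (highest earners first)",
     ylab = "Cumulative share of total income")
abline(0, 1, lty = 2)          # line of perfect equality
abline(v = 0.2, col = "grey")  # the "top 20%" mark
```

The further the curve bows above the dashed diagonal, the more concentrated the income is; reading the curve's height at the grey line gives the "top 20% earn X%" figure.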

So let's assume the table above is available as a data frame:

salaries <- data.frame(person = c(1, 2, 3, 4, 5, 6),                        # ... further rows
                       salary = c(30000, 10000, 15000, 25000, 80000, 56000)) # ... further rows

1 answer:

Answer 0 (score: 1)

Using the SLID income dataset from the car package:

library(car)

dat <- SLID[!is.na(SLID$wage),]       # Remove missing values
dat$income <- dat$wage*40*50          # "translate" the wages to their full time annual earnings equivalent.
dat$id <- seq(1,nrow(dat))       

# Create a data.frame with a person ID and their annual income:
keep <- data.frame(id = seq(1, nrow(dat)), 
                   income = dat$income)
keep <- keep[order(keep$income, decreasing = TRUE),]  # Descending ordering according to income
keep$accum <- cumsum(keep$income)                     # Cumulative sum of the descending incomes
keep$pct <- keep$accum/sum(keep$income)*100           # % of the total income

keep$check <- keep$pct < 80                    # TRUE while the cumulative share is still below 80%
threshold <- min(which(keep$check == FALSE))   # First row where the cumulative share reaches 80%
border <- threshold/nrow(keep)*100             # Share of people (in %) up to and including that row
border <- round(border, digits = 2)
paste0(border, "% of the people earn 80% of the income")

#[1] "62.41% of the people earn 80% of the income"
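The same logic, wrapped in a reusable function so it can be applied directly to the question's `salaries` data frame (or to `dat$income`). The function name and the `target` argument are my own; it assumes a numeric vector with no `NA`s.

```r
# Hypothetical helper wrapping the steps above: the smallest share of people
# (richest first) whose combined income reaches `target` of the total.
people_share_for_income <- function(income, target = 0.8) {
  income <- sort(income, decreasing = TRUE)
  pct <- cumsum(income) / sum(income)
  which(pct >= target)[1] / length(income)   # first row reaching the target
}

salaries <- data.frame(person = 1:6,
                       salary = c(30000, 10000, 15000, 25000, 80000, 56000))
people_share_for_income(salaries$salary)
# [1] 0.6666667
```

`which(...)[1]` replaces `min(which(... == FALSE))`; both pick the first row at or above the target.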

Based on the classic 80-20 rule, we would expect to see "20% of the people earn 80% of the income". As you can see, the rule does not hold here.

The reverse question:

# The 20% of the people earn X % of total income:

linenr <- ceiling(1/5*nrow(keep))
outcome2 <- round(keep$pct[linenr], digits = 2)
paste0(outcome2, "% of total income is earned by the top 20% of the people")

# [1] "36.07% of total income is earned by the top 20% of the people"
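To get several of these figures at once (close to the "summary() with fixed quantiles" idea from the question), here is a small sketch. The helper name `top_share` and the default fractions are assumptions for illustration; it uses the same `ceiling()` rounding as the line number above.

```r
# Hypothetical helper: share of total income earned by the top fraction p of people,
# for several p at once (a bit like a summary() with fixed quantiles).
top_share <- function(income, p = c(0.01, 0.1, 0.2, 0.5)) {
  income <- sort(income, decreasing = TRUE)
  cum <- cumsum(income) / sum(income)
  k <- ceiling(p * length(income))   # same rounding as ceiling(1/5*nrow(keep))
  setNames(cum[k], paste0("top ", p * 100, "%"))
}

salary <- c(30000, 10000, 15000, 25000, 80000, 56000)  # sample rows from the question
top_share(salary)
```

Because it sorts and rounds the same way, `top_share(keep$income, 0.2)` should reproduce the 36.07% figure above (as a fraction rather than a percentage).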

Note that the numbers shown here do not represent the real world :)

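Wikipedia also has more information on the Pareto principle, also known as the 80-20 rule. The rule seems to show up in many settings, such as business, economics and mathematics.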
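One well-known way the exact 80-20 split arises mathematically: if incomes follow a Pareto distribution with shape (Pareto index) α > 1, the top fraction p of people earns p^(1 - 1/α) of the total, and α = log 5 / log 4 ≈ 1.16 gives exactly 80% for the top 20%. A small R sketch of that formula; the function name is illustrative only.

```r
# Share of total income held by the top fraction p under a Pareto distribution
# with shape alpha > 1: S(p) = p^(1 - 1/alpha)
pareto_top_share <- function(p, alpha) {
  stopifnot(alpha > 1)  # the mean (and thus the share) is infinite for alpha <= 1
  p^(1 - 1/alpha)
}

alpha_80_20 <- log(5) / log(4)       # about 1.16
pareto_top_share(0.2, alpha_80_20)
# [1] 0.8
pareto_top_share(0.2, 3)             # a thinner tail gives a much smaller top share
```

Real income data rarely follow a single Pareto distribution over the whole range, which is one reason the SLID figures above are far from 80-20.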