How do I perform a calculation on one variable based on the factor levels of two other variables?

Asked: 2013-11-30 22:02:07

Tags: r

I have the following information in a data frame:

  Gender  EducationLevel   Income(mean) 
   Male     Low             10
   Male     High            12
   Female   Low

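I want to create a separate variable holding the difference in mean income between men with a high education level and men with a low education level (and then do the same for women). How can this be done in R without computing it by hand?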

I'm confused because there are two conditions: if Gender is Male, then take Income(mean) where EducationLevel == "High" minus Income(mean) where EducationLevel == "Low", restricted to Gender == "Male".

The new variable would look like this (the EducationLevel information is no longer present):

   Gender  Difference
    Male      2
    Female    3

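Any help would be much appreciated. I thought about using lapply, but I don't have enough experience in R to get it working. I'm not sure how to set up the variable Income(mean) in the (EducationLevel == "High") - (EducationLevel == "Low") calculation.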

2 Answers:

Answer 0: (score: 2)

Given how your original data is sorted (Low before High within each gender), you can use aggregate with diff:

df <- read.table(text = "Gender  EducationLevel   Income(mean) 
Male     Low             10
Male     High            12
Female   Low 7
Female High 10", header = TRUE)

df   

Note that "Income(mean)" is not a syntactically valid variable name, so read.table converts it (here to Income.mean.). See the check.names argument in ?read.table.

setNames(aggregate(Income.mean. ~ Gender, data = df, diff), c("Gender", "Difference"))

#   Gender  Difference
# 1 Female           3
# 2   Male           2
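
Because diff returns the second value minus the first within each group, the result above relies on each gender's Low row coming before its High row. A minimal sketch that does not depend on row order is shown below. It is not part of the original answer, and the names wide and result are purely illustrative. It builds a Gender-by-EducationLevel matrix of means with tapply and subtracts the columns by name:

```r
# Same data as above, but with the rows deliberately shuffled
df <- read.table(text = "Gender EducationLevel Income(mean)
Male   High 12
Female Low   7
Male   Low  10
Female High 10", header = TRUE)

# Gender x EducationLevel matrix of mean incomes
wide <- with(df, tapply(Income.mean., list(Gender, EducationLevel), mean))

# Subtract by column name, so the row order of df no longer matters
result <- data.frame(Gender = rownames(wide),
                     Difference = as.vector(wide[, "High"] - wide[, "Low"]))
result
```

Using mean inside tapply also covers the case where a Gender/EducationLevel combination has more than one row, at the cost of silently averaging them.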

Answer 1: (score: 0)

I'm using simplified notation, but this is basically what you need to do:

> df <- data.frame(g = c("m","m","f","f"), e = c("h","l","h","l"), i = sample(4,4))

> df
  g e i
1 m h 1
2 m l 4
3 f h 2
4 f l 3

> mean(df[df$g == "m" & df$e == "h","i"]) - mean(df[df$g == "m" & df$e == "l","i"])
[1] -3

> mean(df[df$g == "f" & df$e == "h","i"]) - mean(df[df$g == "f" & df$e == "l","i"])
[1] -1
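
The same subtraction can be applied to every level of g without repeating the expression by hand. The following is only a sketch under the simplified notation above: the values of i are hard-coded to match the printed example (sample() would produce a different random ordering on each run), and diffs is an illustrative name.

```r
# Fixed values matching the printed example above
df <- data.frame(g = c("m", "m", "f", "f"),
                 e = c("h", "l", "h", "l"),
                 i = c(1, 4, 2, 3))

# For each level of g: mean of i where e == "h" minus mean of i where e == "l"
diffs <- sapply(split(df, df$g), function(d) {
  mean(d$i[d$e == "h"]) - mean(d$i[d$e == "l"])
})

# Collect the per-group differences into a data frame
data.frame(g = names(diffs), difference = unname(diffs))
```

split breaks the data frame into one piece per level of g, and sapply runs the high-minus-low calculation on each piece, so adding further levels of g requires no extra code.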