I have the following information in a data frame:
Gender EducationLevel Income(mean)
Male Low 10
Male High 12
Female Low
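I want to create a separate variable holding the difference in mean income between men with a high and a low education level (and then the same for women). How can this be done in R without computing it by hand?
I'm confused because there are two conditions: if Gender is Male, then compute Income(mean) for EducationLevel == "High" minus Income(mean) for EducationLevel == "Low", restricted to Gender == "Male".
The new variable would look like this (with no EducationLevel information anymore):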
Gender Difference
Male 2
Female 3
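Any help would be much appreciated. I thought about using lapply, but I don't have enough experience in R to make it work. I'm not sure how to refer to the Income(mean) variable in the (EducationLevel == "High") - (EducationLevel == "Low") calculation.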
Answer 0 (score: 2)
Given how your original data is ordered (Low before High within each gender), you can use aggregate and diff.
df <- read.table(text = "Gender EducationLevel Income(mean)
Male Low 10
Male High 12
Female Low 7
Female High 10", header = TRUE)
df
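Note that "Income(mean)" is not a syntactically valid variable name, so read.table converts it; the column becomes Income.mean. (with a trailing dot). See the check.names argument in ?read.table.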
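To see the name conversion in isolation, here is a small sketch. The names txt and df_raw are introduced only for this illustration and are not part of the answer; the data is the first two rows of the example.

```r
txt <- "Gender EducationLevel Income(mean)
Male Low 10
Male High 12"

# Default (check.names = TRUE): the header is passed through make.names()
names(read.table(text = txt, header = TRUE))

# check.names = FALSE keeps the header exactly as written
df_raw <- read.table(text = txt, header = TRUE, check.names = FALSE)
names(df_raw)

# A non-syntactic column name has to be accessed with [[ ]] or backticks
df_raw[["Income(mean)"]]
```

Keeping the converted name, as the answer does, is usually less awkward because it works directly inside a formula.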
setNames(aggregate(Income.mean. ~ Gender, data = df, diff), c("Gender", "Difference"))
# Gender Difference
# 1 Female 3
# 2 Male 2
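Because diff subtracts each value from the one after it, the call above gives High minus Low only while Low comes before High within each gender. If the row order cannot be relied on, one possible variant is to subtract by level name instead. This is a sketch, not part of the original answer; m and result are names made up here, and the data is the same as above.

```r
df <- read.table(text = "Gender EducationLevel Income(mean)
Male Low 10
Male High 12
Female Low 7
Female High 10", header = TRUE)

# Mean income per Gender x EducationLevel cell, returned as a matrix with
# Gender in the rows and EducationLevel in the columns
m <- with(df, tapply(Income.mean., list(Gender, EducationLevel), mean))

# Subtract by column name, so the row order of df does not matter
result <- data.frame(Gender = rownames(m),
                     Difference = unname(m[, "High"] - m[, "Low"]))
result
```

Shuffling the rows of df before the tapply call should leave result unchanged, which is not true for the diff version.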
Answer 1 (score: 0)
I'm using simplified notation, but this is basically what you need to do:
> df <- data.frame(g = c("m","m","f","f"), e = c("h","l","h","l"), i = sample(4,4))
> df
g e i
1 m h 1
2 m l 4
3 f h 2
4 f l 3
> mean(df[df$g == "m" & df$e == "h","i"]) - mean(df[df$g == "m" & df$e == "l","i"])
[1] -3
> mean(df[df$g == "f" & df$e == "h","i"]) - mean(df[df$g == "f" & df$e == "l","i"])
[1] -1
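The two lines above differ only in the value of g, so the subtraction can be wrapped in a function and applied to every group. This is a sketch under a few assumptions: high_minus_low is a hypothetical helper name, and the i column is fixed to the values printed above because sample() would give different numbers on each run.

```r
# Same columns as above, with fixed values instead of sample()
df <- data.frame(g = c("m", "m", "f", "f"),
                 e = c("h", "l", "h", "l"),
                 i = c(1, 4, 2, 3))

# Hypothetical helper: mean of i where e == "h" minus mean of i where
# e == "l", restricted to a single level of g
high_minus_low <- function(level) {
  mean(df$i[df$g == level & df$e == "h"]) -
    mean(df$i[df$g == level & df$e == "l"])
}

# Apply the helper to every level of g instead of writing one line per group
diffs <- sapply(as.character(unique(df$g)), high_minus_low)
diffs
```

The result is a named vector with one entry per level of g, so it scales to more groups without extra code.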