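How to find the mean of a continuous variable for each level of a categorical variable

Time: 2013-04-20 16:12:57

Tags: r

I want to plot BMI (continuous) on the y-axis against family income (categorical) on the x-axis, with the chart showing the mean BMI for each income category. However, I'm not sure how to find the mean BMI for each level of the family income factor.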

Dataset nh (5994 IDs with observations in total; part of the 2009-2010 NHANES dataset):
> dput(head(nh))
structure(list(SeqN = c(51624L, 51628L, 51629L, 51630L, 51633L, 51635L), 
Gender = c(1L, 2L, 1L, 2L, 1L, 1L), Age = c(34L, 60L, 26L, 49L, 80L, 80L), 
Ethnicity = c(3L, 4L, 1L, 3L, 3L, 3L), FamSize = c(4L, 2L, 5L, 3L, 2L, 1L),
RatioIncomePoverty = c(1.36, 0.69, 1.01, 1.91, 1.27, 1.69), 
MECWgt2 = c(81528.77201, 21000.33872, 22633.58187, 74112.48684, 12381.11532, 22502.50666),
BMI = c(32.22, 42.39, 32.61, 30.57, 26.04, 27.62), 
LengthUS = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,NA_integer_),
Education = c(3L, 3L, 2L, 4L, 4L, 2L), LocationBorn = c(1L, 1L, 1L, 1L, 1L, 1L), 
FamIncome = c(6L, 3L, 6L, 7L, 4L, 4L)), .Names = c("SeqN", 
"Gender", "Age", "Ethnicity", "FamSize", "RatioIncomePoverty", 
"MECWgt2", "BMI", "LengthUS", "Education", "LocationBorn", "FamIncome"), 
row.names = c(NA, 6L), class = "data.frame")

faminc <- as.character(nh$FamIncome)
faminc

Any suggestions on how to shape the data to achieve this would be appreciated.

2 answers:

Answer 0: (score: 2)

This might work:

    library(plyr)
    # column names are case-sensitive: the data uses FamIncome and BMI
    nhh <- ddply(nh, .(FamIncome), summarise, mean.bmi = mean(BMI)) # find mean BMI per income category
    with(nhh, plot(FamIncome, mean.bmi)) # simple plot
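If you'd rather not load plyr, the same per-category mean can probably be had with base R's `tapply`. This is only a sketch: `nh6` is a stand-in built from the six rows in the question's `dput` output (just the two relevant columns), and `na.rm = TRUE` is my assumption that the full data may contain missing BMI values.

```r
# Stand-in data: the six rows shown in the question (assumed name: nh6)
nh6 <- data.frame(
  FamIncome = c(6L, 3L, 6L, 7L, 4L, 4L),
  BMI       = c(32.22, 42.39, 32.61, 30.57, 26.04, 27.62)
)

# Mean BMI within each FamIncome category.
# na.rm = TRUE is passed through to mean() and drops missing BMI values (assumption).
mean_bmi <- tapply(nh6$BMI, nh6$FamIncome, mean, na.rm = TRUE)
mean_bmi
```

The result is a named vector with one entry per income category, so `mean_bmi["4"]` gives the mean for category 4. On the real data you would replace `nh6` with `nh`.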

Answer 1: (score: 2)

Here is a base R solution using aggregate:

a <- aggregate(BMI ~ FamIncome, data=nh, FUN=mean)
barplot(a$BMI, names.arg=a$FamIncome)
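To see what `aggregate` returns before plotting, here is a small self-contained sketch on the six rows from the question. The name `nh6` and the axis labels are my own choices for illustration, not part of the original data.

```r
# Stand-in data: the six rows shown in the question (assumed name: nh6)
nh6 <- data.frame(
  FamIncome = c(6L, 3L, 6L, 7L, 4L, 4L),
  BMI       = c(32.22, 42.39, 32.61, 30.57, 26.04, 27.62)
)

# One row per income category, with the mean BMI in the BMI column.
# The formula interface drops rows with NA by default (na.action = na.omit).
a6 <- aggregate(BMI ~ FamIncome, data = nh6, FUN = mean)
print(a6)

# Bar chart of mean BMI per category, with labeled axes (labels are illustrative)
barplot(a6$BMI, names.arg = a6$FamIncome,
        xlab = "Family income category", ylab = "Mean BMI")
```

`a6` is an ordinary data frame sorted by `FamIncome`, so it can be passed to other plotting functions as well.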
