Here I create a new column indicating whether myData is above or below its median.
### MedianSplits based on Whole Data
#create some test data
myDataFrame=data.frame(myData=runif(15),myFactor=rep(c("A","B","C"),5))
#create column showing median split
myBreaks= quantile(myDataFrame$myData,c(0,.5,1))
myDataFrame$MedianSplitWholeData = cut(
myDataFrame$myData,
breaks=myBreaks,
include.lowest=TRUE,
labels=c("Below","Above"))
#Check if it's correct
myDataFrame$AboveWholeMedian = myDataFrame$myData > median(myDataFrame$myData)
myDataFrame
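Rather than comparing the two columns by eye, the check can be made explicit. The following is a minimal sketch, not part of the original post: the `set.seed(1)` call and the names `checkTable` and `splitsAgree` are assumptions added for reproducibility and illustration.

```r
# Assumed seed, added only so the example is reproducible
set.seed(1)
myDataFrame <- data.frame(myData = runif(15), myFactor = rep(c("A", "B", "C"), 5))

# Median split via cut(), as in the code above
myBreaks <- quantile(myDataFrame$myData, c(0, .5, 1))
myDataFrame$MedianSplitWholeData <- cut(
  myDataFrame$myData,
  breaks = myBreaks,
  include.lowest = TRUE,
  labels = c("Below", "Above"))

# Independent check using a direct comparison with the median
myDataFrame$AboveWholeMedian <- myDataFrame$myData > median(myDataFrame$myData)

# Cross-tabulate the two columns; all counts should fall on matching cells
checkTable <- table(myDataFrame$MedianSplitWholeData, myDataFrame$AboveWholeMedian)
print(checkTable)

# TRUE when "Above" coincides exactly with AboveWholeMedian
splitsAgree <- all((myDataFrame$MedianSplitWholeData == "Above") == myDataFrame$AboveWholeMedian)
print(splitsAgree)
```

Note that cut() uses right-closed intervals by default, so a value exactly equal to the median lands in "Below", which matches the strict `>` in the check.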
This works fine. Now I want to do the same thing, but compute the median split within each level of myFactor.
I came up with this:
#Median splits within factor levels
byOutput=by(myDataFrame$myData,myDataFrame$myFactor, function (x) {
myBreaks= quantile(x,c(0,.5,1))
MedianSplitByGroup=cut(x,
breaks=myBreaks,
include.lowest=TRUE,
labels=c("Below","Above"))
MedianSplitByGroup
})
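byOutput contains what I want. It correctly classifies each element within factor levels A, B, and C. However, I would like to create a new column, myDataFrame$FactorLevelMedianSplit, that holds the newly computed median split.
How can I turn the output of the "by" command into a useful data frame column?
I suspect the "by" command may not be the R-like way to do this...
Update:
Following Thierry's example of cleverly using factor(), and after discovering the "ave" function in Spector's book, I found this solution, which requires no additional packages.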
myDataFrame$MediansByFactor=ave(
myDataFrame$myData,
myDataFrame$myFactor,
FUN=median)
myDataFrame$FactorLevelMedianSplit = factor(
myDataFrame$myData>myDataFrame$MediansByFactor,
levels = c(TRUE, FALSE),
labels = c("Above", "Below"))
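If the intermediate MediansByFactor column is not needed, the comparison can probably be folded into the ave() call itself. This is only a sketch under stated assumptions: the seed and the name `aboveFlag` are mine, and it relies on ave() writing the result of FUN back into a numeric vector, so the logical comparison comes back as 0/1.

```r
# Assumed seed, added only so the example is reproducible
set.seed(1)
myDataFrame <- data.frame(myData = runif(15), myFactor = rep(c("A", "B", "C"), 5))

# ave() applies FUN within each level of myFactor and returns the results
# in the original row order; the logical result is coerced to 0/1
aboveFlag <- ave(
  myDataFrame$myData,
  myDataFrame$myFactor,
  FUN = function(v) v > median(v))

# Convert the 0/1 flag into a labelled factor
myDataFrame$FactorLevelMedianSplit <- factor(
  aboveFlag == 1,
  levels = c(TRUE, FALSE),
  labels = c("Above", "Below"))

print(table(myDataFrame$myFactor, myDataFrame$FactorLevelMedianSplit))
```

The two-step version above keeps the group medians visible in the data frame, which can be handy for checking; the one-step version just saves a column.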
Answer 0 (score: 3)
Here is a solution using the plyr package.
myDataFrame <- data.frame(myData=runif(15),myFactor=rep(c("A","B","C"),5))
library(plyr)
ddply(myDataFrame, "myFactor", function(x){
x$Median <- median(x$myData)
x$FactorLevelMedianSplit <- factor(x$myData <= x$Median, levels = c(TRUE, FALSE), labels = c("Below", "Above"))
x
})
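For readers without plyr installed, roughly the same split-apply-combine pattern can be sketched in base R. This is a substitute technique, not what the answer above uses; the seed and the names `pieces` and `splitResult` are assumptions. As with ddply, the result is a new data frame ordered by group, so it has to be assigned to something.

```r
# Assumed seed, added only so the example is reproducible
set.seed(1)
myDataFrame <- data.frame(myData = runif(15), myFactor = rep(c("A", "B", "C"), 5))

# Split the data frame into one piece per level of myFactor
pieces <- split(myDataFrame, myDataFrame$myFactor)

# Apply the same per-group function as in the ddply call
pieces <- lapply(pieces, function(x) {
  x$Median <- median(x$myData)
  x$FactorLevelMedianSplit <- factor(
    x$myData <= x$Median,
    levels = c(TRUE, FALSE),
    labels = c("Below", "Above"))
  x
})

# Combine the pieces back into a single data frame (rows grouped by factor level)
splitResult <- do.call(rbind, pieces)
rownames(splitResult) <- NULL
print(splitResult)
```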
Answer 1 (score: 1)
Here is a somewhat hackish way. Hadley probably has something more elegant.
First, we simply concatenate the by output:
R> do.call(c,byOutput)
A1 A2 A3 A4 A5 B1 B2 B3 B4 B5 C1 C2 C3 C4 C5
1 2 2 1 1 1 1 2 1 2 1 2 1 1 2
The important point is that we get the factor level codes 1 and 2 here, which we can use to index into a vector of labels and build a new factor:
R> c("Below","Above")[do.call(c,byOutput)]
[1] "Below" "Above" "Above" "Below" "Below" "Below" "Below" "Above"
[9] "Below" "Above" "Below" "Above" "Below" "Below" "Above"
R> as.factor(c("Below","Above")[do.call(c,byOutput)])
[1] Below Above Above Below Below Below Below Above Below Above
[11] Below Above Below Below Above
Levels: Above Below
We can then assign it into the data.frame you want to modify:
R> myDataFrame$FactorLevelMedianSplit <-
as.factor(c("Below","Above")[do.call(c,byOutput)])
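Update: Never mind. The concatenated result is ordered A A ... A B ... B C ... C, so myDataFrame would first have to be re-sorted into that order before the new column is added. Left as an exercise...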
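One way to sidestep the ordering problem is to let base R put the grouped results back into the original row order with unsplit(). The sketch below swaps by() for split() plus lapply(), so it is a variation on the approach above rather than the same code; the seed and the names `splitOne` and `groupedLabels` are assumptions.

```r
# Assumed seed, added only so the example is reproducible
set.seed(1)
myDataFrame <- data.frame(myData = runif(15), myFactor = rep(c("A", "B", "C"), 5))

# Median split for a single group, returned as plain character labels
splitOne <- function(x) {
  myBreaks <- quantile(x, c(0, .5, 1))
  as.character(cut(x,
    breaks = myBreaks,
    include.lowest = TRUE,
    labels = c("Below", "Above")))
}

# One character vector of labels per level of myFactor
groupedLabels <- lapply(split(myDataFrame$myData, myDataFrame$myFactor), splitOne)

# unsplit() reverses split(), restoring the original row order
myDataFrame$FactorLevelMedianSplit <- factor(
  unsplit(groupedLabels, myDataFrame$myFactor),
  levels = c("Below", "Above"))

print(myDataFrame)
```

Because unsplit() is driven by the same grouping vector used in split(), the data frame never needs to be sorted.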
Answer 2 (score: 0)
Isn't something like this what you want?
Course$grade2 <- ifelse(Course$grade >= median(Course$grade), 1, 0)
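Since Course is not defined anywhere in the thread, here is a minimal sketch with a made-up data frame to show what that line does. The grade values are hypothetical. Note that this splits on the overall median rather than within groups, and that `>=` places a value equal to the median in the upper group.

```r
# Hypothetical data, invented purely for illustration
Course <- data.frame(grade = c(55, 72, 88, 64, 91))

# 1 if the grade is at or above the overall median (72 here), otherwise 0
Course$grade2 <- ifelse(Course$grade >= median(Course$grade), 1, 0)

print(Course)
```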