# Generate counts table
library(ggplot2)  # provides the diamonds data set
library(plyr)
example <- data.frame(count(diamonds,c('color', 'cut')))
example[1:3,]
# Excerpt of table
color cut freq
1 D Fair 163
2 D Good 662
3 D Very Good 1513
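You can easily filter the table to rows where freq > 1000: example[example$freq > 1000,]
I would like to generate a table similar to this, except that all rows whose value falls below some threshold, e.g. 1000, are collected into an (Other) row, similar to what happens when you have too many factor levels and call summary(example, maxsum=3):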
     color         cut          freq
 D      : 5   Fair   : 7   Min.   : 119
 E      : 5   Good   : 7   1st Qu.: 592
 (Other):25   (Other):21   Median :1204
                           Mean   :1541
                           3rd Qu.:2334
                           Max.   :4884
Ideal output example:
Ideally, I would like to take this, example[example$color=='J',]:
color cut freq
J Fair 119
J Good 307
J Very Good 678
J Premium 808
J Ideal 896
and produce this:
color cut freq
J Very Good 678
J Premium 808
J Ideal 896
J (Other) 426
Bonus: If this kind of filtering can be done within ggplot when creating a plot like the one below, that would be great too.
ggplot(example, aes(x=color, y=freq)) + geom_bar(aes(fill=cut), stat = "identity")
Answer 0 (score: 3)
Here is an alternative that uses dplyr to pipe the correct data directly into the ggplot call.
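library(dplyr)
# as.character(cut) keeps each row's own label; levels(cut) would only be recycled by position
example %>% mutate(cut = ifelse(freq < 500, "Other", as.character(cut))) %>%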
group_by(color, cut) %>%
summarise(freq = sum(freq)) %>%
ggplot(aes(color, freq, fill = cut)) +
geom_bar(stat = "identity")
Be sure to detach plyr; otherwise the output of the dplyr calls will be incorrect.
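If detaching plyr is inconvenient, one possible workaround is to qualify the verbs with dplyr:: so that a masking function of the same name (for example plyr's summarise, which does not respect dplyr grouping) cannot be picked up by accident. The sketch below is only an illustration: it uses the five rows for color J from the question instead of the full diamonds counts, and the names example_j, threshold and lumped are made up for this example. The .groups argument assumes dplyr 1.0 or later.

```r
library(dplyr)

# Rows for color "J", copied from the question's table
example_j <- data.frame(
  color = "J",
  cut   = c("Fair", "Good", "Very Good", "Premium", "Ideal"),
  freq  = c(119, 307, 678, 808, 896)
)

threshold <- 500  # assumed cutoff, same value as in the answer above

lumped <- example_j %>%
  # Relabel low-frequency rows; as.character() is a no-op here but guards against factors
  dplyr::mutate(cut = ifelse(freq < threshold, "(Other)", as.character(cut))) %>%
  dplyr::group_by(color, cut) %>%
  # Explicit namespace so a masking summarise() cannot be used by mistake
  dplyr::summarise(freq = sum(freq), .groups = "drop")

lumped
```

The Fair and Good rows (119 and 307) end up in a single (Other) row, and the result can be piped into ggplot in the same way as above.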
Answer 1 (score: 1)
Try this:
library(plyr)
library(ggplot2)
example <- data.frame(count(diamonds,c('color', 'cut')))
# Compute the row id where frequency is lower than some threshold
idx <- example$freq < 1000
# Create a helper function that adds the level "Other" to a vector
add_other_level <- function(x){
levels(x) <- c(levels(x), "Other")
x
}
# Change the factor levels for the rows below the threshold
example <- within(example,
{
color <- add_other_level(color)
color[idx] <- "Other"
cut <- add_other_level(cut)
cut[idx] <- "Other"
}
)
# Create a plot
ggplot(example, aes(x = color, y = freq, fill = cut)) +
geom_bar(stat = "identity")
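Note that the code above relabels both color and cut for the low-frequency rows. If the goal is the table from the question, where only cut is lumped and the frequencies are summed per color, a base R variant of the same idea might look like the sketch below. It uses only the five rows for color J from the question; example_j, threshold and lumped_j are hypothetical names, and the cutoff of 500 is an assumption chosen to match the question's sample output.

```r
# Rows for color "J", copied from the question's table
example_j <- data.frame(
  color = factor("J"),
  cut   = factor(c("Fair", "Good", "Very Good", "Premium", "Ideal"),
                 levels = c("Fair", "Good", "Very Good", "Premium", "Ideal")),
  freq  = c(119, 307, 678, 808, 896)
)

threshold <- 500  # assumed cutoff

# Add "(Other)" as a valid level, then relabel only the cut of low-frequency rows
levels(example_j$cut) <- c(levels(example_j$cut), "(Other)")
example_j$cut[example_j$freq < threshold] <- "(Other)"

# Sum freq within each color/cut pair; combinations with no rows are left out
lumped_j <- aggregate(freq ~ color + cut, data = example_j, FUN = sum)
lumped_j
```

Because color is left untouched, the same steps should carry over to the full example data frame, with each color getting its own (Other) row where needed.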