Question

Okay, so I'm really stuck. I have a dataset that looks like this:

                  Species Latitude Longitude            Oiling Condition BirdCount      Date_ Oil_Cond       Date week.number
1         Northern Gannet 30.32860 -89.19810 Not Visibly Oiled      Live         1 2010-07-21        1 2010-07-21          30
2           Laughing Gull 30.23172 -88.32127 Not Visibly Oiled      Live         1 2010-05-05        1 2010-05-05          19
3         Northern Gannet 30.26677 -87.59248     Visibly Oiled      Live         1 2010-05-05        2 2010-05-05          19
4  American White Pelican 29.29649 -89.66432 Not Visibly Oiled      Live         1 2010-05-05        1 2010-05-05          19
5           Brown Pelican 29.88244 -88.87624     Visibly Oiled      Live         1 2010-05-08        2 2010-05-08          19
6           Brown Pelican 29.00290 -89.36961 Not Visibly Oiled      Live         1 2010-05-14        1 2010-05-14          20
7         Northern Gannet 30.33390 -85.56565           Unknown      Live         1 2010-05-17        6 2010-05-17          21
8             Common Loon 30.28177 -87.51028 Not Visibly Oiled      Live         1 2010-05-17        1 2010-05-17          21
9           Brown Pelican 30.41410 -88.24542     Visibly Oiled      Live         1 2010-05-18        2 2010-05-18          21
10        Northern Gannet 30.24063 -88.12451 Not Visibly Oiled      Live         1 2010-05-18        1 2010-05-18          21

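I'm trying to get a faceted histogram of the variable Oil_Cond for the 5 most common bird species (there are over 100 unique species).

At first I wanted to make one facet per species, using the following code: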

qplot(Oil_Cond, data = birds, facets = Species ~., geom = "histogram")

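But of course that is overloaded and doesn't work, because there would be over 100 facets. So I decided I really only care about the top 5, and I worked out which species they are and how often they occur (Laughing Gull: 3036, Brown Pelican: 789, Northern Gannet: 546, Royal Tern: 321, Black Skimmer: 258). However, I don't know how to restrict the plot to just those five.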
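A minimal base-R sketch of how such counts and the top-five list could be computed rather than typed by hand, assuming `birds` is a data frame with a `Species` column. The toy rows below are hypothetical stand-ins for the real data: `table()` counts each species, `sort(..., decreasing = TRUE)` puts the most frequent first, and `head()` keeps the first five names.

```r
# Hypothetical toy data standing in for the real 'birds' data frame
birds <- data.frame(
  Species = c(rep("Laughing Gull", 6), rep("Brown Pelican", 5),
              rep("Northern Gannet", 4), rep("Royal Tern", 3),
              rep("Black Skimmer", 2), "Common Loon"),
  Oil_Cond = c(1, 2, 1, 1, 2, 6, 1, 2, 2, 1, 1,
               1, 2, 1, 6, 1, 1, 2, 1, 1, 2)
)

# Count occurrences of each species, most frequent first
counts <- sort(table(birds$Species), decreasing = TRUE)

# Names of the five most common species
top5 <- head(names(counts), 5)

print(counts)
print(top5)
```

The resulting `top5` vector can be used directly with `%in%` when subsetting, so the list stays correct if the data change.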

Any help is much appreciated.

Thanks :)

Amy

Answer 1

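The easiest approach here is probably to plot just a subset of the data. The only potential catch is whether the Species variable is stored as a factor rather than as character strings. First, create the subset: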

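birdsSub <- subset(birds, Species %in% c('Laughing Gull', 'Brown Pelican',
                                         'Northern Gannet', 'Royal Tern', 'Black Skimmer'))
# droplevels() on the whole data frame drops unused levels from factor columns
# and, unlike droplevels(birdsSub$Species), does not fail if Species is character
birdsSub <- droplevels(birdsSub)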

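Then you should be able to pass this data frame to qplot just as before. The reason for droplevels is that if Species is stored as a factor, all the species that no longer appear will still "show up" as unused factor levels, and you can end up with all 100 panels, all but five of them empty. (Current ggplot2 versions drop unused factor levels in facets by default through the drop argument of facet_grid(), but dropping them explicitly does no harm.)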
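A self-contained sketch of this approach on made-up data (the data frame, its values, and the `keep` vector are hypothetical). `Species` is deliberately a factor that includes a level we filter out, so you can see `droplevels()` remove it. The plot uses `ggplot()` with `geom_histogram()` and `facet_grid()`, which is equivalent to the `qplot()` call in the question; that part only runs if ggplot2 is installed.

```r
# Hypothetical toy data; Species is deliberately a factor
birds <- data.frame(
  Species = factor(c("Laughing Gull", "Laughing Gull", "Brown Pelican",
                     "Common Loon", "Royal Tern", "Brown Pelican")),
  Oil_Cond = c(1, 2, 1, 1, 6, 2)
)

keep <- c("Laughing Gull", "Brown Pelican", "Royal Tern")
birdsSub <- subset(birds, Species %in% keep)

# Before droplevels(): "Common Loon" is still a level although no row uses it
print(levels(birdsSub$Species))

# Remove unused levels from every factor column of the data frame
birdsSub <- droplevels(birdsSub)
print(levels(birdsSub$Species))

# Plot only if ggplot2 is available (qplot() is deprecated in recent ggplot2)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(birdsSub, aes(x = Oil_Cond)) +
    geom_histogram(binwidth = 1) +
    facet_grid(Species ~ .)
  print(p)
}
```

Since Oil_Cond takes only a few integer codes, `binwidth = 1` gives one bar per code; a bar chart of `factor(Oil_Cond)` would be an equally reasonable choice.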

Answer 2

You could solve this with the excellent plyr package...

# If you don't already have plyr and ggplot2 installed, uncomment the next line:
# install.packages(c('plyr', 'ggplot2'))
library(plyr)
library(ggplot2)  # needed for qplot() below

# First, find out how many of each species you have...

ns=ddply(birds,.(Species),summarise,n=length(Species))

# This will produce a table listing the number of each species you have 
# (in the column 'n'). Type 'ns' to see the table.
# We can then rank the species by how often they occur, to see which
# species are most common

ns$r = rank(-ns$n) # negate n so that the most common species gets rank 1

# have a look at the top 5 species:

subset(ns,r<=5)

# There are a couple of ways to proceed from here.  Either we could get the 
# top 5 species names from this 'ns' table:
# top5 = as.character(subset(ns, r <= 5)$Species)
# and use joran's method, or we could merge the ns table and the original 
# dataset (so that each species has an 'n' and 'r' attribute) and subset the 
# data by species count or rank.  I prefer the latter, as it lets you 
# change the threshold flexibly, i.e.:

birds=merge(birds,ns,by='Species')

# We've now added 'n' and 'r' columns to the birds data, so we can select 
# our subset based on either of these columns:

birds.by.r=subset(birds,r<=5) # selects only the top 5 bird species
birds.by.n=subset(birds,n>100) # selects all species with over 100 occurrences

# Then just plot away!

qplot(Oil_Cond,data=birds.by.r,facets=Species~.,geom='histogram')

# or

qplot(Oil_Cond,data=birds.by.n,facets=Species~.,geom='histogram')
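plyr has since been retired by its authors, who recommend dplyr instead, so here is a sketch of the same rank-and-subset idea in base R only. The toy data and the variable names `species_rank`, `top2`, and `common` are hypothetical. `ave()` attaches each species' count to every row, replacing the `ddply()` plus `merge()` steps. Ranking is done on species, not rows, with `ties.method = "first"`: with the default `"average"`, two species tied for 5th place would each get rank 5.5 and both be dropped by `r <= 5`.

```r
# Hypothetical toy data standing in for the real 'birds' data frame
birds <- data.frame(
  Species = c("A", "A", "A", "B", "B", "C", "C", "D"),
  Oil_Cond = c(1, 2, 1, 1, 2, 1, 6, 1)
)

# Per-row count of that row's species (replaces ddply() + merge())
birds$n <- ave(seq_len(nrow(birds)), birds$Species, FUN = length)

# Rank species (not rows) by count; ties.method = "first" gives whole-number
# ranks, breaking ties in table order, so a tie cannot yield ranks like 2.5
counts <- table(birds$Species)
species_rank <- rank(-as.vector(counts), ties.method = "first")
names(species_rank) <- names(counts)
birds$r <- unname(species_rank[as.character(birds$Species)])

top2 <- subset(birds, r <= 2)    # the two most common species
common <- subset(birds, n >= 2)  # species seen at least twice
print(top2)
```

Either subset can then go into the same qplot() call as above. Note that breaking ties "first" is arbitrary; if you would rather keep every tied species, `ties.method = "min"` is an alternative.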

Faceting by specific values in ggplot2
