R: remove rows that do not meet a subset criterion

Date: 2015-10-05 11:20:07

Tags: r grouping subset

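I have aggregated my df to produce a set of landings (weight) time series for each fishing region, fishing gear and species. I want to remove all rows for every fishing region + gear + species code combination whose time series has a mean landed weight of less than 10 tonnes.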

Here is a sample of my data (the range of years is not always the same for each combination):

    Year   Species.Code gear        region  Landings.t
    1988    COD         creel       Greece  1
    1992    COD         creel       Greece  2
    1994    COD         creel       Greece  1
    1996    COD         creel       Greece  2
    2001    COD         creel       Greece  1
    2002    COD         creel       Greece  1
    2003    COD         creel       Greece  1
    1984    LOB         creel       Cyprus  24
    1985    LOB         creel       Cyprus  18
    1986    LOB         creel       Cyprus  21
    1987    LOB         creel       Cyprus  10
    1988    LOB         creel       Cyprus  38
    1989    LOB         creel       Cyprus  35
    1990    LOB         creel       Cyprus  29
    1991    LOB         creel       Cyprus  8
    1992    LOB         creel       Cyprus  6
    1993    LOB         creel       Cyprus  2
    1994    LOB         creel       Cyprus  1
    1995    LOB         creel       Cyprus  1
    1960    HAK         demersal    Malta   13
    1961    HAK         demersal    Malta   42
    1962    HAK         demersal    Malta   82
    1963    HAK         demersal    Malta   35
    1964    HAK         demersal    Malta   18
    1965    HAK         demersal    Malta   49
    1966    HAK         demersal    Malta   76
    1967    HAK         demersal    Malta   67
    1968    HAK         demersal    Malta   57
    1969    HAK         demersal    Malta   48
    1970    HAK         demersal    Malta   40
    1982    QSC         demersal    Gozo    3
    1983    QSC         demersal    Gozo    1
    1984    QSC         demersal    Gozo    1
    1985    QSC         demersal    Gozo    1
    1986    QSC         demersal    Gozo    1
    1987    QSC         demersal    Gozo    1
    1988    QSC         demersal    Gozo    4
    1989    QSC         demersal    Gozo    4
    1990    QSC         demersal    Gozo    1
    1991    QSC         demersal    Gozo    1
    1992    QSC         demersal    Gozo    2
    1993    QSC         demersal    Gozo    3
    1994    QSC         demersal    Gozo    2
    1995    QSC         demersal    Gozo    1

So from this portion of the data I want to remove all rows for the Greece + creel + COD combination and for the Gozo + demersal + QSC combination.
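As a quick check of which combinations fall below the threshold, the per-group means can be computed with base R's `aggregate()`. This is only a sketch: it assumes the sample above is stored in a data frame named `dat` (a name not given in the question), which is rebuilt here so the snippet runs on its own.

```r
# Rebuild the sample data from the question (the name `dat` is an assumption)
n <- c(7, 12, 11, 14)  # rows per combination: COD, LOB, HAK, QSC
dat <- data.frame(
  Year = c(1988, 1992, 1994, 1996, 2001, 2002, 2003,
           1984:1995, 1960:1970, 1982:1995),
  Species.Code = rep(c("COD", "LOB", "HAK", "QSC"), times = n),
  gear = rep(c("creel", "creel", "demersal", "demersal"), times = n),
  region = rep(c("Greece", "Cyprus", "Malta", "Gozo"), times = n),
  Landings.t = c(1, 2, 1, 2, 1, 1, 1,
                 24, 18, 21, 10, 38, 35, 29, 8, 6, 2, 1, 1,
                 13, 42, 82, 35, 18, 49, 76, 67, 57, 48, 40,
                 3, 1, 1, 1, 1, 1, 4, 4, 1, 1, 2, 3, 2, 1)
)

# Mean landings per species + gear + region combination
group_means <- aggregate(Landings.t ~ Species.Code + gear + region,
                         data = dat, FUN = mean)

# Combinations that should be dropped (mean below 10 t)
low_groups <- group_means[group_means$Landings.t < 10, ]
print(low_groups)
```

On this sample the COD and QSC combinations have means of roughly 1.3 t and 1.9 t, while LOB and HAK average roughly 16.1 t and 47.9 t, which matches the two combinations listed for removal.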

My desired output is:

        Year   Species.Code gear        region  Landings.t
        1984    LOB         creel       Cyprus  24
        1985    LOB         creel       Cyprus  18
        1986    LOB         creel       Cyprus  21
        1987    LOB         creel       Cyprus  10
        1988    LOB         creel       Cyprus  38
        1989    LOB         creel       Cyprus  35
        1990    LOB         creel       Cyprus  29
        1991    LOB         creel       Cyprus  8
        1992    LOB         creel       Cyprus  6
        1993    LOB         creel       Cyprus  2
        1994    LOB         creel       Cyprus  1
        1995    LOB         creel       Cyprus  1
        1960    HAK         demersal    Malta   13
        1961    HAK         demersal    Malta   42
        1962    HAK         demersal    Malta   82
        1963    HAK         demersal    Malta   35
        1964    HAK         demersal    Malta   18
        1965    HAK         demersal    Malta   49
        1966    HAK         demersal    Malta   76
        1967    HAK         demersal    Malta   67
        1968    HAK         demersal    Malta   57
        1969    HAK         demersal    Malta   48
        1970    HAK         demersal    Malta   40

1 answer:

Answer 0: (score: 5)

You could try this, since your data set has no grouping feature:

    # keep the rows whose landings exceed 10 t
    subset(dat, Landings.t > 10)
    # an alternative option
    dat[dat$Landings.t > 10, ]
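Note that this filters individual rows, not whole groups. A small sketch using only the Cyprus + creel + LOB rows from the question (stored in a data frame named `lob` purely for illustration) shows how that differs from a test on the group mean:

```r
# Cyprus + creel + LOB rows from the question (the name `lob` is illustrative)
lob <- data.frame(
  Year = 1984:1995,
  Species.Code = "LOB",
  gear = "creel",
  region = "Cyprus",
  Landings.t = c(24, 18, 21, 10, 38, 35, 29, 8, 6, 2, 1, 1)
)

# Row-wise filter: drops every single year with landings of 10 t or less
row_wise <- subset(lob, Landings.t > 10)
nrow(row_wise)   # 6 of the 12 rows survive

# Group-wise test: the mean of the whole series is above 10 t,
# so all 12 rows of this combination should be kept
group_mean <- mean(lob$Landings.t)
group_mean > 10  # TRUE
```

The row-wise filter would therefore discard half of a time series that the question wants to keep in full.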

Edit:

Given the new information provided by the OP, I believe this is what you are looking for:

    # load the data.table library; if you don't have it, uncomment the next line:
    # install.packages('data.table')
    library(data.table)
    # convert 'dat' (or whatever your object is called) into a data.table
    setDT(dat)
    # set the key, i.e. the grouping variables:
    setkey(dat, Species.Code, gear, region)
    # keep every group whose mean landings exceed 10 t
    # (use >= 10 to also keep a group whose mean is exactly 10)
    dat[, subset(.SD, mean(Landings.t) > 10), by = key(dat)]
    # the same, ordered by year:
    dat[, subset(.SD, mean(Landings.t) > 10), by = key(dat)][order(Year)]
    # what you should get:
    #    Species.Code     gear region Year Landings.t
    # 1:          HAK demersal  Malta 1960         13
    # 2:          HAK demersal  Malta 1961         42
    # 3:          HAK demersal  Malta 1962         82
    # 4:          HAK demersal  Malta 1963         35
    # 5:          HAK demersal  Malta 1964         18
    # 6:          HAK demersal  Malta 1965         49
    # 7:          HAK demersal  Malta 1966         76
    # 8:          HAK demersal  Malta 1967         67
    # 9:          HAK demersal  Malta 1968         57
    #10:          HAK demersal  Malta 1969         48
    #11:          HAK demersal  Malta 1970         40
    #12:          LOB    creel Cyprus 1984         24
    #13:          LOB    creel Cyprus 1985         18
    #14:          LOB    creel Cyprus 1986         21
    #15:          LOB    creel Cyprus 1987         10
    #16:          LOB    creel Cyprus 1988         38
    #17:          LOB    creel Cyprus 1989         35
    #18:          LOB    creel Cyprus 1990         29
    #19:          LOB    creel Cyprus 1991          8
    #20:          LOB    creel Cyprus 1992          6
    #21:          LOB    creel Cyprus 1993          2
    #22:          LOB    creel Cyprus 1994          1
    #23:          LOB    creel Cyprus 1995          1
    #    Species.Code     gear region Year Landings.t
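If installing a package is not an option, the same group-wise filter can be sketched in base R with `ave()`, which returns the group mean repeated for every row of its group. This is an alternative to the data.table approach, not part of it. The data frame below is a shortened, illustrative subset of the question's rows, and the threshold is written as `>= 10` on the assumption that only means strictly below 10 t should be removed.

```r
# Shortened illustrative subset of the question's data (three rows per group)
dat <- data.frame(
  Year = c(1988, 1992, 1994, 1984, 1985, 1986),
  Species.Code = rep(c("COD", "LOB"), each = 3),
  gear = "creel",
  region = rep(c("Greece", "Cyprus"), each = 3),
  Landings.t = c(1, 2, 1, 24, 18, 21)
)

# Mean landings of each species + gear + region group, aligned row by row
grp_mean <- ave(dat$Landings.t,
                dat$Species.Code, dat$gear, dat$region,
                FUN = mean)

# Keep only the rows that belong to groups averaging at least 10 t
kept <- dat[grp_mean >= 10, ]
print(kept)
```

Because `ave()` preserves the original row order, no re-sorting by year is needed afterwards.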