Using a double loop to create a subset in R

Time: 2015-02-07 20:15:08

Tags: r loops double conditional subset

I have a very large csv file that I have imported into R, and I need to create a subset of the data. The csv looks like this:

Julian_Day   Id   Year
52            1   1901
56            5   1901
200           1   1968

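And so on. Year runs from 1901 to 2010, Id from 1 to 58 and Julian_Day from 1 to 200, for about 130,000 rows of data. For each Id, I want to keep only the row with the lowest Julian_Day in each year and delete all other rows.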
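
The title mentions a double loop. For comparison, a minimal sketch of that approach in base R could loop over every Id and every Year and keep the row with the smallest Julian_Day in each combination. The small data frame `dat` below is made up for illustration only; for ~130,000 rows the grouped approaches in the answers are usually simpler and faster.

```r
# Hypothetical toy data in the same shape as the question's csv
dat <- data.frame(Julian_Day = c(52, 56, 200, 40),
                  Id         = c(1, 5, 1, 1),
                  Year       = c(1901, 1901, 1968, 1901))

keep <- integer(0)                       # row indices to keep
for (id in unique(dat$Id)) {             # outer loop: each Id
  for (yr in unique(dat$Year)) {         # inner loop: each Year
    rows <- which(dat$Id == id & dat$Year == yr)
    if (length(rows) > 0) {
      # index of the row with the lowest Julian_Day in this Id/Year group
      keep <- c(keep, rows[which.min(dat$Julian_Day[rows])])
    }
  }
}
subset_dat <- dat[sort(keep), ]          # restore original row order
subset_dat
```

Note that the inner loop also visits Id/Year combinations that do not occur in the data, which is one reason grouping functions scale better.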

2 answers:

Answer 0 (score: 2)

Data

df = data.frame(Year=c(1901,1901,1968,1901),
                Id=c(1,5,1,1),
                Julian_Day=c(52,56,200,40),
                Animal=c('dog','doggy','style','fashion'))

Try this:

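library(data.table)
# Add a column `min` holding the lowest Julian_Day of each Id/Year group;
# no rows are dropped yet
setDT(df)[ ,min:=min(Julian_Day), by=list(Id, Year)]
#>df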
#   Year Id Julian_Day  Animal min
#1: 1901  1         52     dog  40
#2: 1901  5         56   doggy  56
#3: 1968  1        200   style 200
#4: 1901  1         40 fashion  40
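
The code above only adds the group minimum as a column; the question asks to drop the other rows. One common data.table idiom for that is sketched below, assuming the same example data: `.I` holds the row numbers of each group, so `.I[which.min(Julian_Day)]` picks one row per Id/Year. With ties, `which.min` keeps only the first minimum.

```r
library(data.table)

df <- data.table(Year = c(1901, 1901, 1968, 1901),
                 Id = c(1, 5, 1, 1),
                 Julian_Day = c(52, 56, 200, 40),
                 Animal = c('dog', 'doggy', 'style', 'fashion'))

# For each Id/Year group, the original row number of its minimum Julian_Day
idx <- df[, .I[which.min(Julian_Day)], by = .(Id, Year)]$V1
# Keep only those rows (all columns, including Animal)
result <- df[idx]
result
```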

Answer 1 (score: 1)

Or, more simply, with base R:

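# List the grouping columns explicitly: `Julian_Day ~ .` would also group
# by any other columns in df (such as Animal)
aggregate(Julian_Day ~ Year + Id, df, min)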
#   Year Id Julian_Day
# 1 1901  1         40
# 2 1968  1        200
# 3 1901  5         56
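
`aggregate` returns only the grouping columns and the minimum. If the other columns of each kept row are needed too, a base R sketch using `ave()` could look like this (same example data assumed). Unlike `which.min`, this keeps every row that ties for the minimum.

```r
df <- data.frame(Year = c(1901, 1901, 1968, 1901),
                 Id = c(1, 5, 1, 1),
                 Julian_Day = c(52, 56, 200, 40),
                 Animal = c('dog', 'doggy', 'style', 'fashion'))

# For every row, the minimum Julian_Day of that row's Id/Year group
group_min <- ave(df$Julian_Day, df$Id, df$Year, FUN = min)
# Keep rows whose Julian_Day equals their group minimum
result <- df[df$Julian_Day == group_min, ]
result
```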

Or with dplyr:

library(dplyr)
df %>%
  group_by(Id, Year) %>%
  summarise(Julian_Day = min(Julian_Day))

# Source: local data frame [3 x 3]
# Groups: Id
# 
#   Id Year Julian_Day
# 1  1 1901         40
# 2  1 1968        200
# 3  5 1901         56
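
`summarise` likewise keeps only the grouping columns and the summary. To keep whole rows in dplyr, a grouped `filter` is one option; this sketch assumes the same example data and keeps ties.

```r
library(dplyr)

df <- data.frame(Year = c(1901, 1901, 1968, 1901),
                 Id = c(1, 5, 1, 1),
                 Julian_Day = c(52, 56, 200, 40),
                 Animal = c('dog', 'doggy', 'style', 'fashion'))

result <- df %>%
  group_by(Id, Year) %>%
  # min() is evaluated within each Id/Year group
  filter(Julian_Day == min(Julian_Day)) %>%
  ungroup()
result
```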