I have a very large CSV file that I have imported into R, and I need to create a subset of the data. The CSV looks like this:
Julian_Day Id Year
52 1 1901
56 5 1901
200 1 1968
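and so on, where Year ranges over 1901-2010, Id over 1-58, and Julian_Day over 1-200, for about 130,000 rows of data. I want to keep only the lowest Julian_Day value for each Id in each year, and delete all other rows.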
Answer 0 (score: 2)
Data:
df = data.frame(Year=c(1901,1901,1968,1901),
Id=c(1,5,1,1),
Julian_Day=c(52,56,200,40),
Animal=c('dog','doggy','style','fashion'))
Try this, which adds the minimum Julian_Day of each (Id, Year) group as a new column:
library(data.table)
setDT(df)[ ,min:=min(Julian_Day), by=list(Id, Year)]
#>df
# Year Id Julian_Day Animal min
#1: 1901 1 52 dog 40
#2: 1901 5 56 doggy 56
#3: 1968 1 200 style 200
#4: 1901 1 40 fashion 40
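The `:=` call above only adds a `min` column and leaves all four rows in place, whereas the question asks for the other rows to be dropped. One possible way to do that in data.table is sketched below. It is an illustration rather than part of the original answer: the names `dt` and `earliest` are made up for the example, and the toy data is repeated so the block runs on its own.

```r
library(data.table)

# Same toy data as above (Animal stands in for any extra columns to carry along)
dt <- data.table(Year = c(1901, 1901, 1968, 1901),
                 Id = c(1, 5, 1, 1),
                 Julian_Day = c(52, 56, 200, 40),
                 Animal = c("dog", "doggy", "style", "fashion"))

# Within each (Id, Year) group, keep the single row with the smallest Julian_Day.
# which.min() returns the first such row when several rows tie for the minimum.
earliest <- dt[, .SD[which.min(Julian_Day)], by = .(Id, Year)]
earliest
```

With this data the result has one row per (Id, Year) pair, three in total. On a much larger table, `.SD` subsetting per group can be slow, so it is worth timing against the alternatives on the real 130,000-row file.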
Answer 1 (score: 1)
Or simply with base R:
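# Grouping columns are named explicitly; `~ .` would also group by any extra column such as Animal
aggregate(Julian_Day ~ Year + Id, df, min)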
# Year Id Julian_Day
# 1 1901 1 40
# 2 1968 1 200
# 3 1901 5 56
Or with dplyr:
library(dplyr)
df %>%
group_by(Id, Year) %>%
summarise(Julian_Day = min(Julian_Day))
# Source: local data frame [3 x 3]
# Groups: Id
#
# Id Year Julian_Day
# 1 1 1901 40
# 2 1 1968 200
# 3 5 1901 56
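Both summaries above return only Id, Year, and Julian_Day. If the full rows need to survive, including any other columns, a base R filter built on `ave()` is one option. This is a minimal sketch under that assumption: `group_min` and `earliest` are illustrative names, and the toy data from the first answer is repeated so the block is self-contained.

```r
# Same toy data as in the first answer
df <- data.frame(Year = c(1901, 1901, 1968, 1901),
                 Id = c(1, 5, 1, 1),
                 Julian_Day = c(52, 56, 200, 40),
                 Animal = c("dog", "doggy", "style", "fashion"))

# ave() returns, for every row, the minimum Julian_Day of that row's (Id, Year) group
group_min <- ave(df$Julian_Day, df$Id, df$Year, FUN = min)

# Keep only the rows whose Julian_Day equals their group minimum
# (if several rows tie for the minimum, all of them are kept)
earliest <- df[df$Julian_Day == group_min, ]
earliest
```

Unlike `aggregate()` and `summarise()`, this keeps the original row order and every column. Tied minima produce more than one row per group, so drop duplicates afterwards if exactly one row per Id and Year is required.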