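I have this dataset in R:
I want to filter out repeated values in "weather_description" each time they occur in a row. If a value shows up again later in the dataset, it should not be eliminated; I only want to drop the rows where the value in this column repeats the one directly before it. The output should look like this: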
2015-01-01 01:00:00 sky is clear 1420070400
2015-01-01 02:00:00 scattered clouds 1420074000
2015-01-01 04:00:00 sky is clear 1420081200
Is there a simple way to do this in R?
Answer 0 (score: 0)
Here is a base R solution built on aggregate():
aggregate(Time ~ .,df,head,1)
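One caveat: as far as I can tell, aggregate() groups rows by value combinations, not by consecutive runs, so a description that reappears later would not get a row of its own. Below is a minimal base R sketch that keeps the first row of each consecutive run instead. The data frame `weather` is made-up sample data shaped like the question's columns, the helper name `keep_run_starts` is hypothetical, and the sketch assumes the column has no missing values.

```r
# Hypothetical sample data (invented for illustration, shaped like the question's columns)
weather <- data.frame(
  Time = c("2015-01-01 01:00:00", "2015-01-01 02:00:00",
           "2015-01-01 03:00:00", "2015-01-01 04:00:00",
           "2015-01-01 05:00:00"),
  weather_description = c("sky is clear", "scattered clouds",
                          "scattered clouds", "sky is clear", "sky is clear"),
  timestamps = c(1420070400, 1420074000, 1420077600, 1420081200, 1420084800),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only the first row of every consecutive run in `col`
keep_run_starts <- function(dat, col = "weather_description") {
  if (nrow(dat) == 0) return(dat)
  x <- dat[[col]]
  # TRUE for the first row and wherever the value differs from the previous row
  is_start <- c(TRUE, x[-1] != x[-length(x)])
  out <- dat[is_start, , drop = FALSE]
  rownames(out) <- NULL
  out
}

result <- keep_run_starts(weather)
print(result)
```

Because each row is compared only with its immediate predecessor, "sky is clear" is kept both at its first occurrence and when it comes back after "scattered clouds".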
Answer 1 (score: 0)
Per @CodeMonkey, using dplyr:
library(dplyr)

df %>%
  # start a new group whenever the description differs from the previous row
  mutate(grouper = cumsum(weather_description != lag(weather_description, default = first(weather_description)))) %>%
  group_by(grouper) %>%
  # keep the first row of each run
  summarise(Time = first(Time),
            weather_description = first(weather_description),
            timestamps = first(timestamps))
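To see what the grouper column is meant to hold, here is a small base R sketch of the run-numbering idea: a counter that goes up by one each time the value changes. The vector `w` is invented for illustration; `rle()` is shown as another way to get at the same runs.

```r
# Made-up example vector
w <- c("sky is clear", "scattered clouds", "scattered clouds", "sky is clear")

# TRUE wherever the value differs from the one before it (first element never counts as a change)
changed <- c(FALSE, w[-1] != w[-length(w)])

# Running count of changes: rows in the same consecutive run share a number
grouper <- cumsum(changed)
print(grouper)

# rle() reports the run lengths; from these we can recover the index where each run starts
runs <- rle(w)
first_idx <- cumsum(c(1, head(runs$lengths, -1)))
print(w[first_idx])
```

Taking the first row per run number, or indexing with `first_idx`, both give one row per consecutive run.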
Answer 2 (score: 0)
Let me know if this base R solution works for you:
Data
df <- data.frame(Time = c(as.Date(16436),as.Date(16437),as.Date(16437),as.Date(16437),
as.Date(16437),as.Date(16438),as.Date(16438),as.Date(16438),
as.Date(16438),as.Date(16439),as.Date(16439),as.Date(16439)),
weather_description = c("sky is clear",
"scattered clouds")[c(1,2,2,2,2,2,2,2,2,1,1,1)])
df
# Time weather_description
#1 2015-01-01 sky is clear
#2 2015-01-02 scattered clouds
#3 2015-01-02 scattered clouds
#4 2015-01-02 scattered clouds
#5 2015-01-02 scattered clouds
#6 2015-01-03 scattered clouds
#7 2015-01-03 scattered clouds
#8 2015-01-03 scattered clouds
#9 2015-01-03 scattered clouds
#10 2015-01-04 sky is clear
#11 2015-01-04 sky is clear
#12 2015-01-04 sky is clear
Function
weather_changes <- function(dat){
  # split by weather description (drop = TRUE skips unused factor levels)
  splitted <- split(dat, dat[,2], drop = TRUE)
  # for each, keep only the first date of a sequence (a gap of 2+ days starts a new one);
  # logical indexing avoids x[-which(...), ] returning zero rows when nothing is dropped
  byweather <- lapply(splitted, function(x) x[c(TRUE, as.numeric(diff(x[,1])) >= 2), , drop = FALSE])
  # combine to a single data.frame
  newdf <- do.call(rbind, byweather)
  # order by date
  newdf <- newdf[order(newdf[,1]),]
  # remove the messy row names
  rownames(newdf) <- NULL
  newdf
}
weather_changes(df)
# Time weather_description
#1 2015-01-01 sky is clear
#2 2015-01-02 scattered clouds
#3 2015-01-04 sky is clear