How do I filter rows in R based on the values in one column?

Time: 2017-06-22 12:33:52

Tags: r filter

I have this dataset in R:

[image of the dataset]

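I want to filter out repeated values in "weather_description" whenever they occur on consecutive rows. A value that shows up again later in the dataset should not be eliminated; I only want to drop the rows where the value simply repeats the row before it. The output should look like this: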

2015-01-01 01:00:00 sky is clear 1420070400
2015-01-01 02:00:00 scattered clouds 1420074000
2015-01-01 04:00:00 sky is clear 1420081200

Is there a simple way to do this in R?

3 Answers:

Answer 0: (score: 0)

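Here is a base R solution using aggregate. Note that `Time ~ .` groups on every other column, so this returns the first Time for each distinct combination of values rather than for each consecutive run:

aggregate(Time ~ ., df, head, 1)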
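To keep only the first row of each consecutive run, one option in base R is to compare every value with the one directly above it. This is a minimal sketch, not the answerer's method; the sample data and the column names (Time, weather_description, timestamps) are assumptions modelled on the expected output in the question, and it assumes at least one row and no NA values in the column.

```r
# Hypothetical sample data; column names are assumed from the question.
df <- data.frame(
  Time = c("2015-01-01 01:00:00", "2015-01-01 02:00:00",
           "2015-01-01 03:00:00", "2015-01-01 04:00:00"),
  weather_description = c("sky is clear", "scattered clouds",
                          "scattered clouds", "sky is clear"),
  timestamps = c(1420070400, 1420074000, 1420077600, 1420081200),
  stringsAsFactors = FALSE
)

x <- df$weather_description
# TRUE for the first row and for every row whose value differs from the row above
keep <- c(TRUE, x[-1] != x[-length(x)])
result <- df[keep, ]
rownames(result) <- NULL
result
```

Because the comparison only looks at the neighbouring row, "sky is clear" survives both at 01:00 and at 04:00, while the repeated "scattered clouds" at 03:00 is dropped.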

Answer 1: (score: 0)

Per @CodeMonkey, using dplyr:

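library(dplyr)

df %>%
  # start a new group whenever the description differs from the previous row
  mutate(grouper = cumsum(weather_description != lag(weather_description, default = first(weather_description)))) %>%
  group_by(grouper) %>%
  # keep the first row of each run (column names Time and timestamps are assumed)
  summarise(Time = first(Time),
            weather_description = first(weather_description),
            timestamps = first(timestamps))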
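The grouper column is meant to act as a run identifier: every block of consecutive identical descriptions should share one number. As a rough illustration of that idea without dplyr, the same labels can be built in base R with rle(). The vector below is made-up sample data, and run_id and first_rows are names introduced only for this sketch.

```r
# Hypothetical weather descriptions, in row order
weather <- c("sky is clear", "scattered clouds", "scattered clouds",
             "sky is clear", "sky is clear")

# rle() finds the runs of identical consecutive values
runs <- rle(weather)

# Label each row with the index of the run it belongs to
run_id <- rep(seq_along(runs$lengths), runs$lengths)

# Row positions where a new run starts
first_rows <- which(!duplicated(run_id))

run_id
first_rows
```

Grouping by run_id and taking the first row of each group, or simply indexing the data frame with first_rows, gives one row per run.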

Answer 2: (score: 0)

Let me know if this base R solution works for you:

Data

# Dates built as offsets from a start date, so no numeric origin is needed
df <- data.frame(Time = as.Date("2015-01-01") + c(0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3),
                 weather_description = c("sky is clear",
                                         "scattered clouds")[c(1,2,2,2,2,2,2,2,2,1,1,1)])
df
#         Time weather_description
#1  2015-01-01        sky is clear
#2  2015-01-02    scattered clouds
#3  2015-01-02    scattered clouds
#4  2015-01-02    scattered clouds
#5  2015-01-02    scattered clouds
#6  2015-01-03    scattered clouds
#7  2015-01-03    scattered clouds
#8  2015-01-03    scattered clouds
#9  2015-01-03    scattered clouds
#10 2015-01-04        sky is clear
#11 2015-01-04        sky is clear
#12 2015-01-04        sky is clear

Function

weather_changes <- function(dat){
  # split by weather description
  splitted <- split(dat, dat[,2])
  # for each, keep the first row and any row dated at least 2 days after the previous one
  # (a logical mask still works when there is nothing to drop)
  byweather <- lapply(splitted, function(x) x[c(TRUE, diff(x[,1]) >= 2), ])
  # combine to a single data.frame
  newdf <- do.call(rbind, byweather)
  # order by date
  newdf <- newdf[order(newdf[,1]),]
  # remove the messy row names
  rownames(newdf) <- NULL
  newdf
}
weather_changes(df)
#        Time weather_description
#1 2015-01-01        sky is clear
#2 2015-01-02    scattered clouds
#3 2015-01-04        sky is clear
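One thing to watch for when dropping rows by condition in base R: negative indexing with which() behaves unexpectedly when nothing matches, whereas a logical mask does not. The small data frame below is invented purely to show the difference.

```r
# Hypothetical data in which no row is flagged for removal
x <- data.frame(id = 1:3, flag = c(FALSE, FALSE, FALSE))

# which() returns integer(0), and -integer(0) is still integer(0),
# so this selects zero rows instead of keeping all of them
dropped_with_which <- x[-which(x$flag), ]

# A logical mask keeps every row that is not flagged
dropped_with_mask <- x[!x$flag, ]

nrow(dropped_with_which)
nrow(dropped_with_mask)
```

For a weather description whose rows are all far enough apart to be kept, this is the difference between getting every row back and getting none.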