Delete the first row of each day in a data frame in R

Time: 2018-06-22 14:52:40

Tags: r date split

                   Date     Prix d  
320 2007-01-03 23:45:00 110.2807 5
321 2007-01-03 23:50:00 110.2291 5
322 2007-01-03 23:55:00 110.2420 5
323 2007-01-04 00:00:00 110.3323 5
324 2007-01-04 00:05:00 110.3323 5

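My data frame is ordered like this. How can I delete the first row of each new day? In the example, that would be row 323. Thanks.

3 answers:

Answer 0: (score: 1)

A solution using dplyr: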

library(dplyr)

df %>%
  group_by(ymd = as.Date(Date)) %>%
  slice(-1) %>%
  ungroup() %>%
  select(-ymd)

Result:

# A tibble: 3 x 2
  Date                Prix.d    
  <fct>               <fct>     
1 2007-01-03 23:50:00 110.2291 5
2 2007-01-03 23:55:00 110.2420 5
3 2007-01-04 00:05:00 110.3323 5

Data:

df = structure(list(Date = structure(1:5, .Label = c("2007-01-03 23:45:00", 
"2007-01-03 23:50:00", "2007-01-03 23:55:00", "2007-01-04 00:00:00", 
"2007-01-04 00:05:00"), class = "factor"), Prix.d = structure(c(3L, 
1L, 2L, 4L, 4L), .Label = c("110.2291 5", "110.2420 5", "110.2807 5", 
"110.3323 5"), class = "factor")), .Names = c("Date", "Prix.d"
), class = "data.frame", row.names = 320:324)

Answer 1: (score: 1)

A base R solution:

do.call(rbind,by(df,as.Date(df$Date),function(x) x[-1,]))

#                               Date     Prix.d
# 2007-01-03.321 2007-01-03 23:50:00 110.2291 5
# 2007-01-03.322 2007-01-03 23:55:00 110.2420 5
# 2007-01-04     2007-01-04 00:05:00 110.3323 5
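
The same "drop the first row of each day" idea can also be sketched in base R without splitting and re-binding, which keeps the original row names. This is only an illustrative alternative, not part of the answer above; it assumes the rows are already sorted by time and that Date is stored as text in the form shown in the question. The sample data frame below is rebuilt here so the snippet runs on its own.

```r
# Sample data mirroring the question (assumption: Date held as character).
df <- data.frame(
  Date = c("2007-01-03 23:45:00", "2007-01-03 23:50:00", "2007-01-03 23:55:00",
           "2007-01-04 00:00:00", "2007-01-04 00:05:00"),
  Prix = c(110.2807, 110.2291, 110.2420, 110.3323, 110.3323),
  d = 5L,
  row.names = 320:324,
  stringsAsFactors = FALSE
)

# Calendar day of each row; as.Date() ignores the time part of the string.
day <- as.Date(df$Date)

# duplicated() is FALSE for the first occurrence of each day and TRUE for
# every later one, so indexing with it drops the first row of every day.
res <- df[duplicated(day), ]
res  # keeps rows 321, 322 and 324
```

Because nothing is re-bound, the row names stay as 321, 322 and 324 rather than taking the prefixed names produced by do.call(rbind, by(...)).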

Answer 2: (score: 0)

How about something like this?

library(tidyverse)
df %>%
    rownames_to_column("row") %>%
    mutate(
        Date = as.POSIXct(Date),
        dmy = format(Date, "%d-%m-%Y")) %>%
    group_by(dmy) %>%
    mutate(n = 1:n()) %>%
    filter(n > 1) %>%
    ungroup() %>%
    select(-dmy, -n)
## A tibble: 3 x 4
#  row   Date                 Prix     d
#  <chr> <dttm>              <dbl> <int>
#1 321   2007-01-03 23:50:00  110.     5
#2 322   2007-01-03 23:55:00  110.     5
#3 324   2007-01-04 00:05:00  110.     5

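To drop the row column, simply remove the line rownames_to_column("row") %>%; I only added an explicit row column for demonstration and transparency.

I realise this is not exactly the same as your expected output, because here row=320 is removed as well (it is the first observation of its day).


Sample data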

df <- read.table(text =
    "                   Date     Prix d
320 '2007-01-03 23:45:00' 110.2807 5
321 '2007-01-03 23:50:00' 110.2291 5
322 '2007-01-03 23:55:00' 110.2420 5
323 '2007-01-04 00:00:00' 110.3323 5
324 '2007-01-04 00:05:00' 110.3323 5", header = TRUE, row.names = 1)
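
Regarding the caveat about row 320: if only the first row of each new day should go (that is, a row whose day differs from the previous row's day), while the very first row of the data is kept, one possible sketch is to compare each row's day with the day of the row before it. This assumes the data is sorted by time; the data frame is rebuilt here so the snippet is self-contained, and the names day and new_day are purely illustrative.

```r
# Sample data mirroring the question (assumption: Date held as character).
df <- data.frame(
  Date = c("2007-01-03 23:45:00", "2007-01-03 23:50:00", "2007-01-03 23:55:00",
           "2007-01-04 00:00:00", "2007-01-04 00:05:00"),
  Prix = c(110.2807, 110.2291, 110.2420, 110.3323, 110.3323),
  d = 5L,
  row.names = 320:324,
  stringsAsFactors = FALSE
)

# Calendar day of each row.
day <- as.Date(df$Date)

# TRUE where a row's day differs from the previous row's day.
# The first row has no predecessor, so it is never flagged.
new_day <- c(FALSE, day[-1] != day[-length(day)])

# Drop only the rows that open a new day.
res_new_day <- df[!new_day, ]
res_new_day  # keeps rows 320, 321, 322 and 324; only row 323 is removed
```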