Extract the latest date from a column with multiple dates in each cell

Time: 2018-10-25 14:11:50

Tags: r

I have the following dummy data frame:

test_df <- structure(list(id = 1:10, dates = c("2018-07-02, 2018-06-28",
"2018-08-22", "2018-08-06, 2018-07-31", "2018-03-08", "2018-02-22, 2018-02-19", 
"2018-07-04, 2018-07-06", "2018-06-26, 2018-06-22", "2018-01-18, 2018-01-24", 
"2018-06-05, 2018-06-14", "2018-01-18")), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -10L))

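I want to convert all entries in the `dates` column to dates, then keep only the latest date in each cell and drop all the other dates in that cell.

I tried the following: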

library(dplyr)
library(reprex)
library(purrr)
library(lubridate)
library(stringr)  # provides str_extract_all()

test_df %>%
    mutate(dates = dates %>%
            str_extract_all("[0-9]+-[0-9]+-[0-9]+") %>%
            map(ymd) %>%
            map_lgl(~ any(max(.))))

But somehow this converts all entries in each cell to numbers rather than proper dates.

What I want to end up with:

id dates
1 2018-07-02
2 2018-08-22            
3 2018-08-06
4 2018-03-08            
5 2018-02-22
6 2018-07-06
7 2018-06-26
8 2018-01-24
9 2018-06-14
10 2018-01-18

3 answers:

Answer 0: (score: 2)

Use scan on each field, take the maximum, and convert the result to Date class.

library(dplyr)

scan_max <- function(x) {
  max(scan(text = x, what = "", sep = ",", quiet = TRUE, strip.white = TRUE))
}
test_df %>%
  mutate(dates = as.Date(sapply(dates, scan_max)))

giving:

# A tibble: 10 x 2
      id dates     
   <int> <date>    
 1     1 2018-07-02
 2     2 2018-08-22
 3     3 2018-08-06
 4     4 2018-03-08
 5     5 2018-02-22
 6     6 2018-07-06
 7     7 2018-06-26
 8     8 2018-01-24
 9     9 2018-06-14
10    10 2018-01-18

It can also be written like this:

scan_max <- . %>% 
  scan(text = ., what = "", sep = ",", quiet = TRUE, strip.white = TRUE) %>%
  max

test_df %>%
  mutate(dates = dates %>% sapply(scan_max) %>% as.Date)
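
The approach above takes `max()` of the strings before converting to `Date`. That works because dates written as YYYY-MM-DD sort the same way as text and as dates. A minimal base R sketch of the same idea, using `strsplit()` in place of `scan()`; the helper name `latest_date` and the short `dates` vector are made up for illustration:

```r
# Hypothetical helper: split one cell on commas and return the largest piece.
# For YYYY-MM-DD strings, the largest string is also the latest date.
latest_date <- function(cell) {
  parts <- trimws(strsplit(cell, ",", fixed = TRUE)[[1]])
  max(parts)
}

# A few cells taken from the question's data
dates <- c("2018-07-02, 2018-06-28", "2018-08-22", "2018-07-04, 2018-07-06")

# vapply() guarantees exactly one string per cell; convert once at the end
result <- as.Date(vapply(dates, latest_date, character(1), USE.NAMES = FALSE))

# Same answer as converting first and comparing real dates
check <- as.Date(c("2018-07-02", "2018-08-22", "2018-07-06"))
stopifnot(all(result == check))
print(result)
```

This shortcut only holds for zero-padded year-month-day strings. With other formats, such as day-month-year, convert to `Date` before taking the maximum.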

Answer 1: (score: 1)

You can try:

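from datetime import datetime

# self.FILEPATH and box_code are assumed to be defined by the surrounding code
filename = f'{self.FILEPATH}{box_code}_{datetime.now().strftime("%d-%m-%Y")}.txt'

# the with block closes the file on exit, so no explicit close() is needed
with open(filename, 'a') as out:
    out.write('text_text' + '\n')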

Answer 2: (score: 1)

I use three mutate steps:

  1. Split the string on the comma
  2. Convert the strings to dates
  3. Keep only the latest date

And here it is:

df <- structure(list(id = 1:10, dates = c("2018-07-02, 2018-06-28",
"2018-08-22", "2018-08-06, 2018-07-31", "2018-03-08", "2018-02-22, 2018-02-19",
"2018-07-04, 2018-07-06", "2018-06-26, 2018-06-22", "2018-01-18, 2018-01-24",
"2018-06-05, 2018-06-14", "2018-01-18")), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -10L))

library(tidyr)
library(stringi)
library(dplyr)

df_new <- df %>% 
  mutate(dates = stri_split_fixed(dates, ", ")) %>% 
  mutate(dates = rapply(dates, as.Date, how = "list")) %>% 
  mutate(dates = lapply(dates, function(x) {
    sort(x, decreasing = TRUE)[1]
  })) %>% 
  unnest(dates)

> df_new
# A tibble: 10 x 2
      id dates     
   <int> <date>    
 1     1 2018-07-02
 2     2 2018-08-22
 3     3 2018-08-06
 4     4 2018-03-08
 5     5 2018-02-22
 6     6 2018-07-06
 7     7 2018-06-26
 8     8 2018-01-24
 9     9 2018-06-14
10    10 2018-01-18
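
As a side note, once a cell holds a `Date` vector, `sort(x, decreasing = TRUE)[1]` and `max(x)` pick the same element. A small base R sketch comparing the two; the vectors `x` and `with_na` are invented for illustration:

```r
# Two dates from one cell of the example data
x <- as.Date(c("2018-01-18", "2018-01-24"))

via_sort <- sort(x, decreasing = TRUE)[1]  # approach used in the answer
via_max  <- max(x)                         # shorter equivalent

stopifnot(via_sort == via_max)

# The two differ when a cell contains a missing value:
# sort() drops NA by default, while max() returns NA unless na.rm = TRUE
with_na <- as.Date(c("2018-01-18", NA, "2018-01-24"))
sorted_first <- sort(with_na, decreasing = TRUE)[1]
max_default  <- max(with_na)
max_na_rm    <- max(with_na, na.rm = TRUE)

print(sorted_first)
print(max_default)
print(max_na_rm)
```

Which behaviour is preferable depends on whether a missing date in a cell should be ignored or should be reported as `NA`.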

Another option, using map instead of the two apply calls:

library(tidyr)
library(stringi)
library(dplyr)
library(purrr)

df_new <- df %>% 
  mutate(dates = stri_split_fixed(dates, ", ")) %>% 
  mutate(dates = map(dates, function(x) {
    x <- as.Date(x)
    sort(x, decreasing = TRUE)[1]
  })) %>%
  unnest(dates)

df_new
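
On the question's remark that the dates turned into numbers: one likely cause is that a `Date` is stored as a count of days since 1970-01-01, and functions that simplify a list into a plain vector can drop the `Date` class. The base R sketch below illustrates that effect. It is not a reproduction of the purrr pipeline from the question, and the `cells` list is made up for illustration:

```r
# A list with one Date vector per cell, as produced after splitting and parsing
cells <- list(as.Date(c("2018-07-02", "2018-06-28")), as.Date("2018-08-22"))

# sapply() simplifies the per-cell results into a bare numeric vector,
# so the Date class is lost and only the day counts remain
as_numbers <- sapply(cells, max)
print(class(as_numbers))

# The day counts can be turned back into dates by supplying the origin
restored <- as.Date(as_numbers, origin = "1970-01-01")

# Alternatively, combine the per-cell results with c(), which keeps the class
kept <- do.call(c, lapply(cells, max))

stopifnot(all(restored == kept))
print(kept)
```

Both answers above avoid the problem in their own way: the first takes the maximum of the strings and converts once at the end, and the second keeps a list column and calls `unnest()`.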