How to convert a column of German date strings to a Date variable in R?

Posted: 2018-07-27 12:22:33

Tags: r date

I have a variable containing German dates and want to convert it to a Date variable so that I can later filter by year and quarter.

For example:

      Date         newDate quarter
1  21. Mrz 10       <NA>    <NA>
2  21. Jan 10 2010-01-21 2010 Q1
3  30. Mrz 10       <NA>    <NA>
4  21. Mrz 10       <NA>    <NA>
5  21. Jan 10 2010-01-21 2010 Q1
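Unfortunately, R does not recognize the German month abbreviation for March, "Mrz", so those rows become NA. I have already tried switching the language to German, but that did not help.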

Sys.setlocale(category = "LC_TIME", locale = "de_DE.UTF-8")
# [1] "de_DE.UTF-8"
# parse strings such as "21. Mrz 10" (day. month-abbreviation two-digit-year)
Dateproblem$newDate <- as.Date(as.character(Dateproblem$Date), "%d. %b %y")
library(zoo)
Dateproblem$quarter <- as.yearqtr(Dateproblem$newDate)

sessionInfo()

R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.4

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/de_DE.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] reprex_0.2.0   tidyr_0.8.1    zoo_1.8-3      foreign_0.8-71 car_3.0-0     
 [6] carData_3.0-1  gplots_3.0.1   plm_1.6-6      Formula_1.2-3  dplyr_0.7.6   
[11] ggplot2_3.0.0 

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.18       bdsmatrix_1.3-3    lattice_0.20-35    gtools_3.8.1      
 [5] assertthat_0.2.0   rprojroot_1.3-2    digest_0.6.15      lmtest_0.9-36     
 [9] R6_2.2.2           cellranger_1.1.0   plyr_1.8.4         backports_1.1.2   
[13] evaluate_0.11      pillar_1.3.0       miscTools_0.6-22   rlang_0.2.1       
[17] lazyeval_0.2.1     curl_3.2           readxl_1.1.0       data.table_1.11.4 
[21] gdata_2.18.0       whisker_0.3-2      callr_2.0.4        rmarkdown_1.10    
[25] stringr_1.3.1      munsell_0.5.0      compiler_3.5.1     pkgconfig_2.0.1   
[29] clipr_0.4.1        maxLik_1.3-4       htmltools_0.3.6    tidyselect_0.2.4  
[33] tibble_1.4.2       rio_0.5.10         crayon_1.3.4       withr_2.1.2       
[37] MASS_7.3-50        bitops_1.0-6       grid_3.5.1         nlme_3.1-137      
[41] gtable_0.2.0       magrittr_1.5       scales_0.5.0       KernSmooth_2.23-15
[45] zip_1.0.0          stringi_1.2.4      bindrcpp_0.2.2     sandwich_2.4-0    
[49] openxlsx_4.1.0     tools_3.5.1        forcats_0.3.0      glue_1.3.0        
[53] purrr_0.2.5        hms_0.4.2          processx_3.1.0     abind_1.4-5       
[57] yaml_2.2.0         colorspace_1.3-2   caTools_1.17.1.1   knitr_1.20        
[61] bindr_0.1.1        haven_1.1.2       

Now I notice that the language change seems to have no effect. Is `Sys.setlocale(category = "LC_TIME", locale = "de_DE.UTF-8")` not the right command?
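One way to check whether the locale switch worked, and which month abbreviations `%b` then expects, is to format one known date per month. This is a diagnostic sketch that assumes a locale named `"de_DE.UTF-8"` is installed. On many systems the German locale abbreviates March as "Mär" rather than "Mrz"; if that is the case here, `as.Date(..., "%d. %b %y")` cannot match "Mrz" even though the locale change itself succeeded.

```r
# Remember the current LC_TIME setting so it can be restored afterwards
old_time_locale <- Sys.getlocale("LC_TIME")

# Try to switch only LC_TIME to German ("de_DE.UTF-8" is assumed to exist;
# if it does not, Sys.setlocale() warns and returns "")
switched <- suppressWarnings(Sys.setlocale("LC_TIME", "de_DE.UTF-8"))
cat("LC_TIME is now:", Sys.getlocale("LC_TIME"), "\n")

# The abbreviations that %b produces (and accepts when parsing) in this locale
abbr <- format(as.Date(sprintf("2010-%02d-01", 1:12)), "%b")
print(abbr)

# If "Mrz" is not among them, strings like "21. Mrz 10" will not parse with %b
has_mrz <- "Mrz" %in% abbr
cat("Locale uses 'Mrz':", has_mrz, "\n")

# Restore the previous setting
invisible(Sys.setlocale("LC_TIME", old_time_locale))
```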

2 answers:

Answer 0 (score: 1)

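Based on the official German three-letter month abbreviations from DIN 1355, it is not hard to write your own parser that accepts a whole vector of date strings.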

# 3-letter month abbreviations (DIN 1355)
de_months <- c(
    "Jan", "Feb", "Mrz", "Apr",
    "Mai", "Jun", "Jul", "Aug",
    "Sep", "Okt", "Nov", "Dez")

# Custom function to parse German dates of the form "DD. MMM YY"
parse.de.date <- function(x) {
    as.Date(
        sapply(as.character(x), function(t) {
            # drop the dot after the day, then split on one or more spaces
            dmy <- unlist(strsplit(gsub("\\.", "", t), "\\s+"))
            # rebuild as "DD-M-YY", looking up the month number in de_months
            paste(dmy[1], match(dmy[2], de_months), dmy[3], sep = "-")
        }, USE.NAMES = FALSE),
        format = "%d-%m-%y")
}

library(dplyr)
df %>%
    mutate(Date = parse.de.date(Date))
#    Date    newDate quarter
#1 2010-03-21       <NA>    <NA>
#2 2010-01-21 2010-01-21 2010 Q1
#3 2010-03-30       <NA>    <NA>
#4 2010-03-21       <NA>    <NA>
#5 2010-01-21 2010-01-21 2010 Q1

Sample data

df <- read.table(text =
    "      Date         newDate quarter
1  '21. Mrz 10'       <NA>    <NA>
2  '21. Jan 10' 2010-01-21 '2010 Q1'
3  '30. Mrz 10'       <NA>    <NA>
4  '21. Mrz 10'       <NA>    <NA>
5  '21. Jan 10' 2010-01-21 '2010 Q1'",
    header = TRUE, na.strings = "<NA>", stringsAsFactors = FALSE)
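The `sapply()` in `parse.de.date` still handles one string per call. A fully vectorized variant can capture day, month and year with a single regular expression applied to the whole vector via `sub()`, then look up all month numbers at once with `match()`. This is a sketch, not part of the original answer; the name `parse_de_date_vec` is hypothetical, and it assumes the same `"DD. MMM YY"` layout, makes the month lookup case-insensitive, and returns `NA` for strings that do not fit the pattern.

```r
# German month abbreviations as used in the data ("Mrz" for March)
de_months <- c("Jan", "Feb", "Mrz", "Apr", "Mai", "Jun",
               "Jul", "Aug", "Sep", "Okt", "Nov", "Dez")

# Hypothetical vectorized parser for strings of the form "DD. MMM YY"
parse_de_date_vec <- function(x) {
  x <- as.character(x)  # also accepts factor columns
  # groups: 1 = day, 2 = month abbreviation, 3 = two-digit year
  pattern <- "^[[:space:]]*([0-9]{1,2})\\.?[[:space:]]*([[:alpha:]]+)[[:space:]]+([0-9]{2})[[:space:]]*$"
  ok  <- !is.na(x) & grepl(pattern, x)
  day <- sub(pattern, "\\1", x)
  mon <- match(tolower(sub(pattern, "\\2", x)), tolower(de_months))
  yr  <- sub(pattern, "\\3", x)
  out <- as.Date(paste(day, mon, yr, sep = "-"), format = "%d-%m-%y")
  out[!ok] <- NA  # anything not matching the pattern becomes NA
  out
}

parse_de_date_vec(c("21. Mrz 10", "21. Jan 10", "30. Mrz 10"))
# [1] "2010-03-21" "2010-01-21" "2010-03-30"
```

Because `%y` maps two-digit years 00 to 68 onto 2000 to 2068, "10" becomes 2010 here.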

Answer 1 (score: 0)

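There is no need to write your own parser: the readr package can do all the work for you. You only have to define the abbreviated month names: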

# example data:
dates <- c("21. Mrz 10",
"21. Jan 10",
"30. Mrz 10",
"21. Mrz 10",
"21. Jan 10")

# load library
library(readr)

# get readr's built-in German date names
my_format <- date_names_lang("de")

# replace the abbreviated month names with the ones used in the data ("Mrz" for March)
my_format$mon_ab <- c("Jan", "Feb", "Mrz", "Apr", "Mai", "Jun", "Jul", "Aug", "Sep", "Okt", "Nov", "Dez")

# parse using your format
parse_date(dates, format="%d. %b %y", locale=locale(date_names = my_format))
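Once either answer has produced a proper `Date` vector, the year-quarter labels the question asks for can be built in base R with `quarters()` instead of `zoo::as.yearqtr()`. A small sketch, assuming `d` holds the parsed dates (written out literally here so the snippet runs on its own):

```r
# Parsed dates, e.g. the result of either answer (written out literally here)
d <- as.Date(c("2010-03-21", "2010-01-21", "2010-03-30", NA))

# Year plus quarter label such as "2010 Q1"; keep NA for unparsed dates
yq <- ifelse(is.na(d), NA_character_, paste(format(d, "%Y"), quarters(d)))
print(yq)
# [1] "2010 Q1" "2010 Q1" "2010 Q1" NA

# Filtering by quarter is then a plain comparison
d[!is.na(yq) & yq == "2010 Q1"]
```

`zoo::as.yearqtr(d)`, as used in the question, gives a sortable `yearqtr` object instead of a character label.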