Question

I have a column that looks like this:

   fiscal_year_end
1             1231
2             1231
3             1231
4             1231
5              202
6             1231
7             1231
8              202
9             1231
10             927

The values encode a month and a day, e.g. 12-31 and 9-27.

I am trying to put the column into that format, but I cannot get it right.

I tried str_replace_all(df$fiscal_year_end, "(?<=^\\d{2}|^\\d{4})", "-") from the stringr package, but it does not do what I want.

Where am I going wrong?

Data:

structure(list(fiscal_year_end = c(1231L, 1231L, 1231L, 1231L, 
202L, 1231L, 1231L, 202L, 1231L, 927L, 228L, 1231L, 1231L, 1231L, 
1231L, 928L, 1231L, 1231L, 930L, 1231L, 1231L, 628L, 1231L, 1231L, 
1228L, 930L, 1231L, 1231L, 1231L, 1231L, 927L, 630L, 1231L, 202L, 
1231L, 1231L, 1231L, 1231L, 927L, 930L, 1231L, 1231L, 1231L, 
1231L, 228L, 928L, 1231L, 1231L, 1231L, 1231L, 1231L, 1231L, 
1231L, 1231L, 1231L, 1231L, 1228L, 1231L, 1231L, 1231L, 1231L, 
131L, 1231L, 1231L, 1231L, 1231L, 1231L, 1231L, 930L, 1231L, 
1231L, 1231L, 1231L, 1231L, 1231L, 1231L, 831L, 1231L, 102L, 
1231L, 1231L, 1231L, 1130L, 1231L, 1228L, 1231L, 1231L, 1231L, 
1231L, 1231L, 1231L, 1231L, 1231L, 1231L, 930L, 1031L, 1231L, 
1231L, 1231L, 1231L, 1231L, 1231L, 203L, 1231L, 1231L, 1231L, 
1231L, 1231L, 1229L, 1231L, 1231L, 1231L, 426L, 1231L, 1231L, 
1231L, 1231L, 1231L, 1231L, 1231L, 1231L, 1231L, 202L, 1231L, 
1231L, 1231L, 1231L, 1231L, 1231L, 1229L, 1231L, 1231L, 630L, 
1231L, 1231L, 1209L, 1231L, 1231L, 1231L, 728L, 1231L, 1231L, 
1231L, 1231L, 1231L, 1231L, 630L, 1231L, 1231L, 1231L, 1231L, 
1231L, 1231L, 727L, 1231L, 201L, 1231L, 1231L, 1231L, 1231L, 
1231L, 630L, 1231L, 1231L, 1231L, 1130L, 1231L, 1231L, 1231L, 
1231L, 1231L, 1231L, 1231L, 930L, 930L, 1231L, 1231L, 331L, 1231L, 
1231L, 1231L, 1231L, 1231L, 1231L, 1231L, 1031L, 1229L, 1231L, 
1231L, 1231L, 201L, 1231L, 1231L, 1231L, 1231L, 1231L, 1231L, 
831L, 630L, 831L)), row.names = c(NA, -200L), class = "data.frame")

Answer 1

After padding the values to four digits, we can use separate:

library(dplyr)
library(tidyr)
df1 %>% 
  mutate(fiscal_year_end =  sprintf("%04d", fiscal_year_end)) %>% 
  separate(fiscal_year_end, c("month", "day"), sep= 2)

Or use a negative index in separate, which counts the split position from the right:

df1 %>% 
  separate(fiscal_year_end, c("month", "day"), sep= -2)
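The padding step in the first snippet can be checked on its own in base R. The following is only a sketch on three values taken from the question's data; the names x, padded, month and day are illustrative and not part of the original answer.

```r
# Three sample values from the question's data
x <- c(1231L, 202L, 927L)

# Left-pad with zeros so that every value has exactly four digits
padded <- sprintf("%04d", x)

# Characters 1-2 are the month, characters 3-4 are the day
month <- substr(padded, 1, 2)
day <- substr(padded, 3, 4)

print(data.frame(month, day))
```

With padding, a value such as 202 yields the month "02". The negative-index variant splits the unpadded string, so the same value yields the month "2".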

Or, using only base R, we can insert a delimiter with sub (using a single capture group) and read the result into a two-column data.frame with read.csv.
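The code for this variant did not survive in the thread, so the following is a reconstruction under assumptions, not the answerer's original. It assumes a comma as the delimiter and the column names month and day; x, delimited and out are illustrative names.

```r
# Three sample values from the question's data
x <- c(1231L, 202L, 927L)

# Insert a comma before the last two digits: "1231" -> "12,31", "202" -> "2,02"
delimited <- sub("(\\d{2})$", ",\\1", x)

# Parse the comma-separated strings as a two-column data.frame
out <- read.csv(text = delimited, header = FALSE,
                col.names = c("month", "day"))
print(out)
```

Because read.csv converts both columns to integers, leading zeros are dropped (day "02" becomes 2). Passing colClasses = "character" would keep them as text.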

Answer 2

Using base R, we can use sub with two capture groups: the second group is the final two digits, and the first group is everything before them.

sub("(.*)(\\d{2}$)", "\\1-\\2", df$fiscal_year_end)

#[1] "12-31" "12-31" "12-31" "12-31" "2-02"  "12-31" "12-31" "2-02"  "12-31"
#     "9-27"  "2-28"  "12-31" .....
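The pattern in the question anchors its lookbehind to the start of the string, so the split position is counted from the left and a three-digit value such as 202 is split after "20". A lookahead anchored to the end of the string counts from the right instead. This is a sketch of that idea in base R with perl = TRUE, not something taken from the answers; x and res are illustrative names.

```r
# Three sample values from the question's data
x <- c(1231L, 202L, 927L)

# Insert "-" at the position that is followed by exactly the last two digits
res <- sub("(?=\\d{2}$)", "-", x, perl = TRUE)
print(res)
```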

Answer 3

Another, overly complicated way:

res1<-ifelse(nchar(my_df$fiscal_year_end)%%2==0,substring(my_df$fiscal_year_end,1,2),
              substring(my_df$fiscal_year_end,1,1))
res2<-ifelse(nchar(my_df$fiscal_year_end)%%2==0,substring(my_df$fiscal_year_end,3,4),
             substring(my_df$fiscal_year_end,2,3))      
paste0(res1,"-",res2)

Result:

[1] "12-31" "12-31" "12-31" "12-31" "2-02"  "12-31" "12-31" "2-02"  "12-31" "9-27"
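The two ifelse branches differ only in where the last two characters start, so the same split can be written by indexing from the string length. This is a possible simplification, offered as a sketch; x, s, n, month and day are illustrative names.

```r
# Three sample values from the question's data
x <- c(1231L, 202L, 927L)

s <- as.character(x)
n <- nchar(s)

# Everything except the last two characters is the month
month <- substring(s, 1, n - 2)
# The last two characters are the day
day <- substring(s, n - 1, n)

res <- paste0(month, "-", day)
print(res)
```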

Split a month/year string in R with `-`

3 answers: