Separate a character column at the Nth character?

Date: 2019-01-29 12:24:18

Tags: r dplyr tidyverse tidyr

Given a sample df:

df <- structure(list(test_id = c("123-456789123", "785-525135627", 
"6545646545665456", "988898-65464654646464664", "987-656546464", "666-654564654"
)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))

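I want to split the column above into 2 columns:

  1. The last N characters of the ID (e.g. 8)
  2. The remaining prefix on the left

For example, given N = 8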

987-656546464 ---> split to: postfix  prefix
                             56546464 987-6

I tried the separate function to do this:

separate(df, col = test_id, into = c("prefix", "postfix"), sep = "(.{8}$)", convert = T)

But this does not give me the second part.

Please advise.
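
One way to see why only the prefix comes back: a regex passed as sep is treated as a delimiter, so the text it matches (here the last 8 characters) is discarded rather than returned. A minimal base R sketch of the same effect, using strsplit as a stand-in for separate (an assumption made for illustration; separate may differ in details such as warnings or padding with NA):

```r
# Stand-in illustration: like a regex `sep`, strsplit drops whatever the pattern matches.
id <- "987-656546464"
pieces <- strsplit(id, ".{8}$")[[1]]
print(pieces)  # only the prefix survives; the matched 8 characters are consumed
```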

4 answers:

Answer 0: (score: 2)

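Note that the df in the question is not a plain data.frame (it is a tibble), so we refer to it as x. We then convert it to a data frame and use separate with sep = -8: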

library(dplyr)
library(tidyr)
x <- df 

x %>% 
  data.frame %>%
  separate(test_id, into = c("pre", "post"), sep = -8)

giving:

               pre     post
1            123-4 56789123
2            785-5 25135627
3         65456465 45665456
4 988898-654646546 46464664
5            987-6 56546464
6            666-6 54564654

Answer 1: (score: 1)

Updated rough answer (df2 is df):

library(tidyverse)
df2$text_id<-gsub("[-]", "\\1 \\2", df2$test_id)
df2$test_id
df2<-df2 %>% 
  mutate(text_id=str_remove_all(df2$text_id,"\\s"),
         text_id=substr(df2$text_id,1,5))
df2$tesxt_id<-str_replace_all(df2$text_id," ","-")
df2 %>% 
  separate(test_id,c("pre","post"),sep="\\d(?=\\d{8,})",convert = T) %>% 
  select(tesxt_id,post)

Result:

 tesxt_id     post
  <chr>       <int>
1 123-4    56789123
2 785-5    25135627
3 65456          NA
4 98889          NA
5 987-6    56546464
6 666-6    54564654

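Try the following, renaming the columns as needed. The pattern matches any digit that is followed by at least 8 more digits: a lookahead (?=) checks that the digit \\d is followed by at least 8 digits, \\d{8,}. Note that the digit matched by \\d is itself consumed as the separator, so it does not appear in either column.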

df %>% 
  separate(test_id,c("pre","post"),sep="\\d(?=\\d{8,})",convert = T)
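
Because a regex separator consumes the characters it matches, an alternative is to capture both parts instead of splitting on them. A minimal sketch in base R, assuming every ID has at least n characters; the helper name split_by_regex is hypothetical and not part of the answer above:

```r
# Hypothetical helper: capture the prefix and the last n characters with one regex.
split_by_regex <- function(x, n = 8) {
  # e.g. "^(.*)(.{8})$": group 1 is the prefix, group 2 the last n characters
  pattern <- sprintf("^(.*)(.{%d})$", n)
  data.frame(
    pre  = sub(pattern, "\\1", x),
    post = sub(pattern, "\\2", x),
    stringsAsFactors = FALSE
  )
}

ids <- c("123-456789123", "6545646545665456", "988898-65464654646464664")
res <- split_by_regex(ids)
print(res)
```

If an ID is shorter than n characters the pattern does not match and sub returns the input unchanged, so such rows would need separate handling.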

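Answer 2: (score: 1)

Without any other packages, using sapply and strsplit (obviously, you should wrap this in a function for cleaner syntax):

# df$test_id is used because df[, 1] on a tibble returns a tibble, not a character vector
t(sapply(df$test_id, function(i, n) {
  sp <- unlist(strsplit(i, ""))  # split the ID into single characters
  c(postfix = paste0(sp[(length(sp) - n + 1):length(sp)], collapse = ""),
    prefix  = paste0(sp[1:(length(sp) - n)], collapse = ""))
}, n = 8))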
                         postfix    prefix
123-456789123            "56789123" "123-4"
785-525135627            "25135627" "785-5"
6545646545665456         "45665456" "65456465"
988898-65464654646464664 "46464664" "988898-654646546"
987-656546464            "56546464" "987-6"
666-654564654            "54564654" "666-6"
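
As the answer suggests, the one-liner reads more clearly once wrapped in a function. A minimal base R sketch that uses nchar and substr instead of splitting into single characters; the name split_last_n is hypothetical, and IDs are assumed to have at least n characters:

```r
# Hypothetical helper: split each string into everything before the last n
# characters (prefix) and the last n characters (postfix).
split_last_n <- function(x, n) {
  len <- nchar(x)
  stopifnot(all(len >= n))  # assumption: no ID is shorter than n
  data.frame(
    prefix  = substr(x, 1, len - n),
    postfix = substr(x, len - n + 1, len),
    stringsAsFactors = FALSE
  )
}

ids <- c("123-456789123", "987-656546464", "6545646545665456")
out <- split_last_n(ids, n = 8)
print(out)
```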

Answer 3: (score: 1)

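This is what solved my problem without losing any digits. Keep in mind that the goal is to split off 8 characters from the end and then look at what remains (the prefix before the last 8 characters). I needed to know which unique prefixes precede the last 8 characters in the data.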
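
The goal described here, finding the distinct prefixes left after removing the last 8 characters, could be sketched as below. This is not the answerer's code, only an illustration in base R under the assumption that every ID has at least 8 characters; the variable names are hypothetical.

```r
ids <- c("123-456789123", "785-525135627", "6545646545665456",
         "988898-65464654646464664", "987-656546464", "666-654564654")
n <- 8

# Everything before the last n characters of each ID
prefixes <- substr(ids, 1, nchar(ids) - n)

# Distinct prefixes, and how often each one occurs
unique_prefixes <- unique(prefixes)
prefix_counts <- table(prefixes)
print(unique_prefixes)
```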
