Extracting numbers after a set of patterns in R

Time: 2017-09-20 19:12:20

Tags: r regex stringr

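Many Stack Overflow questions deal with extracting the number that follows a pattern. My task is a bit more challenging, though: I have a list of patterns, as follows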

Customer Id :
C_Id=
CustID=

A snapshot of the data frame looks like this

Customer Details                   Purchase Amount
Alpha Customer Id:293                    500
C_ID= 495;task based                     788
Detail PurcCustID=789;982 in k          12345

I would like to obtain the following data frame

Customer Details               Purchase Amount      Customer ID
Alpha Customer Id:293                500                293
C_ID= 495;task based                 788                495
Detail PurcCustID=789;982 in k      12345               789

Code snippet:

customer_details = c("Alpha Customer Id:293","C_ID= 495;task based","DetailPurcCustID=789;982 in k")

purchase_amount = c(500,788,12345)

customer_data = data.frame(customer_details,purchase_amount)

Is there a way to get this done?

1 answer:

Answer 0: (score: 2)

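We can use str_extract. The lookbehind (?<=I[Dd][:=]) requires "ID" or "Id" followed by ":" or "=" immediately before the match, and \\s*\\d+ then picks up any spaces and the digits; as.numeric ignores the leading whitespace.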

library(tidyverse)
customer_data %>%
     mutate(CustomerID = as.numeric(str_extract(customer_details, "(?<=I[Dd][:=])\\s*\\d+")))
#               customer_details purchase_amount CustomerID
#1         Alpha Customer Id:293             500        293
#2          C_ID= 495;task based             788        495
#3 DetailPurcCustID=789;982 in k           12345        789
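
A variant that may be easier to adapt is to capture only the digits with a group instead of relying on a lookbehind. This is a minimal sketch, not part of the original answer: the names id_pattern and customer_id are illustrative, and the case-insensitive pattern assumes that every identifier label ends in "id" followed by ":" or "=".

```r
library(stringr)

customer_details <- c("Alpha Customer Id:293",
                      "C_ID= 495;task based",
                      "DetailPurcCustID=789;982 in k")

# Assumption: each label ends in "id" (any case), then ":" or "=",
# with optional spaces around the separator. Only the digits are captured.
id_pattern <- regex("id\\s*[:=]\\s*(\\d+)", ignore_case = TRUE)

# str_match() returns a character matrix: column 1 holds the full match,
# column 2 holds the first capture group (the digits).
customer_id <- as.numeric(str_match(customer_details, id_pattern)[, 2])

print(customer_id)
```

Rows without a match come back as NA, because str_match fills the matrix with NA for them.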

Or use sub from base R
customer_data$CustomerID <- as.numeric(sub(".*(I(?i)d[:=]\\s*)(\\d+).*", 
                 "\\2", customer_data$customer_details))