Question

I have the following string:

    x<-"CUST_Id_8Name:Mr.Praveen KumarDOB:Mother's Name:Contact Num:Email address:Owns Car:Products held with Bank:Company Name:Salary per. month:Background:"

I want to extract "CUST_Id_8", "Mr.Praveen Kumar", and whatever is written after DOB:, Mother's Name:, Contact Num:, and so on, and store the results in variables such as Customer Id, Name, DOB, etc.

Please help.

I have tried

    strsplit(x, ":")

but the result is a list containing the text. If nothing follows a variable name, I need a blank value there.

Can anyone tell me how to extract the string between two words? For example, I want to extract "Mr.Praveen Kumar", which sits between Name: and DOB.

Answer 1

You can use regexec and regmatches to pull the individual data items out as substrings. Here is a working example:

Sample data:

txt <- c("CUST_Id_8Name:Mr.Praveen KumarDOB:Mother's Name:Contact Num:Email address:Owns Car:Products held with Bank:Company Name:Salary per. month:Background:",
         "CUST_Id_15Name:Mr.Joe JohnsonDOB:01/02/1973Mother's Name:BarbaraContact Num:0123 456789Email address:joe@joesville.comOwns Car:YesProducts held with Bank:Savings, CurrentCompany Name:Joes villeSalary per. month:$100000Background:shady")

Pattern to match:

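# One capture group per field; each (.*) collects the text up to the next label
pattern <- "CUST_Id_(.*)Name:(.*)DOB:(.*)Mother's Name:(.*)Contact Num:(.*)Email address:(.*)Owns Car:(.*)Products held with Bank:(.*)Company Name:(.*)Salary per. month:(.*)Background:(.*)"
# Recover the field labels by splitting the pattern itself on "_(.*)" and ":(.*)"
var_names <- strsplit(pattern, "[:_]\\(\\.\\*\\)")[[1]]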

Run the match:

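# regmatches() returns the full match followed by the capture groups for each string;
# bind them into rows and drop the first column (the full match)
data <- as.data.frame(do.call("rbind", regmatches(txt, regexec(pattern, txt))))[, -1]
colnames(data) <- var_names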

Output:

#  CUST_Id             Name        DOB Mother's Name Contact Num
#1       8 Mr.Praveen Kumar                                     
#2      15   Mr.Joe Johnson 01/02/1973       Barbara 0123 456789
#      Email address Owns Car Products held with Bank Company Name
#1                                                                
#2 joe@joesville.com      Yes        Savings, Current   Joes ville
#  Salary per. month Background
#1                             
#2           $100000      shady
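
If you only need the text between two known labels (for example, between Name: and DOB:), a small helper may be enough. The following is a minimal sketch, not part of the approach above: extract_between is a hypothetical helper name, and it assumes both labels occur literally in the string, with the right label appearing after the left one.

```r
x <- "CUST_Id_8Name:Mr.Praveen KumarDOB:Mother's Name:Contact Num:Email address:Owns Car:Products held with Bank:Company Name:Salary per. month:Background:"

# Hypothetical helper: returns the text between the first occurrence of `left`
# and the next occurrence of `right`, or NA if either label is missing.
extract_between <- function(text, left, right) {
  start <- regexpr(left, text, fixed = TRUE)   # fixed = TRUE: labels are literal text, not regex
  if (start == -1) return(NA_character_)
  from <- as.integer(start) + attr(start, "match.length")
  rest <- substring(text, from)                # everything after the left label
  end <- as.integer(regexpr(right, rest, fixed = TRUE))
  if (end == -1) return(NA_character_)
  substring(rest, 1, end - 1)                  # "" when nothing sits between the labels
}

name <- extract_between(x, "Name:", "DOB:")
dob  <- extract_between(x, "DOB:", "Mother's Name:")
```

Here name holds the customer's name and dob is an empty string, since nothing follows DOB: in this record. Because the labels are matched as fixed strings, characters such as the dot in "Salary per. month:" need no escaping.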

Answer 2

If you know the keys in advance, you can extract the values like this:

keys <- c("CUST_Id_8Name", "DOB", "Mother's Name", 
  "Contact Num", "Email address", "Owns Car", "Products held with Bank", 
  "Company Name", "Salary per. month", "Background")
cbind(keys, values = sub("^:", "", strsplit(x, paste0(keys, collapse = "|"))[[1]][-1]))
#                 keys                      values            
# [1,] "CUST_Id_8Name"           "Mr.Praveen Kumar"
# [2,] "DOB"                     ""                
# [3,] "Mother's Name"           ""                
# [4,] "Contact Num"             ""                
# [5,] "Email address"           ""                
# [6,] "Owns Car"                ""                
# [7,] "Products held with Bank" ""                
# [8,] "Company Name"            ""                
# [9,] "Salary per. month"       ""                
# [10,] "Background"              ""
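
Since the goal was to keep each value under its field name, one possible follow-up is to turn the result into a named character vector so values can be looked up by key. This is only a sketch reusing the same keys as above; fields is a name introduced here for illustration.

```r
x <- "CUST_Id_8Name:Mr.Praveen KumarDOB:Mother's Name:Contact Num:Email address:Owns Car:Products held with Bank:Company Name:Salary per. month:Background:"

keys <- c("CUST_Id_8Name", "DOB", "Mother's Name",
  "Contact Num", "Email address", "Owns Car", "Products held with Bank",
  "Company Name", "Salary per. month", "Background")

# Split on the keys, drop the empty piece before the first key,
# and strip the leading ":" from each remaining piece
values <- sub("^:", "", strsplit(x, paste0(keys, collapse = "|"))[[1]][-1])

# Name each value by its key so it can be retrieved by field name
fields <- setNames(values, keys)
fields[["CUST_Id_8Name"]]
fields[["DOB"]]
```

Note that the keys are used as a regular expression here, so the dot in "Salary per. month" matches any character; that is harmless for this input but worth keeping in mind for other data.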
