Question

I have a string in the following format:

a <- c("first_name=James(Mr), cust_id=98503(ZZW_LG,WGE,zonaire),
       StartDate=2015-05-20, EndDate=2015-05-20, performance=best")

My goal is to end up with a data frame like this:

first_name   cust_id   start_date    end_date    performance           cust_notes
 James(Mr)     98503   2015-05-20  2015-05-20           best   ZZW_LG,WGE,zonaire

I ran the following code:

a <- c("first_name=James(Mr), cust_id=98503(ZZW_LG,WGE,zonaire),
       StartDate=2015-05-20, EndDate=2015-05-20, performance=best")

split_by_comma <- strsplit(a,",")

split_by_equal <- lapply(split_by_comma,strsplit,"=")

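Because cust_id contains extra commas and parentheses, I don't get the result I want.

Note that the parentheses in the name are real and must be kept as they are.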

Answer 1

You need to split on:

,(?![^()]*\\))

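This uses a negative lookahead: a comma is skipped whenever the next parenthesis after it is a closing ), i.e. when the comma sits inside (). See the demo.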
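A minimal sketch of what the lookahead changes, on a made-up string `s` (not from the question) that has one comma inside parentheses. It compares a plain comma split with the lookahead split from this answer.

```r
# Made-up example: one comma sits inside parentheses
s <- "a=1, b=2(x,y), c=3"

# A plain comma split also breaks the parenthesised group apart
naive <- strsplit(s, ",")[[1]]
# naive is c("a=1", " b=2(x", "y)", " c=3")

# The negative lookahead (?![^()]*\)) rejects a comma when the next
# parenthesis after it is ")", i.e. when the comma is inside parentheses.
# Lookaheads are a PCRE feature, so strsplit() needs perl = TRUE.
parts <- strsplit(s, ",(?![^()]*\\))", perl = TRUE)[[1]]
# parts is c("a=1", " b=2(x,y)", " c=3")
```

The leading spaces are left in place here; trimws() removes them if needed.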

https://regex101.com/r/uF4oY4/82

To get the split you want, use (note perl=TRUE, which the lookahead requires):

split_by_comma <- strsplit(a,",(?![^()]*\\))",perl=TRUE)

split_by_equal <- lapply(split_by_comma,strsplit,"=")
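The two lines above stop at a nested list of key/value pairs. A base-R sketch of turning that list into the data frame from the question might look like the following; it assumes every field has exactly one "=" and that cust_id always has the form `id(notes)`. The names `pairs`, `row`, `cust` and `result` are illustrative only. The string is written on one line here; trimws() would also remove the line break left in the question's literal.

```r
a <- "first_name=James(Mr), cust_id=98503(ZZW_LG,WGE,zonaire), StartDate=2015-05-20, EndDate=2015-05-20, performance=best"

split_by_comma <- strsplit(a, ",(?![^()]*\\))", perl = TRUE)
split_by_equal <- lapply(split_by_comma, strsplit, "=")

# split_by_equal[[1]] is a list of c(key, value) pairs; trim surrounding whitespace
pairs  <- lapply(split_by_equal[[1]], trimws)
keys   <- vapply(pairs, `[`, character(1), 1)
values <- vapply(pairs, `[`, character(1), 2)
row    <- as.list(setNames(values, keys))

# cust_id looks like "98503(ZZW_LG,WGE,zonaire)": capture the id and the notes
cust <- regmatches(row$cust_id, regexec("^([^(]*)\\((.*)\\)$", row$cust_id))[[1]]

result <- data.frame(
  first_name  = row$first_name,
  cust_id     = cust[2],
  start_date  = row$StartDate,
  end_date    = row$EndDate,
  performance = row$performance,
  cust_notes  = cust[3],
  stringsAsFactors = FALSE
)
print(result)
```

If a cust_id ever lacks parentheses, regmatches() returns an empty match and both cust_id and cust_notes become NA, so that case would need separate handling.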

Answer 2

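If the format of your string holds true (in particular, the commas inside the parentheses are never followed by whitespace), this could be a quick solution: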

library(httr)

a <- c("first_name=James(Mr), cust_id=98503(ZZW_LG,WGE,zonaire), StartDate=2015-05-20, 
        EndDate=2015-05-20, performance=best")

dat <- data.frame(parse_url(sprintf("?%s", gsub(",[[:space:]]+", "&", a)))$query, 
           stringsAsFactors=FALSE)

library(tidyr)
library(dplyr)

mutate(separate(dat, cust_id, into=c("cust_id", "cust_notes"), sep="\\("), 
       cust_notes=gsub("\\)", "", cust_notes))

##   first_name cust_id         cust_notes  StartDate    EndDate performance
## 1  James(Mr)   98503 ZZW_LG,WGE,zonaire 2015-05-20 2015-05-20        best

Explanation:

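gsub(",[[:space:]]+", "&", a) makes the parameters look like the components of a URL query string.
sprintf(…) prepends "?" so it looks like an actual query string.
parse_url (from httr) separates the key/value pairs and puts them in a list (named query) inside the list it returns.
data.frame turns that list into a data frame.
separate splits the cust_id column into two columns at the "(".
mutate removes the ")" from the new cust_notes column.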

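The same steps can also be chained into one "piped" expression with %>%, which matches the explanation above step for step and is (IMO) easier to follow.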
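If you would rather not depend on httr, tidyr and dplyr, the same idea can be sketched in base R. This is a swapped-in approach, not the answer's code: strsplit() and sub() stand in for parse_url(), separate() and mutate(). It relies on the same assumption (field separators are a comma plus whitespace, the commas inside the parentheses are not). Unlike parse_url(), it does not URL-decode anything, so values are assumed to contain no percent-escapes.

```r
a <- "first_name=James(Mr), cust_id=98503(ZZW_LG,WGE,zonaire), StartDate=2015-05-20, EndDate=2015-05-20, performance=best"

# Step 1: only commas followed by whitespace separate fields; the commas in
# "(ZZW_LG,WGE,zonaire)" have no space after them, so they survive.
fields <- strsplit(gsub(",[[:space:]]+", "&", a), "&", fixed = TRUE)[[1]]

# Step 2: split each "key=value" at its first "=" (the parse_url() role)
keys   <- sub("=.*$", "", fields)
values <- sub("^[^=]*=", "", fields)
dat <- as.data.frame(as.list(setNames(values, keys)), stringsAsFactors = FALSE)

# Step 3: the separate() + mutate() role. Take the notes after "(" without
# the trailing ")", then keep only the id before "(".
dat$cust_notes <- sub("\\)$", "", sub("^[^(]*\\(", "", dat$cust_id))
dat$cust_id    <- sub("\\(.*$", "", dat$cust_id)
print(dat)
```

The order in step 3 matters: cust_notes is computed from the original cust_id before that column is overwritten.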

Answer 3

Late reply, but posting it because it is very simple to understand and needs no additional packages:

rawdf = read.csv("<your file path>", header = FALSE, sep = ",",
                 stringsAsFactors = FALSE, strip.white = TRUE)
# strip.white = TRUE trims the space after each comma so the keys come out clean

# Get the first row of the data frame and transpose it into a one-column data frame
colnames = data.frame(t(rawdf[1,]))

# Split the values of that single column into key/value pieces (separated by '=')
# and save them in a vector
colnames = unlist(strsplit(as.character(colnames$X1), "="))

# Keep the odd-indexed entries of that vector (odd positions are the column
# names, even positions the values associated with them)
colnames = colnames[seq(1,length(colnames),2)]

# Assign the extracted column names to the original data frame
colnames(rawdf) = colnames

# Extract the value in each field by replacing the leading 'Key=' with an empty string
for(i in seq_len(ncol(rawdf))) rawdf[,i] = gsub(paste0("^", colnames[i], "="), "", rawdf[,i])
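One caveat: read.csv() with sep = "," also splits inside the parentheses, so with the question's sample line the cust_id notes land in extra columns (for example a bare "WGE" field with no "="), and the odd/even key/value assumption above no longer holds. A sketch that reads whole lines and reuses the lookahead split from Answer 1 might look like this. The helper parse_line() and the second input line are made up for illustration; with a real file you would build `lines` with readLines().

```r
# Hypothetical input standing in for the lines of "<your file path>": the first
# line is the question's sample, the second is made up to show several rows.
lines <- c(
  "first_name=James(Mr), cust_id=98503(ZZW_LG,WGE,zonaire), StartDate=2015-05-20, EndDate=2015-05-20, performance=best",
  "first_name=Ann(Ms), cust_id=12345(AB,CD), StartDate=2016-01-01, EndDate=2016-02-01, performance=good"
)
# With a real file: lines <- readLines("<your file path>")

# Hypothetical helper: split one line on commas outside parentheses, then
# split each field at its first "=" into a named character vector.
parse_line <- function(line) {
  fields <- trimws(strsplit(line, ",(?![^()]*\\))", perl = TRUE)[[1]])
  setNames(sub("^[^=]*=", "", fields), sub("=.*$", "", fields))
}

rows <- lapply(lines, parse_line)

# Align every row to the first row's keys so columns match by name, not position
keys <- names(rows[[1]])
rows <- lapply(rows, function(r) r[keys])

rawdf <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
print(rawdf)
```

Splitting cust_id into id and notes can then be done as in the other answers.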

How to parse key-value pairs from a URL string with multiple conditions in R
