I want to split a column into multiple columns by matching a pattern:

test <- data.frame("id" = c("Albertson's Inc.","Albertson's Inc."), "V3" = c("Reiterates FY 2004, Significant Developments, 2 June 2004, 53 words, (English)(Document MULTI00020050122e06201fkk)","EBITDA Hits Four Year Low, Stock Diagnostics, 16:00 GMT, 9 June 2004, 245 words, (English)(Document STODIA0020040609e0690006g)"), stringsAsFactors = FALSE)
So far, the code I use to get the desired result is:
library(stringr)
# Rows without "GMT" do not match this pattern, so str_match() returns NA for them
df <- as.data.frame(str_match(test$V3, "^(.*)GMT,(.*),(.*)words,(.*)Document (.*)$")[,-1], stringsAsFactors = FALSE)
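I have two problems with the code above. First, when "GMT" is missing from a string, it returns no result for that row; second, I want the "id" column in the output df. Any suggestions, or a different approach that gives this result, would be welcome. Thanks to all the moderators and programmers for such a useful forum.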
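A minimal sketch of one way to address both issues while keeping the `str_match()` approach. It assumes every entry follows the `headline, source[, HH:MM GMT], date, N words, (language)(Document ID)` layout of the two sample rows. Wrapping the time segment in an optional non-capturing group `(?:...)?` lets rows without "GMT" still match (the time simply comes back as `NA`), and building the result with `id = test$id` keeps the id column. The pattern pieces and the column names (`headline`, `source`, `time`, and so on) are illustrative choices of mine, not part of the original question.

```r
library(stringr)

test <- data.frame(
  "id" = c("Albertson's Inc.", "Albertson's Inc."),
  "V3" = c("Reiterates FY 2004, Significant Developments, 2 June 2004, 53 words, (English)(Document MULTI00020050122e06201fkk)",
           "EBITDA Hits Four Year Low, Stock Diagnostics, 16:00 GMT, 9 June 2004, 245 words, (English)(Document STODIA0020040609e0690006g)"),
  stringsAsFactors = FALSE
)

# Assumed layout: headline, source[, HH:MM GMT], date, N words, (language)(Document ID)
pattern <- paste0(
  "^(.*?), (.*?)",                     # headline, source (lazy matches)
  "(?:, (\\d{1,2}:\\d{2}) GMT)?",      # optional time before "GMT"
  ", (\\d{1,2} \\w+ \\d{4})",          # date, e.g. "2 June 2004"
  ", (\\d+) words",                    # word count
  ", \\((.*?)\\)\\(Document (.*)\\)$"  # language and document ID
)

# Drop the full-match column; drop = FALSE keeps a matrix even for one row
m <- str_match(test$V3, pattern)[, -1, drop = FALSE]
colnames(m) <- c("headline", "source", "time", "date", "words", "language", "document")

# Put the id column first; an optional group that did not match is NA
df <- data.frame(id = test$id, m, stringsAsFactors = FALSE)
df
```

Making only the time optional, rather than the whole "GMT" piece of the original pattern, keeps every other field anchored to its own delimiter, so a missing time cannot shift the remaining columns.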
Answer (score: 1)
I'm not 100% sure about your "GMT" issue. Here is my attempt:
Your reproducible data:
test <- data.frame("id" = c("Albertson's Inc.","Albertson's Inc."), "V3" = c("Reiterates FY 2004, Significant Developments, 2 June 2004, 53 words, (English)(Document MULTI00020050122e06201fkk)","EBITDA Hits Four Year Low, Stock Diagnostics, 16:00 GMT, 9 June 2004, 245 words, (English)(Document STODIA0020040609e0690006g)"), stringsAsFactors = FALSE)
Code:
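library(tidyverse)
# Split each string on commas that are not followed by a time such as "16:00",
# and at the empty position between ")" and "("; then trim the whitespace
test$V3 %>% map(~str_split(., ",(?!\\s*\\d{1,2}:\\d{1,2})|(?<=\\))(?=\\()") %>% unlist %>% trimws) %>%
  do.call(rbind, .) %>%   # stack the pieces into a character matrix, one row per string
  cbind(test["id"], .)    # prepend the id column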
Result:
#                 id                         1                            2           3         4         5                                    6
# 1 Albertson's Inc.        Reiterates FY 2004     Significant Developments 2 June 2004  53 words (English) (Document MULTI00020050122e06201fkk)
# 2 Albertson's Inc. EBITDA Hits Four Year Low Stock Diagnostics, 16:00 GMT 9 June 2004 245 words (English) (Document STODIA0020040609e0690006g)
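The `do.call(rbind, .)` step in the answer assumes every string splits into the same number of pieces (six for the sample rows). If one row has an extra or missing comma, `rbind()` recycles the shorter vector instead of failing, which can quietly misalign columns. Below is a hedged variant of the same split, using `str_split()`'s vectorization directly and checking the piece counts first; the `split_pattern` name and the length check are my own additions, not part of the answer.

```r
library(stringr)

test <- data.frame(
  "id" = c("Albertson's Inc.", "Albertson's Inc."),
  "V3" = c("Reiterates FY 2004, Significant Developments, 2 June 2004, 53 words, (English)(Document MULTI00020050122e06201fkk)",
           "EBITDA Hits Four Year Low, Stock Diagnostics, 16:00 GMT, 9 June 2004, 245 words, (English)(Document STODIA0020040609e0690006g)"),
  stringsAsFactors = FALSE
)

# Same regex as the answer: commas NOT followed by a time like "16:00",
# or the zero-width position between ")" and "("
split_pattern <- ",(?!\\s*\\d{1,2}:\\d{1,2})|(?<=\\))(?=\\()"

# str_split() is vectorized, so no map() is needed; trim each piece
pieces <- lapply(str_split(test$V3, split_pattern), trimws)

# rbind() would recycle rows of different lengths, so stop early instead
n <- lengths(pieces)
if (any(n != n[1])) {
  stop("rows split into different numbers of pieces: ", paste(n, collapse = ", "))
}

out <- data.frame(id = test$id, do.call(rbind, pieces), stringsAsFactors = FALSE)
out
```

Because the matrix has no column names, `data.frame()` names the split columns `X1` to `X6`; the answer's `cbind(test["id"], .)` skips name checking, which is why its output shows `1` to `6` instead.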