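I have a character vector in which I want to match a specific string, collapse each element containing that match with only the next element of the vector, and let this continue until the end of the vector. For example, here is just one case: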
'"FundSponsor:Blackrock Advisors" "Category:" "Tax-Free Income-Pennsylvania" "Ticker:" "MPA" "NAV Ticker:" "XMPAX" "Average Daily Volume (shares):" "26,000" "Average Daily Volume (USD):" "$0.335M" "Inception Date:" "10/30/1992" "Inception Share Price:" "$15.00" "Inception NAV:" "$14.18" "Tender Offer:" "No" "Term:" "No"'
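It would be great to combine each element containing ":" with only the element that follows it, but I have been struggling with the paste function, because it usually collapses the whole vector into a single element, which is not the more targeted solution I am looking for.
Here is an example of what part of the desired output would look like: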
"Inception Share Price:$15.00"
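Answer 0 (score: 0)
I'm not sure whether you want each result as a separate key:value element, or just want to clean up that long string into the form key1: value1 key2: value2 key3: value3. In the latter case, you can do it with the following code: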
char = '"FundSponsor:Blackrock Advisors" "Category:" "Tax-Free Income-Pennsylvania" "Ticker:" "MPA" "NAV Ticker:" "XMPAX" "Average Daily Volume (shares):" "26,000" "Average Daily Volume (USD):" "$0.335M" "Inception Date:" "10/30/1992" "Inception Share Price:" "$15.00" "Inception NAV:" "$14.18" "Tender Offer:" "No" "Term:" "No"'
char_tidy = gsub('\\" \\"', " ", char)
# output is below
> char_tidy
[1] "\"FundSponsor:Blackrock Advisors Category: Tax-Free Income-Pennsylvania Ticker: MPA NAV Ticker: XMPAX Average Daily Volume (shares): 26,000 Average Daily Volume (USD): $0.335M Inception Date: 10/30/1992 Inception Share Price: $15.00 Inception NAV: $14.18 Tender Offer: No Term: No\""
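If separate elements are wanted rather than one long string, a possible sketch (not part of the answer above) is to extract every double-quoted chunk with regmatches and gregexpr, both base R. The shortened input and the names quoted and items are assumptions made for illustration.

```r
# Shortened version of the input string, for illustration only
char <- '"FundSponsor:Blackrock Advisors" "Category:" "Tax-Free Income-Pennsylvania" "Ticker:" "MPA"'

# Find every chunk of the form "..." (a quote, any non-quote characters, a quote)
quoted <- regmatches(char, gregexpr('"[^"]*"', char))[[1]]

# Strip the surrounding quotes so only the content of each chunk remains
items <- gsub('"', '', quoted, fixed = TRUE)
print(items)
```

This yields one vector element per quoted chunk, which can then be paired up key by key.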
Answer 1 (score: 0)
The following may help:
First split the string with strsplit, then bind together the elements that belong together.
# split the string
vec <- unlist(strsplit(string, '(?=\")(?=\")', perl = TRUE))
vec <- vec[! vec %in% c(' ', '\"')]
# this is what vec looks like right now
head(vec)
# [1] "FundSponsor:Blackrock Advisors" "Category:" "Tax-Free Income-Pennsylvania" "Ticker:" "MPA"
# [6] "NAV Ticker:"
#
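# now paste the elements
ind <- grepl(':.+',vec)   # TRUE for elements that already hold "key:value"
tmp <- vec[!ind]          # the rest alternate: key, value, key, value, ...
# Assigning the pairs back into vec[!ind] would recycle them and duplicate every pair,
# so build a new vector instead (complete elements first, as in this data)
vec <- c(vec[ind], paste0(tmp[seq(1,length(tmp),2)], tmp[seq(2,length(tmp),2)]))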
head(vec)
# [1] "FundSponsor:Blackrock Advisors" "Category:Tax-Free Income-Pennsylvania" "Ticker:MPA" "NAV Ticker:XMPAX"
# [5] "Average Daily Volume (shares):26,000" "Average Daily Volume (USD):$0.335M"
with the data
string = "\"FundSponsor:Blackrock Advisors\" \"Category:\" \"Tax-Free Income-Pennsylvania\" \"Ticker:\" \"MPA\" \"NAV Ticker:\" \"XMPAX\" \"Average Daily Volume (shares):\" \"26,000\" \"Average Daily Volume (USD):\" \"$0.335M\" \"Inception Date:\" \"10/30/1992\" \"Inception Share Price:\" \"$15.00\" \"Inception NAV:\" \"$14.18\" \"Tender Offer:\" \"No\" \"Term:\" \"No\""
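Since the question starts from a character vector rather than a single string, the pairing step can also be sketched directly on such a vector. This is a minimal sketch, not the answer's method: collapse_pairs and fields are hypothetical names, and it assumes every element ending in ":" is directly followed by its value.

```r
# Hypothetical helper: glue every element ending in ":" to the element after it.
# Assumes each such key is directly followed by its value.
collapse_pairs <- function(x) {
  is_key <- grepl(":$", x)                   # elements such as "Ticker:"
  is_val <- c(FALSE, head(is_key, -1))       # the element right after each key
  x[is_key] <- paste0(x[is_key], x[is_val])  # "Term:" + "No" becomes "Term:No"
  x[!is_val]                                 # drop the values that were merged in
}

fields <- c("FundSponsor:Blackrock Advisors", "Category:", "Tax-Free Income-Pennsylvania",
            "Inception Share Price:", "$15.00", "Term:", "No")
res <- collapse_pairs(fields)
print(res)
```

Elements that already contain both key and value, such as the first one, are left untouched, and the original order is preserved.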
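Explanation
The regex (?=\")(?=\") tells R to split the string at every position that is immediately followed by a \". The syntax (?=something) is a lookahead: it matches a position where something comes next, without consuming any characters ((?!something) is the negated form). The second (?=\") repeats the same condition, so it is redundant. Because the match is zero-width, strsplit(...) also returns each \" and each separating space as an element of its own (for example, '\"Category:\" \"...' becomes the vector '\"'; 'Category:'; '\"'; ' '; '\"'; '...'). So by using ! vec %in% c(...) we can drop those unwanted elements.
Addendum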
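If the data contains elements of the form "string:" followed by " " (that is, a key whose value is a single blank), then in the code above remove the line vec <- vec[! vec %in% c(' ', '\"')] and add the following lines instead: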
vec <- vec[seq(2L, length(vec), 4L)]
vec[vec == ' '] <- NA_character_
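To see the addendum variant end to end, here is a small sketch on made-up input in which the value of "Ticker:" is a single blank. The input string and the name tokens are assumptions for illustration; the split pattern is the one used in the answer.

```r
# Made-up input: the value after "Ticker:" is a single blank
string <- "\"Category:\" \"Tax-Free\" \"Ticker:\" \" \" \"Term:\" \"No\""

# Split as in the answer; the tokens then cycle as: quote, content, quote, separator
tokens <- unlist(strsplit(string, '(?=\")(?=\")', perl = TRUE))

# Keep every 4th token starting at the 2nd, i.e. only the content tokens
vec <- tokens[seq(2L, length(tokens), 4L)]

# A blank value is now a real element and can be marked as missing
vec[vec == ' '] <- NA_character_
print(vec)
```

Selecting by position rather than by value is what keeps the blank entry in place, so keys and values still alternate.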