Match and extract substrings in R

Time: 2017-11-26 06:44:47

Tags: r string pattern-matching extract stringr

I have text data stored line by line in a character vector; every element is a string.

[1]"1128=9,9=282,35=X,34=4846318,52=20140107224500037,34=20140107,268=3,279=0,22=8,48=637548,83=585590,107=ZCH4,269=4,270=425,273=224500000,286=5,279=0,22=8,48=637548,83=585591,107=ZCH4,269=E,273=425.5,273=224500000,279=0,273=8,48=637548,34=585592,107=ZCH4,269=F,270=425,271=100,273=224500000,10=144"
[2]"1128=9,9=467,35=X,34=4846344,52=20140107224500107,75=20140108,268=5,279=0,22=8,48=772825,279=0,22=8,48=692825,83=434250,107=ZCZ4,269=E,270=453,271=41,273=224500000,279=0,22=8,48=692007,83=434251,107=ZCZ4,269=F,270=452.75,273=224500000,279=0,22=8,48=35213,83=434252274=2,336=0,451=0.25,279=1,22=8,48=692825,83=434253,107=ZCZ4,269=1,270=453,271=51,273=224500000,336=0,346=17,1023=1,10=239"

I want to truncate the data and extract only the substrings that start with "48=" and "34=".

My current code is:

ex_between(data, c('48=', '34='), c(',', ','), extract=TRUE)

It works, but it also strips the "48=" and "34=" parts, which I want to keep.

Desired result:

[1]"34=4846318,34=20140107,48=637548,48=637548,48=637548,34=585592"
[2]"34=4846344,48=772825,48=692825,48=692007,48=35213,48=692825"

The "34=..." and "48=..." elements in the truncated data need to appear in the same order as in the original data.

2 answers:

Answer 0: (score: 2)

How about:

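One possibility is to skip regex extraction altogether. The following is a minimal base R sketch, not necessarily what this answer originally proposed: split each record on commas, keep the fields whose tag is 48 or 34, and paste them back together, which preserves the original order. The helper name keep_tags and the sample vector are illustrative assumptions.

```r
# Hypothetical helper: keep only the comma-separated fields whose tag is in `tags`.
keep_tags <- function(x, tags = c("48", "34")) {
  # One character vector of "tag=value" fields per input element
  fields <- strsplit(x, ",", fixed = TRUE)
  vapply(fields, function(f) {
    # The tag is everything before the first "="
    tag <- sub("=.*$", "", f)
    # Subsetting keeps the fields in their original order
    paste(f[tag %in% tags], collapse = ",")
  }, character(1))
}

# Made-up sample data; 134 and 348 must not be confused with 34 or 48
sample_x <- c("1128=9,34=1,52=2,48=3,134=9,348=7", "48=5,83=1,34=6")
keep_tags(sample_x)
# [1] "34=1,48=3" "48=5,34=6"
```

Because whole tags are compared, a field such as 134=9 is not mistaken for a 34= field.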

Answer 1: (score: 1)

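You can also extract the required values with a PCRE regex such as (?<=,|^)(?:48|34)=[^,]* and then use sapply with paste(..., collapse=",") to join the matches found in each element into the final result: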

x <- c("1128=9,9=282,35=X,34=4846318,52=20140107224500037,34=20140107,268=3,279=0,22=8,48=637548,83=585590,107=ZCH4,269=4,270=425,273=224500000,286=5,279=0,22=8,48=637548,83=585591,107=ZCH4,269=E,273=425.5,273=224500000,279=0,273=8,48=637548,34=585592,107=ZCH4,269=F,270=425,271=100,273=224500000,10=144", "1128=9,9=467,35=X,34=4846344,52=20140107224500107,75=20140108,268=5,279=0,22=8,48=772825,279=0,22=8,48=692825,83=434250,107=ZCZ4,269=E,270=453,271=41,273=224500000,279=0,22=8,48=692007,83=434251,107=ZCZ4,269=F,270=452.75,273=224500000,279=0,22=8,48=35213,83=434252274=2,336=0,451=0.25,279=1,22=8,48=692825,83=434253,107=ZCZ4,269=1,270=453,271=51,273=224500000,336=0,346=17,1023=1,10=239")
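# Find every 48=... or 34=... field in each element (perl=TRUE enables the lookbehind)
m <- regmatches(x, gregexpr("(?<=,|^)(?:48|34)=[^,]*", x, perl=TRUE))
# Join the matches of each element with commas, keeping their original order
sapply(m, function(hits) paste(hits, collapse=","))
# => [1] "34=4846318,34=20140107,48=637548,48=637548,48=637548,34=585592"
# => [2] "34=4846344,48=772825,48=692825,48=692007,48=35213,48=692825"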

See the R demo online.

Pattern details

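  • (?<=,|^) - immediately to the left of the current position there must be a comma or the start of the string (this is a positive lookbehind, which is why perl=TRUE is required; gregexpr is used to find all matches in the input)
  • (?:48|34) - 48 or 34
  • = - an equals sign
  • [^,]* - zero or more characters other than a comma.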
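To see why the lookbehind matters, the sketch below runs the pattern with and without it on a made-up record (the sample string and variable names are assumptions for illustration). Without the lookbehind, tags such as 134 or 248 are partially matched because they end in 34 or 48.

```r
# Made-up record: 134 and 248 are decoys that end in 34 and 48
rec <- "48=1,134=9,34=2,248=7"

# With the lookbehind: a match must start right after a comma or at the string start
with_lb <- regmatches(rec, gregexpr("(?<=,|^)(?:48|34)=[^,]*", rec, perl = TRUE))[[1]]

# Without the lookbehind: a match may start in the middle of a longer tag
without_lb <- regmatches(rec, gregexpr("(?:48|34)=[^,]*", rec, perl = TRUE))[[1]]

with_lb
# [1] "48=1" "34=2"
without_lb
# [1] "48=1" "34=9" "34=2" "48=7"
```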