I have some text of the form:
This is some text, and here's some in "double quotes"
"and here's a double quote:\" and some more", "text that follows"
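The text contains strings inside double quotes, as shown above. A double quote can be escaped with a backslash (\). In the text above, there are three such strings: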
"double quotes"
"and here's a double quote:\" and some more"
"text that follows"
To extract these strings, I tried the regex:
"(?:\\"|.)*?"
However, this gave me the following result:
>>> preg_match_all('%"(?:\\"|.)*?"%', $msg, $matches)
>>> $matches
[
[ "double quotes",
"and here's a double quote:\",
", "
]
]
How can I extract the strings correctly?
Answer 0 (score: 2)
One way to do this is to use a negative lookbehind:
".*?(?<!\\)"
In PHP, this would be:
<?php
$text = <<<TEXT
This is some text, and here's some in "double quotes"
"and here's a double quote:\" and some more", "text that follows"
TEXT;
$regex = '~".*?(?<!\\\\)"~';
if (preg_match_all($regex, $text, $matches)) {
print_r($matches);
}
?>
This yields:
Array
(
    [0] => Array
        (
            [0] => "double quotes"
            [1] => "and here's a double quote:\" and some more"
            [2] => "text that follows"
        )
)
See a demo on regex101.com. To make it span multiple lines, enable dotall mode (the s modifier in PCRE).
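As a rough illustration of the dotall remark, the sketch below runs the lookbehind pattern with and without the s modifier. The input string and the variable names are made up for this example and do not come from the question.

```php
<?php
// Hypothetical input: one quoted string that contains a line break.
$text = "He said \"line one\nline two\" and left";

// The lookbehind pattern from this answer, without and with the s (dotall) modifier.
$plain  = '~".*?(?<!\\\\)"~';
$dotall = '~".*?(?<!\\\\)"~s';

// Without dotall, "." cannot cross the newline, so nothing matches.
$countPlain = preg_match_all($plain, $text, $plainMatches);

// With dotall, "." also matches the newline and the quoted string is found.
$countDotall = preg_match_all($dotall, $text, $dotallMatches);

var_dump($countPlain, $countDotall);
print_r($dotallMatches[0]);
```

With this input, the first call finds no match and the second finds the single two-line string.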
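Answer 1 (score: 2)
If you echo your pattern, you'll see that it is actually passed to the regex parser as %"(?:\"|.)*?"%. The regex parser also treats a single backslash as an escape character, so \" there just means a literal double quote.
So if the pattern is inside single quotes, you need to add at least one more backslash so that two backslashes reach the parser (one to escape the other), and the pattern becomes: %"(?:\\"|.)*?"%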
preg_match_all('%"(?:\\\"|.)*?"%', $msg, $matches);
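The escaping can be checked by echoing both pattern strings, as in the following sketch. It uses the sample text from the question; the variable names $original and $fixed are assumptions made for this example.

```php
<?php
// Sample text from the question; a nowdoc (<<<'TEXT') keeps the backslash verbatim.
$msg = <<<'TEXT'
This is some text, and here's some in "double quotes"
"and here's a double quote:\" and some more", "text that follows"
TEXT;

$original = '%"(?:\\"|.)*?"%';   // two backslashes in the PHP source
$fixed    = '%"(?:\\\"|.)*?"%';  // three backslashes in the PHP source

// What the regex parser actually receives:
echo $original, PHP_EOL;  // %"(?:\"|.)*?"%
echo $fixed, PHP_EOL;     // %"(?:\\"|.)*?"%

preg_match_all($fixed, $msg, $matches);
print_r($matches[0]);     // the three quoted strings, quotes included
```

In a single-quoted PHP string, \\ collapses to one backslash while \" is left untouched, which is why three backslashes in the source produce two in the pattern.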
This is still not a very efficient pattern. The question actually seems to be a duplicate of this one.
There is a better pattern available in this answer (what some would call unrolled).
preg_match_all('%"[^"\\\]*(?:\\\.[^"\\\]*)*"%', $msg, $matches);
See demo at eval.in, or compare the number of steps with the other patterns in regex101.
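A minimal sketch of the unrolled pattern in use is shown below. The first input is the sample text from the question. The second input, a string ending in an escaped backslash, is a hypothetical case added here to show that the pattern consumes a backslash together with the character after it.

```php
<?php
// Unrolled pattern from this answer; the regex parser receives "[^"\\]*(?:\\.[^"\\]*)*"
$pattern = '%"[^"\\\]*(?:\\\.[^"\\\]*)*"%';

// Sample text from the question (nowdoc keeps the backslash verbatim).
$msg = <<<'TEXT'
This is some text, and here's some in "double quotes"
"and here's a double quote:\" and some more", "text that follows"
TEXT;
preg_match_all($pattern, $msg, $matches);
print_r($matches[0]);

// Hypothetical extra input: the first string ends with an escaped backslash.
$tricky = <<<'TEXT'
"C:\\" and "x"
TEXT;
preg_match_all($pattern, $tricky, $trickyMatches);
print_r($trickyMatches[0]);
```

Because every backslash is paired with the following character, the quote after the escaped backslash is correctly taken as the closing quote, so the second input yields two separate strings.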
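Answer 2 (score: 1)
If you let the regex match the backslash as an ordinary character, the match will end at the " of \" (because the preceding \ is treated as a character on its own). So what you need to do is allow \" to be matched as a unit, but not \ or " individually. The result is the following regex: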
"((?:[^"\\]*(?:\\")*)*)"
A detailed explanation follows:
"            begin with a double quote character
(            capture only what follows (the text between the " characters)
(?:          non-capturing group: don't create a separate capture group
[^"\\]*      match any characters other than " and \, any number of times
(?:\\")*     match any \" escape sequences, any number of times
)*           allow the previous two parts to repeat any number of times, in any order
)            end the capture group
"            make sure it ends with a "
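Note that in many languages, when you pass a regex string to a method that parses some text, you need to escape backslash characters, quotes, and so on. In PHP, the above becomes: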
'/"((?:[^"\\\\]*(?:\\\\")*)*)"/'
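As a sketch of how this pattern might be used, the snippet below applies it to the sample text from the question and reads capture group 1, which holds each string without its surrounding quotes. The variable names are assumptions made for this example.

```php
<?php
// Sample text from the question (nowdoc keeps the backslash verbatim).
$msg = <<<'TEXT'
This is some text, and here's some in "double quotes"
"and here's a double quote:\" and some more", "text that follows"
TEXT;

// Pattern from this answer; the regex parser receives "((?:[^"\\]*(?:\\")*)*)"
$pattern = '/"((?:[^"\\\\]*(?:\\\\")*)*)"/';

preg_match_all($pattern, $msg, $matches);

// $matches[0] holds the full matches (with quotes),
// $matches[1] holds the captured contents (without the outer quotes).
print_r($matches[1]);
```

The escape sequence \" is kept as-is inside the captured text; turning it into a plain " would be a separate post-processing step.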