I'm looking for a SimpleGrepSedPerlOrPythonOneLiner that outputs all quotations in a text.
Example 1:
echo “HAL,” noted Frank, “said that everything was going extremely well.” | SimpleGrepSedPerlOrPythonOneLiner
stdout:
"HAL,"
"said that everything was going extremely well."
Example 2:
cat MicrosoftWindowsXPEula.txt | SimpleGrepSedPerlOrPythonOneLiner
stdout:
"EULA"
"Software"
"Workstation Computer"
"Device"
"DRM"
etc.
Answer 0 (score: 7)
I like this one:
perl -ne 'print "$_\n" foreach /"((?>[^"\\]|\\+[^"]|\\(?:\\\\)*")*)"/g;'
It's a bit verbose, but it handles escaped quotes and backtracking better than the simplest implementation. What it says is:
my $re = qr{
    "                   # Begin it with literal quote
    (
        (?>             # prevent backtracking once the alternation has been
                        # satisfied. It either agrees or it does not. This expression
                        # only needs one direction, or we fail out of the branch
            [^"\\]      # a character that is not a dquote or a backslash
        |   \\+         # OR if a backslash, then any number of backslashes followed by
            [^"]        # something that is not a quote
        |   \\          # OR again a backslash
            (?>\\\\)*   # followed by any number of *pairs* of backslashes (as units)
            "           # and a quote
        )*              # any number of *set* qualifying phrases
    )                   # all batched up together
    "                   # Ended by a literal quote
}x;
If you don't need that much power (say the input is just dialogue rather than structured quoting), then
/"([^"]*)"/
probably works about as well as anything else.
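To make the difference between the two patterns concrete, here is a minimal Python sketch. It is not a port of the Perl regex above: it swaps in a simpler escape-aware pattern, `"((?:[^"\\]|\\.)*)"`, which treats a backslash plus the following character as one unit. The names `ESCAPE_AWARE`, `SIMPLE`, `extract`, and the sample string are made up for illustration.

```python
import re

# Assumed simplification (not the Perl pattern): "\\." consumes a backslash
# together with the character after it, so \" never ends the quotation.
ESCAPE_AWARE = re.compile(r'"((?:[^"\\]|\\.)*)"')
# The simple pattern from the answer: stops at the next quote, escaped or not.
SIMPLE = re.compile(r'"([^"]*)"')


def extract(pattern, text):
    """Return the text captured between the quotes for every match."""
    return pattern.findall(text)


# Hypothetical input containing backslash-escaped quotes.
sample = r'He said "a \"nested\" word" and "plain"'

escape_aware_result = extract(ESCAPE_AWARE, sample)
simple_result = extract(SIMPLE, sample)

print(escape_aware_result)  # the escaped quotes stay inside one quotation
print(simple_result)        # the quotation is cut at the first \"
```

On this input the escape-aware pattern returns two quotations, while the simple one splits the first quotation at the escaped quote and returns three fragments. For plain dialogue with no backslashes the two behave the same.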
Answer 1 (score: 5)
If you have nested quotes, no regex solution will work, but for your examples this works well:
$ echo \"HAL,\" noted Frank, \"said that everything was going extremely well\" |
    perl -n -e 'while (m/(".*?")/g) { print $1."\n"; }'
"HAL,"
"said that everything was going extremely well"
$ cat eula.txt| perl -n -e 'while (m/(".*?")/g) { print $1."\n"; }'
"EULA"
"online"
"Software"
"Workstation Computer"
"Device"
"multiplexing"
"DRM"
"Secure Content"
"DRM Software"
"Secure Content Owners"
"DRM Upgrades"
"WMFSDK"
"Not For Resale"
"NFR,"
"Academic Edition"
"AE,"
"Qualified Educational User."
"Exclusion of Incidental, Consequential and Certain Other Damages"
"Restricted Rights"
"Exclusion des dommages accessoires, indirects et de certains autres dommages"
"Consumer rights"
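For anyone who prefers Python, the same loop could be sketched roughly as follows. Like `perl -n`, it scans one line at a time with the lazy pattern `".*?"`, so a quotation that spans a line break is not found. The function name `quotations` and the sample lines are assumptions for illustration only.

```python
import re

LAZY_QUOTE = re.compile(r'".*?"')  # same pattern as the perl one-liner


def quotations(lines):
    """Yield each quoted span, quotes included, scanning one line at a time."""
    for line in lines:
        for match in LAZY_QUOTE.finditer(line):
            yield match.group(0)


# Hypothetical input: the last two lines hold one quotation split by a newline.
sample_lines = [
    '"HAL," noted Frank, "said that everything was going extremely well"\n',
    'A quotation that "starts here\n',
    'and ends here" is missed.\n',
]

found = list(quotations(sample_lines))
for quote in found:
    print(quote)
```

To use it in a pipeline, pass `sys.stdin` to `quotations` instead of the sample list.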
Answer 2 (score: 4)
grep -o "\"[^\"]*\""
The pattern reads: " + anything except a quote, any number of times + "
-o makes it output only the matched text, not the whole line.
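The effect of -o can be illustrated with a small Python sketch that contrasts whole-line output with match-only output for the same pattern. It only mimics the behavior described above and is not how grep is implemented; `matching_lines`, `only_matching`, and the sample data are hypothetical names.

```python
import re

QUOTED = re.compile(r'"[^"]*"')  # same pattern as the grep command


def matching_lines(lines):
    """Like grep without -o: keep whole lines that contain a match."""
    return [line for line in lines if QUOTED.search(line)]


def only_matching(lines):
    """Like grep -o: keep just the matched parts, one entry per match."""
    return [m.group(0) for line in lines for m in QUOTED.finditer(line)]


sample = ['the "Software" and the "Device"', 'no quotes here']

whole = matching_lines(sample)
parts = only_matching(sample)
print(whole)
print(parts)
```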
Answer 3 (score: 0)
grep -o '"[^"]*"' file
The '-o' option prints only the part of each line that matches the pattern.