Description

Question

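Hello, I'm new to regex and I'm trying to use it to capture runs of whitespace in junk text with \s{2,}, but NOT the spaces inside "url":"https://x.com/a/C25/XPS - Connection - May 2013.docx". At the moment I have a scenario where the url has not been encoded, so it may contain spaces.

Sample text: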

"startofjunk      junkjunkjunkjunk","url":"https://x.com/a/C25/XPS  - Connection - May 2013.docx","contentsource":"AX","returpath":null,"detailpath":"https://ax.sample.com/Rep>ositories/form.aspx?path=C25/96/99&mode=Read","detailspath2":"samplepath"

Desired text:

"startofjunk junkjunkjunkjunk","url":"https://x.com/a/C25/XPS  - Connection - May 2013.docx","contentsource":"AX","returpath":null,"detailpath":"https://ax.sample.com/Rep>ositories/form.aspx?path=C25/96/99&mode=Read","detailspath2":"samplepath"

Please help. Thanks.

Answer 1

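Description

This regex replaces every run of multiple whitespace characters with a single one and bypasses the url portion. In a sequence of X whitespace characters, the first is captured in group 1 and written back to the output as \1, while the rest are dropped. The url portion is bypassed because, when the engine reaches it through the | (alternation) branch, the whole field is captured in group 2 and the \2 in the replacement injects it back into the output unchanged.

Regex: (\s)\s*|("url":"[^"]*")
Replace with: \1\2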


Source string

"startofjunk        junkjunkjunkjunk","url":"https://x.com/a/C25/XPS - Connection - May 2013.docx","contentsource":"AX","returpath":null,"detailpath":"https://ax.sample.com/Rep>ositories/form.aspx?path=C25/96/99&mode=Read","detailspath2":"samplepath"

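PHP example

This PHP example is included only to demonstrate the regex.

<?php
// Placeholder: substitute the text you want to clean.
$sourcestring = "your source string";
// \1 keeps the first whitespace of each run; \2 re-emits the "url" field untouched.
echo preg_replace('/(\s)\s*|("url":"[^"]*")/im', '\1\2', $sourcestring);
?>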

$sourcestring after replacement:
"startofjunk junkjunkjunkjunk","url":"https://x.com/a/C25/XPS - Connection - May 2013.docx","contentsource":"AX","returpath":null,"detailpath":"https://ax.sample.com/Rep>ositories/form.aspx?path=C25/96/99&mode=Read","detailspath2":"samplepath"
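
For readers not using PHP, the same idea can be sketched in Python. This is only an illustration of the pattern above, not part of the original answer; the function name collapse_whitespace and the shortened sample string are assumptions made for the example. A callback is used instead of the \1\2 replacement string so the sketch does not depend on how a given engine treats a group that did not take part in the match.

```python
import re

# Same pattern as in the answer: group 1 is the first whitespace of a run,
# group 2 is the whole "url" field (whose value may contain spaces).
PATTERN = re.compile(r'(\s)\s*|("url":"[^"]*")')


def collapse_whitespace(text):
    """Collapse each whitespace run to its first character, skipping the "url" field."""
    def keep(match):
        # Exactly one of the two groups participates in any given match.
        if match.group(1) is not None:
            return match.group(1)
        return match.group(2)

    return PATTERN.sub(keep, text)


# Hypothetical, shortened version of the sample text from the question.
sample = ('"startofjunk      junkjunkjunkjunk",'
          '"url":"https://x.com/a/C25/XPS  - Connection - May 2013.docx",'
          '"contentsource":"AX"')
print(collapse_whitespace(sample))
```

The double space inside the url value survives because the second alternative consumes the entire "url":"..." field in one match, so the whitespace branch never gets a chance to start inside it.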

Answer 2

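Use a lookahead to assert that your whitespace appears before "url". You can also use a lookbehind, so that your entire match is just the excess whitespace: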

(?<=\s)\s+(?=.*"url":)

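To remove the excess whitespace, replace the entire match with a blank (i.e. nothing), or delete the entire match if your application language allows it.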
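
As a rough illustration of this approach, here is a minimal Python sketch. The function name strip_extra_before_url and the test string are assumptions made for the example, not something from the answer. Python's re module accepts this lookbehind because \s has a fixed width of one character.

```python
import re

# Pattern from the answer: whitespace that is itself preceded by whitespace
# and that is followed, later on the same line, by "url":
EXTRA_BEFORE_URL = re.compile(r'(?<=\s)\s+(?=.*"url":)')


def strip_extra_before_url(text):
    """Delete the surplus whitespace of every run that occurs before "url":."""
    return EXTRA_BEFORE_URL.sub('', text)


# Hypothetical input with extra spaces before, inside, and after the url field.
line = '"a   b","url":"http://h/x  y.docx","k":"c  d"'
print(strip_extra_before_url(line))
```

One consequence of the lookahead is worth keeping in mind: only whitespace located before "url": on the same line is touched. The spaces inside the url value are preserved, but so is any run of spaces in the fields that come after it, so this variant fits input like the sample, where the junk precedes the url.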

Regex to match whitespace except the whitespace inside the url pattern
