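Regex to match whitespace except the spaces inside a url pattern

Time: 2013-06-04 03:33:44

Tags: regex spaces

Hi, I'm new to regex and I'm trying to use it to capture the runs of whitespace in the junk with \s{2,}, but NOT the spaces inside "url":"https://x.com/a/C25/XPS - Connection - May 2013.docx". At the moment I have a scenario where the url has not been encoded yet, so it may contain spaces.

Sample text: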

"startofjunk      junkjunkjunkjunk","url":"https://x.com/a/C25/XPS  - Connection - May 2013.docx","contentsource":"AX","returpath":null,"detailpath":"https://ax.sample.com/Rep>ositories/form.aspx?path=C25/96/99&mode=Read","detailspath2":"samplepath"

Desired text:

"startofjunk junkjunkjunkjunk","url":"https://x.com/a/C25/XPS  - Connection - May 2013.docx","contentsource":"AX","returpath":null,"detailpath":"https://ax.sample.com/Rep>ositories/form.aspx?path=C25/96/99&mode=Read","detailspath2":"samplepath"

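Please help. Thanks.

2 answers:

Answer 0: (score: 0)

Description

This regex replaces every run of multiple whitespace characters with a single one and bypasses the url part. In a run of X whitespace characters, the first is captured in group 1 and written back to the output as \1, while the remaining ones are dropped. The url part is bypassed because the second branch of the | alternation captures the whole "url":"..." pair in group 2, and \2 in the replacement injects it back into the output unchanged. Since the url branch consumes the entire value, the whitespace branch never gets to see the spaces inside it.

Regex: (\s)\s*|("url":"[^"]*"), replacement: \1\2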
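To make the capture-and-reinsert idea concrete, here is a minimal sketch in Python rather than PHP, assuming Python 3 and its standard re module. The function name collapse_spaces and the shortened sample string are illustrative and not part of the original answer. A callable replacement is used in place of \1\2 so the sketch does not depend on how a given engine expands a group that did not take part in the match.

```python
import re

# Branch 1 grabs a whitespace run and keeps only its first character in group 1.
# Branch 2 grabs the whole "url":"..." pair in group 2, so the spaces inside it
# are never visited by branch 1.
PATTERN = re.compile(r'(\s)\s*|("url":"[^"]*")')


def collapse_spaces(text: str) -> str:
    """Collapse whitespace runs to one character, leaving the url value untouched."""
    # Exactly one of the two groups takes part in each match; write that one back.
    return PATTERN.sub(lambda m: m.group(1) or m.group(2), text)


# Shortened version of the sample text from the question (illustrative).
sample = ('"startofjunk      junkjunkjunkjunk",'
          '"url":"https://x.com/a/C25/XPS  - Connection - May 2013.docx",'
          '"contentsource":"AX"')
print(collapse_spaces(sample))
```

The url branch relies on [^"]* to find the end of the value, so this sketch assumes the url contains no escaped double quote.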


Source string

"startofjunk        junkjunkjunkjunk","url":"https://x.com/a/C25/XPS - Connection - May 2013.docx","contentsource":"AX","returpath":null,"detailpath":"https://ax.sample.com/Rep>ositories/form.aspx?path=C25/96/99&mode=Read","detailspath2":"samplepath"

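PHP example

This PHP example is included only to show the regex at work.

<?php
$sourcestring="your source string";
// The replacement must be '\1\2': group 1 keeps one whitespace character,
// group 2 writes the captured url back. With '\1' alone the url would be deleted.
echo preg_replace('/(\s)\s*|("url":"[^"]*")/im','\1\2',$sourcestring);
?>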

$sourcestring after replacement:
"startofjunk junkjunkjunkjunk","url":"https://x.com/a/C25/XPS - Connection - May 2013.docx","contentsource":"AX","returpath":null,"detailpath":"https://ax.sample.com/Rep>ositories/form.aspx?path=C25/96/99&mode=Read","detailspath2":"samplepath"

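Answer 1: (score: 0)

Use a lookahead to assert that your whitespace occurs before "url". Also use a lookbehind, so that your entire match is just the extra whitespace: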

(?<=\s)\s+(?=.*"url":)

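To remove the extra whitespace, replace the entire match with an empty string (i.e. nothing), or delete the entire match if your application language allows it.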
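As a rough illustration of the lookaround approach, the following Python sketch applies that pattern with re.sub. It assumes Python 3, input on a single line (the dot does not cross newlines by default), and exactly one "url": key. The function name strip_extra_before_url and the test string are made up for the example.

```python
import re

# (?<=\s)        the character just before the match is whitespace, so the
#                first whitespace character of a run is never part of the match
# \s+            the surplus whitespace to delete
# (?=.*"url":)   the key "url": still appears later on the same line
EXTRA_BEFORE_URL = re.compile(r'(?<=\s)\s+(?=.*"url":)')


def strip_extra_before_url(text: str) -> str:
    """Delete surplus whitespace that occurs before the "url": key."""
    return EXTRA_BEFORE_URL.sub('', text)


# Illustrative input: extra spaces before the url, inside it, and after it.
line = 'a   b,"url":"x  y","k":"p   q"'
print(strip_extra_before_url(line))
```

Because the lookahead only succeeds while "url": is still ahead, this sketch leaves whitespace runs that come after the url untouched as well as the ones inside it. If those later runs also need collapsing, the alternation approach from the other answer covers them.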