如何使用Javascript和正则表达式解析网址?

时间:2014-01-24 05:02:45

标签: javascript regex url

我想解析一些具有以下格式的网址: -

var url ="http://www.example.com/cooks/cooking-dress-wine/~no-order/pr?p%5B%5D=sort%3Dfeatured&sid=bks%2C43p&mycracker=ch_vn_clothing_subcategory_Puma&ref=b41c8097-8efe-4acf-8919-0fa81bcb590a" 

域名和其他部分对于所有网址都没有必要,它们可以有所不同,即我正在寻找一般解决方案。

基本上我想剥离所有其他东西,只获得部分:

/cooks/cooking-dress-wine/~no-order/pr?p%5B%5D=sort%3Dfeatured&sid=bks%2C43p

我想用JavaScript和正则表达式来解析它

我这样做:

var mapObj = {"/^(http:\/\/)?.*?\//":"","(&mycracker.+)":"","(&ref.+)":""};
var re = new RegExp(Object.keys(mapObj).join("|"),"gi");
url = url.replace(re, function(matched){
  return mapObj[matched];
}); 

但它返回了这个

http://www.example.com/cooks/cooking-dress-wine/~no-order/pr?p%5B%5D=sort%3Dfeatured&sid=bks%2C43pundefined

我在哪里做的不正确?或者是否有另一种解决方案更简单?

4 个答案:

答案 0 :(得分:2)

您可以使用:

/(?:https?:\/\/[^\/]*)(\/.*?)(?=\&mycracker)/

代码:

var s="http://www.example.com/cooks/cooking-dress-wine/~no-order/pr?p%5B%5D=sort%3Dfeatured&sid=bks%2C43p&mycracker=ch_vn_clothing_subcategory_Puma&ref=b41c8097-8efe-4acf-8919-0fa81bcb590a";
var ss=/(?:https?:\/\/[^\/]*)(\/.*?)(?=\&mycracker)/;
console.log(s.match(ss)[1]);

<强> Demo

Fiddle Demo

说明:

explanation

答案 1 :(得分:1)

为什么不直接映射分割数组?

您不需要对URL进行正则表达式,但是您必须在循环内运行if语句以从中删除特定的GET参数。在这种特殊情况下(特定关键词)你只需要子串直到indexOf“&amp; mycracker”

var url ="http://www.example.com/cooks/cooking-dress-wine/~no-order/pr?p%5B%5D=sort%3Dfeatured&sid=bks%2C43p&mycracker=ch_vn_clothing_subcategory_Puma&ref=b41c8097-8efe-4acf-8919-0fa81bcb590a" 
var x = url.split("/");
var y = [];
x.map(function(data,index) { if (index >= 3) y.push(data); });
var path = "/"+y.join("/");
path = path.substring(0,path.indexOf("&mycracker"));

答案 2 :(得分:1)

稍微更改以下代码,您可以检索任何参数:

var url = "http://www.example.com/cooks/cooking-dress-wine/~no-order/pr?p%5B%5D=sort%3Dfeatured&sid=bks%2C43p&mycracker=ch_vn_clothing_subcategory_Puma&ref=b41c8097-8efe-4acf-8919-0fa81bcb590a"
var re = new RegExp(/http:\/\/[^?]+/);
var part1 = url.match(re);
var remain = url.replace(re, '');
//alert('Part1: ' + part1);
var rf = remain.split('&');
// alert('Part2: ' + rf);
var part2 = '';
for (var i = 0; i < rf.length; i++) 
    if (rf[i].match(/(p%5B%5D|sid)=/))
        part2 += rf[i] + '&';
part2 = part2.replace(/&$/, '');
//alert(part2)
url = part1 + part2;
alert(url);

答案 3 :(得分:0)

var url ="http://www.example.com/cooks/cooking-dress-wine/~no-order/pr?p%5B%5D=sort%3Dfeatured&sid=bks%2C43p&mycracker=ch_vn_clothing_subcategory_Puma&ref=b41c8097-8efe-4acf-8919-0fa81bcb590a";
var newAddr = url.substr(22,url.length);
// newAddr == "/cooks/cooking-dress-wine/~no-order/pr?p%5B%5D=sort%3Dfeatured&sid=bks%2C43p&mycracker=ch_vn_clothing_subcategory_Puma&ref=b41c8097-8efe-4acf-8919-0fa81bcb590a"

22是从哪里开始切割字符串。

url.length包括多少内容。

只要链接上的域名保持不变,此功能就可以使用。