RegEx to capture substrings from a multiline string and convert them into an object data structure

Date: 2018-12-16 02:56:10

Tags: javascript regex

I have a multiline string like this:

url=www.website-one.com/number&id=2222&key=rer
type=web
version=3
url=www.website-two.com/number&id=9999&key=abc
type=web
version=5

The list can contain more lines, including more lines that start with url=.

The goal is to get this JavaScript object structure:

{
    "url": "www.website-one.com/number&id=2222&key=rer",
    "id": "2222",
},
{
    "url": "www.website-two.com/number&id=9999&key=abc",
    "id": "9999",
}

How can I do this in JavaScript?

If more url= lines are found in the list, the JSON should include those as well.

How I would approach it:

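  1. Use a RegEx to capture the URL lines from the list, so that only the URLs remain.
  2. Then use a RegEx on each of those lines to take everything after url= and store it in a string named URL.
  3. Then run a RegEx on a copy of the URL string to take everything after id= up to the next &.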

What I have tried:

function start_regex() {
    var str = `
    url=www.website-one.com/number&id=2222&key=rer
    type=web
    version=3
    url=www.website-two.com/number&id=9999&key=abc
    type=web
    version=5
  `;

    //regex show only URL until end of line
    var RegExpURLonly = /url=.*([^\n]+)/g;
    var URLonly = RegExpURLonly.exec(str);


    while (URLonly != null) {
        console.log('This is URL only: ' + URLonly[1]);
        URLonly = RegExpURLonly.exec(str);
    }

    //regex show only id
    var RegExIDonly = /id=([^&]+)/g;
    var IDonly = RegExIDonly.exec(URLonly);


    while (IDonly != null) {
        console.log('This is ID only: ' + IDonly[1]);
        IDonly = RegExIDonly.exec(str); 
    }
}

1 Answer:

Answer 0: (score: 1)

It might be easier to start by splitting on whitespace:

str.split(/\s+/)

You can then filter the resulting array for entries that start with url=, which gives you an array of URLs:

var str = `
    url=www.website-one.com/number&id=2222&key=rer
    type=web
    version=3
    url=www.website-two.com/number&id=9999&key=abc
    type=web
    version=5
  `;

let urls = str.split(/\s+/).filter(s => s.startsWith('url='))
console.log(urls)

You can then map() over that array, extract the id from each URL, and return an object with url and id:

var str = `
    url=www.website-one.com/number?key=rer&id=2222
    type=web
    version=3
    url=www.website-two.com/number&id=9999&key=abc
    type=web
    version=5
  `;

let urls = str.split(/\s+/).filter(s => s.startsWith('url='))
let result = urls.map(url => {
    let regex = /[?&]id=(.+?)(&.*)?$/g
    let id = regex.exec(url)
    if (id) id = id[1]
    return {url:url.slice(4), id}     // slice(4) to remove the initial "url="
})
console.log(urls)
console.log(result)
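
If you would rather stay closer to the regex-only plan from the question, the same result can probably be reached in a single pass with String.prototype.matchAll. This is only a sketch, not part of the original answer: the helper name extractUrlEntries and the sample variable are made up for illustration, it assumes an environment that supports matchAll (ES2020, e.g. Node 12+), and it assumes a URL never contains whitespace.

```javascript
// Hypothetical helper; the name is an assumption, not from the original answer.
function extractUrlEntries(text) {
    // "m" makes ^ match at the start of every line; \s* skips leading indentation.
    // (\S+) captures everything after "url=" up to the next whitespace.
    const lineRegex = /^\s*url=(\S+)/gm;
    const entries = [];
    for (const match of text.matchAll(lineRegex)) {
        const url = match[1];
        // id= must follow ? or &; the value runs to the next & or the end of the URL.
        const idMatch = /[?&]id=([^&]+)/.exec(url);
        entries.push({ url, id: idMatch ? idMatch[1] : null });
    }
    return entries;
}

const sample = `
    url=www.website-one.com/number&id=2222&key=rer
    type=web
    version=3
    url=www.website-two.com/number?key=abc&id=9999
    type=web
    version=5
  `;

console.log(extractUrlEntries(sample));
```

Entries without an id parameter get id: null here rather than being dropped; whether that is the right behaviour depends on what you need downstream.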