Restricting replacement and capture to within specific HTML tags using JavaScript and regex

Time: 2016-11-29 21:26:45

Tags: javascript regex

I am using JavaScript and a regular expression to convert this HTML code:

  <html>
        <body>
              <document>
                     ==id:firstid;
                     ==href:#anchor32145;
                     ==href:#anchor31274;
                     ==href:#anchor98751;
              </document>
              <document>
                     ==id:secondid;
                     ==href:#anchor62341;
              </document>
              <document>
                     ==id:thirdid;
                     ==href:#anchor52153;
                     ==href:#anchor98421;
              </document>
        </body>
  </html>

into the following format:

  <html>
        <body>
              <document>
                     ==id:firstid;
                     ==href: firstid #anchor32145;
                     ==href: firstid #anchor31274;
                     ==href: firstid #anchor98751;
              </document>
              <document>
                     ==id:secondid;
                     ==href: secondid #anchor62341;
              </document>
              <document>
                     ==id:thirdid;
                     ==href: thirdid #anchor52153;
                     ==href: thirdid #anchor98421;
              </document>
        </body>
  </html>

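As you can see, what I am trying to do is copy the value of ==id: into every ==href: inside the same document tag. I am relatively new to JavaScript, so any help achieving this would be much appreciated.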

1 answer:

Answer 0: (score: 2)

There are better ways to do this than a straight regex (an XML parser, for example), but since you asked for it...

const str =
  `  <html>
        <body>
              <document>
                     ==id:firstid;
                     ==href:#anchor32145;
                     ==href:#anchor31274;
                     ==href:#anchor98751;
              </document>
              <document>
                     ==id:secondid;
                     ==href:#anchor62341;
              </document>
              <document>
                     ==id:thirdid;
                     ==href:#anchor52153;
                     ==href:#anchor98421;
              </document>
        </body>
  </html>`

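// The outer regex matches one <document> block at a time (up to its closing tag)
// and captures the value after the first colon, i.e. the id, as m1.
// The inner replace then inserts that id after every ==href: in the matched block.
const str2 = str.replace(/(?:<document>)[\s=\w]*:(\w*);[a-z0-9;:#\s=]*/g, (m, m1) =>
  m.replace(/==href:/g, (mm) => `${mm} ${m1} `))
console.log(str2)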
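
The same idea can also be split into explicit steps, which may be easier to follow and does not depend on a character class covering everything inside the block. This is only a minimal sketch, not the approach from the answer above: the function name `prefixHrefsWithId` and the shortened `sample` string are made up for illustration, and it assumes each block contains a single `==id:...;` line whose value consists of word characters.

```javascript
// Hypothetical helper (name is illustrative, not from the original answer).
function prefixHrefsWithId(markup) {
  // Step 1: isolate each <document>...</document> block.
  // The lazy quantifier keeps neighbouring blocks from merging into one match.
  return markup.replace(/<document>[\s\S]*?<\/document>/g, (block) => {
    // Step 2: read the id declared inside this block, if there is one.
    const idMatch = block.match(/==id:\s*(\w+);/);
    if (!idMatch) return block; // no id found: leave the block unchanged
    const id = idMatch[1];
    // Step 3: insert the id after every ==href: in this block only.
    return block.replace(/==href:\s*/g, `==href: ${id} `);
  });
}

// Shortened sample input (assumed, modelled on the question's markup).
const sample = `<document>
  ==id:firstid;
  ==href:#anchor32145;
</document>
<document>
  ==id:secondid;
  ==href:#anchor62341;
</document>`;

console.log(prefixHrefsWithId(sample));
```

Because every block is processed in its own callback, an id can never leak into the `==href:` lines of a different `<document>`. Note that the sketch is not idempotent: running it twice on the same text would insert the id a second time.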