How do I construct a regular expression to split this text?

Asked: 2016-06-01 17:30:17

Tags: javascript html regex

Hello everyone. I am writing a script that works on text with a fixed structure, like this:

"RBD|X|RBD|C|92173~GJHGWO.NAYE" "SAMBORNSiPOSSSTHRa"
"RBD|X|RBD|C|92173~GJHGX4.NAYE" "SAMBORNSiPOSSSTHRa"
"RBD|X|RBD|C|92173~GJHGX6.NAYE" "SAMBORNSiPOSSSTHRa"
"RBD|X|RBD|C|92173~GJHGX8.NAYE" "SAMBORNSiPOSSSTHRa"
"RBD|X|RBD|C|92173~GJHGXA.NAYE" "SAMBORNSiPOSSSTHRa"
"RBD|X|RBD|C|92173~GJHGXC.NAYE" "SAMBORNSiPOSSSTHRa"

I want to split this text on the following symbols: | " ~ (pipe, double quote, and tilde), and store the resulting values in an array, like this:

splitWords = [RBD,X,RBD,C,92173,GJHGWO.NAYE,SAMBORNSiPOSSSTHRa]

To achieve this, I tried:

var splitWords = document.getElementById("texto").value.split("|");
document.write(splitWords.toString());

I got:

"RBD,X,RBD,C,92173~GJHGWO.NAYE" "SAMBORNSiPOSSSTHRa" "RBD,X,RBD,C,92173~GJHGX4.NAYE" "SAMBORNSiPOSSSTHRa" "RBD,X,RBD,C,92173~GJHGX6.NAYE" "SAMBORNSiPOSSSTHRa" "RBD,X,RBD,C,92173~GJHGX8.NAYE" "SAMBORNSiPOSSSTHRa" "RBD,X,RBD,C,92173~GJHGXA.NAYE" "SAMBORNSiPOSSSTHRa" "RBD,X,RBD,C,92173~GJHGXC.NAYE" "SAMBORNSiPOSSSTHRa"

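The problem is that this splits the text only on the pipe. I also want to split on the other symbols to get the output I described. The complete code is as follows: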

<!DOCTYPE html>
<html>

<body>
<p id="demo"></p>

<textarea cols=150 rows=15 id="texto">
"RBD|X|RBD|C|92173~GJHGWO.NAYE" "SAMBORNSiPOSSSTHRa"
"RBD|X|RBD|C|92173~GJHGX4.NAYE" "SAMBORNSiPOSSSTHRa"
"RBD|X|RBD|C|92173~GJHGX6.NAYE" "SAMBORNSiPOSSSTHRa"
"RBD|X|RBD|C|92173~GJHGX8.NAYE" "SAMBORNSiPOSSSTHRa"
"RBD|X|RBD|C|92173~GJHGXA.NAYE" "SAMBORNSiPOSSSTHRa"
"RBD|X|RBD|C|92173~GJHGXC.NAYE" "SAMBORNSiPOSSSTHRa"
</textarea>

<script>
var splitWords = document.getElementById("texto").value.split("|");
document.write(splitWords.toString());
</script>

</body>
</html>

I would appreciate any suggestions for a regular expression that achieves this.

3 answers:

Answer 0 (score: 3)

Use a regular expression:

str = '"RBD|X|RBD|C|92173~GJHGWO.NAYE" "SAMBORNSiPOSSSTHRa"';
str.split(/[\|"~\s]+/).filter(Boolean); // Output: ["RBD", "X", "RBD", "C", "92173", "GJHGWO.NAYE", "SAMBORNSiPOSSSTHRa"]

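If you also want to split on the period, add it to the character class. Inside a character class the period is already literal, so escaping it with a backslash is optional.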
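To illustrate what the pieces of that expression do, here is a minimal sketch using one line of the sample input. The variable names (raw, tokens, tokensNoPeriod) are only for illustration and are not part of the answer above.

```javascript
// One line of the sample input from the question.
const line = '"RBD|X|RBD|C|92173~GJHGWO.NAYE" "SAMBORNSiPOSSSTHRa"';

// Splitting alone leaves an empty string at each end, because the line
// starts and ends with a delimiter (the double quote).
const raw = line.split(/[|"~\s]+/);

// filter(Boolean) drops those empty strings.
const tokens = raw.filter(Boolean);

// Adding the period to the character class also splits "GJHGWO.NAYE".
const tokensNoPeriod = line.split(/[|"~.\s]+/).filter(Boolean);

console.log(raw.length, tokens.length, tokensNoPeriod.length);
```

Because \s also matches newlines, the same expression should work on the whole textarea value at once, giving one flat array rather than one array per line.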

Answer 1 (score: 2)

OK, let's start. Get the textarea value and trim it:

var splitWords = document.getElementById("texto").value.trim();

First, you need to remove the " characters:

splitWords = splitWords.replace(/"/g, '');

Then split the text into lines, since it is laid out like table rows:

splitWords = splitWords.split('\n');

Then split each line on the possible delimiters: |, ~, and space:

splitWords.forEach(function(rowValue,rowIndex) {
    splitWords[rowIndex] = rowValue.split(/[|~ ]/);
    console.log(rowIndex, splitWords[rowIndex]);
});

The console.log output will be:

0 ["RBD", "X", "RBD", "C", "92173", "GJHGWO.NAYE", "SAMBORNSiPOSSSTHRa"]
1 ["RBD", "X", "RBD", "C", "92173", "GJHGX4.NAYE", "SAMBORNSiPOSSSTHRa"]
2 ["RBD", "X", "RBD", "C", "92173", "GJHGX6.NAYE", "SAMBORNSiPOSSSTHRa"]
3 ["RBD", "X", "RBD", "C", "92173", "GJHGX8.NAYE", "SAMBORNSiPOSSSTHRa"]
4 ["RBD", "X", "RBD", "C", "92173", "GJHGXA.NAYE", "SAMBORNSiPOSSSTHRa"]
5 ["RBD", "X", "RBD", "C", "92173", "GJHGXC.NAYE", "SAMBORNSiPOSSSTHRa"]

Then do whatever you want with the two-dimensional array splitWords.
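The steps above can be collected into one function so they can be tried outside the browser. This is only a sketch: parseRows and sample are hypothetical names, the input is a string literal instead of the textarea value, and the line split uses /\r?\n/ on the assumption that the text might contain Windows line endings.

```javascript
// Hypothetical helper: turns the whole text into an array of rows.
function parseRows(text) {
  return text
    .trim()                 // drop leading/trailing whitespace and newlines
    .replace(/"/g, "")      // remove every double quote
    .split(/\r?\n/)         // one entry per line (tolerates \r\n)
    .map(function (row) {
      return row.split(/[|~ ]/); // split on pipe, tilde, or a single space
    });
}

// Two lines of the sample input from the question.
const sample =
  '"RBD|X|RBD|C|92173~GJHGWO.NAYE" "SAMBORNSiPOSSSTHRa"\n' +
  '"RBD|X|RBD|C|92173~GJHGX4.NAYE" "SAMBORNSiPOSSSTHRa"\n';

const rows = parseRows(sample);
console.log(rows.length);  // 2
console.log(rows[1][5]);   // "GJHGX4.NAYE"
```

In the page itself the call would presumably be parseRows(document.getElementById("texto").value).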

Answer 2 (score: 1)

My suggestion is:

<p id="demo"></p>

<textarea cols=150 rows=15 id="texto">
"RBD|X|RBD|C|92173~GJHGWO.NAYE" "SAMBORNSiPOSSSTHRa"
"RBD|X|RBD|C|92173~GJHGX4.NAYE" "SAMBORNSiPOSSSTHRa"
"RBD|X|RBD|C|92173~GJHGX6.NAYE" "SAMBORNSiPOSSSTHRa"
"RBD|X|RBD|C|92173~GJHGX8.NAYE" "SAMBORNSiPOSSSTHRa"
"RBD|X|RBD|C|92173~GJHGXA.NAYE" "SAMBORNSiPOSSSTHRa"
"RBD|X|RBD|C|92173~GJHGXC.NAYE" "SAMBORNSiPOSSSTHRa"
</textarea>

<script>
    var lines = document.getElementById("texto").value.split('\n');
    var splitWords  = lines.filter(function(v) { return v.length > 0})
                           .map(function(currentValue, index) {
        // Join the two quoted fields with "|" so the split below keeps them as separate items.
        return currentValue.trim().replace(/^"([^"]+)"\s"([^"]+)"$/, '$1|$2').split(/[|~]/);
    });
    console.log(JSON.stringify(splitWords, null, 4));
</script>
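A variant of the same per-line idea, sketched with match instead of replace so that a line without the expected two quoted fields can be detected. The name parseLine is hypothetical, and keeping the second field whole is an assumption based on the desired output in the question.

```javascript
// Hypothetical helper: parses one line of the form "a|b|...~x" "y".
function parseLine(line) {
  // Capture the contents of the two quoted fields.
  const m = line.trim().match(/^"([^"]+)"\s+"([^"]+)"$/);
  if (m === null) {
    return null; // the line does not have the expected shape
  }
  // Split only the first field; the second field is kept whole.
  return m[1].split(/[|~]/).concat(m[2]);
}

console.log(parseLine('"RBD|X|RBD|C|92173~GJHGXC.NAYE" "SAMBORNSiPOSSSTHRa"'));
console.log(parseLine('not a valid line')); // null
```

Returning null for malformed lines lets the caller decide whether to skip them or report an error.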