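I'm writing a Node module that takes a CSV file and converts it into JavaScript objects. Because I let the user specify the delimiter and also support a text qualifier, I need to parse the file with a dynamically built regular expression.
Here is how I build the regex: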
settings.dilemeter = escapeForRegex(settings.dilemeter);
settings.textQualifier = escapeForRegex(settings.textQualifier);
var d = settings.dilemeter;
var tq = settings.textQualifier;
///////////////////////////////////////////////////////////////
/// This appears to be glitched
///////////////////////////////////////////////////////////////
var searchArray = [
"(" + tq + d + tq + ")", // First case to search for, eg: ","
"(" + tq + d + ")", // Second case to search for, eg: ",
"(" + d + tq + ")", // Third case to search for, eg: ,"
"(" + d + ")", // Last case to search for, eg: ,
"(" + tq + "$)", // if the text qualifier is the very last thing
];
var regexString = "(" + searchArray.join('|') + ')';
console.log(regexString);
var regex = new RegExp(regexString);
This produces a regex like the following (when using | as the delimiter and " as the text qualifier):
(("\|")|("\|)|(\|")|(\|)|("$))
However, when I run string.split(regex), I get very strange results.
var testString = [
'h1|h2|h3|h4', // The first line will be the headers
'value 1|"Value 2"|value 3|"value - 5"'// This is the first row of data
];
console.log(testString[1].split(regex));
This produces:
["value 1",
"|"",
undefined,
undefined,
"|"",
undefined,
undefined,
"Value 2",
""|",
undefined,
""|",
undefined,
undefined,
undefined,
"value 3",
"|"",
undefined,
undefined,
"|"",
undefined,
undefined,
"value - 5",
""",
undefined,
undefined,
undefined,
undefined,
""",
""]
I can't figure out why there are all these undefined entries, or why the result includes the separators I am trying to split on.
I created a Plunker with a more complete demo: http://plnkr.co/edit/hn2GUFYodYQeuQLqqwVD?p=preview
Answer 0 (score: 2)
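string.split(regexp) adds an entry to the result for every capturing group in the regex, and a group that did not take part in a given match contributes undefined. If you need groups in the regexp but don't want them in the result, use non-capturing groups, which are written by placing ?: right after the group's opening parenthesis: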
var searchArray = [
"(?:" + tq + d + tq + ")", // First case to search for, eg: ","
"(?:" + tq + d + ")", // Second case to search for, eg: ",
"(?:" + d + tq + ")", // Third case to search for, eg: ,"
"(?:" + d + ")", // Last case to search for, eg: ,
"(?:" + tq + "$)", // if the text qualifier is the very last thing
];
var regexString = "(?:" + searchArray.join('|') + ')';
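To make the difference concrete, here is a minimal sketch comparing the two kinds of groups. The regex literals below are written out by hand for the | delimiter and " text qualifier from the question rather than built from settings, and the variable names are only illustrative.

```javascript
// Sample row from the question: "|" is the delimiter, '"' the text qualifier.
var row = 'value 1|"Value 2"|value 3|"value - 5"';

// Capturing groups: split() splices every group's capture into the result,
// using undefined for groups that did not take part in the match.
var capturing = /(\|)|(")/;
var withCaptures = 'a|b"c'.split(capturing);
console.log(withCaptures);
// → ["a", "|", undefined, "b", undefined, "\"", "c"]

// Non-capturing groups: the same alternation as the question's regex,
// but nothing from the separators is added to the result.
var nonCapturing = /(?:"\|")|(?:"\|)|(?:\|")|(?:\|)|(?:"$)/;
var fields = row.split(nonCapturing);
console.log(fields);
// → ["value 1", "Value 2", "value 3", "value - 5", ""]
```

Note the trailing empty string: the closing " at the end of the row still matches "$, so split emits one empty piece after it. Whether to drop that piece depends on how empty trailing fields should be treated.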