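Given a CSV file that has newline/carriage-return characters in some fields, how can we parse the data without splitting those fields across multiple rows?
Example CSV data: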
ID;Name;Country;ISO-2;Address;Latitude;Longitude
022wje3;Europa;Italy;IT;"Viale Kennedy 3
34073 Grado";"45,67960";"13,40070"
024oua5;Hiberia;Italy;IT;"Via XXIV Maggio 8
00187 Rome";"41,89720";"12,48680"
028gupn;Regal Riverside;Hong Kong;HK;"34-36 Tai Chung Kiu Road
Shatin
Hong Kong";"22,38260";"114,19600"
02j7qry;Okaliptus Holiday Villas Apart;Turkey;TR;"Sevket Sabanci Caddesi No. 70
Bahçelievler Mevkii
Turgutreis";"37,02130";"27,25120"
02pc99z;California Apartementos;Spain;ES;"Prat d'en Carbó
43840 Salou";"41,07620";"1,14667"
02tu1jz;Elvis Presley's Heartbreak;United States;US;"3677 Elvis Presley Blvd.
Memphis
Tennessee 38116";"35,04850";"-90,02710"
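Note: the fields are actually separated by semicolons (;), because the addresses can contain commas.
Each record has 7 fields, but we don't want a field that contains line breaks to be mistakenly parsed as several rows...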
We found several Perl-focused answers on StackOverflow, but I'm a bit rusty with Perl and haven't found a JS-focused answer.
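The core difficulty is that a physical line break ends a record only when it falls outside a quoted field. A minimal sketch of that idea (not taken from the answers below) regroups physical lines into logical records by counting double quotes. It assumes fields are wrapped in `"` and embedded quotes are doubled (`""`), so a record is complete once its quote count is even; the function name `joinLogicalRecords` is hypothetical.

```javascript
// Sketch: regroup physical lines into logical CSV records by quote parity.
// Assumption: fields are wrapped in '"' and embedded quotes are doubled (""),
// so a record is complete only once it contains an even number of '"' characters.
function joinLogicalRecords(text) {
  const records = [];
  let pending = null; // record text accumulated so far
  let quoteCount = 0; // '"' characters seen in the pending record

  for (const line of text.split(/\r\n|\n|\r/)) {
    if (pending === null && line === "") continue; // skip blank lines between records
    // Re-join with "\n": the original line-break style inside a field is not preserved.
    pending = pending === null ? line : pending + "\n" + line;
    quoteCount += (line.match(/"/g) || []).length;
    if (quoteCount % 2 === 0) {
      records.push(pending); // quotes balanced: the record ends here
      pending = null;
      quoteCount = 0;
    }
  }
  if (pending !== null) records.push(pending); // unterminated quote: keep what is left
  return records;
}

const sample = [
  "ID;Name;Country;ISO-2;Address;Latitude;Longitude",
  '022wje3;Europa;Italy;IT;"Viale Kennedy 3',
  '34073 Grado";"45,67960";"13,40070"',
].join("\n");

console.log(joinLogicalRecords(sample).length); // 2
```

This only fixes the row boundaries. Splitting each record into its 7 fields still needs a quote-aware splitter, since a quoted field may itself contain the delimiter.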
Answer 0 (score: 13)
Have you tried Ben Nadel's CSVToArray?
/**
 * CSVToArray parses any string of data, including '\r' and '\n' characters,
 * and returns an array with the rows of data.
 * @param {String} CSV_string - the CSV string you need to parse
 * @param {String} delimiter - the delimiter used to separate fields of data
 * @returns {Array} rows - rows of CSV where the first row holds the column headers
 */
function CSVToArray (CSV_string, delimiter) {
  delimiter = (delimiter || ","); // user-supplied delimiter or default comma
  var pattern = new RegExp( // regular expression to parse the CSV values.
    ( // Delimiters:
      "(\\" + delimiter + "|\\r?\\n|\\r|^)" +
      // Quoted fields.
      "(?:\"([^\"]*(?:\"\"[^\"]*)*)\"|" +
      // Standard fields.
      "([^\"\\" + delimiter + "\\r\\n]*))"
    ), "gi"
  );
  var rows = [[]]; // array to hold our data. First row is column headers.
  // array to hold our individual pattern matching groups:
  var matches = false; // false if we don't find any matches
  // Loop until we no longer find a regular expression match
  while (matches = pattern.exec( CSV_string )) {
    // A zero-length match (only possible at the start, e.g. on empty input)
    // leaves lastIndex unchanged and would loop forever, so step past it.
    if (matches[0].length === 0) { pattern.lastIndex++; }
    var matched_delimiter = matches[1]; // Get the matched delimiter
    // Check if the delimiter has a length (and is not the start of string)
    // and if it matches the field delimiter. If not, it is a row delimiter.
    if (matched_delimiter.length && matched_delimiter !== delimiter) {
      // Since this is a new row of data, add an empty row to the array.
      rows.push( [] );
    }
    var matched_value;
    // Once we have eliminated the delimiter, check to see
    // what kind of value was captured (quoted or unquoted).
    // Compare with undefined so an empty quoted field ("") yields "".
    if (matches[2] !== undefined) { // found quoted value. unescape any double quotes.
      matched_value = matches[2].replace(
        new RegExp( "\"\"", "g" ), "\""
      );
    } else { // found a non-quoted value
      matched_value = matches[3];
    }
    // Now that we have our value string, let's add
    // it to the data array.
    rows[rows.length - 1].push(matched_value);
  }
  return rows; // Return the parsed data Array
}
In your case, call it like this:
var rows = CSVToArray(CSV_string, ';');
where CSV_string is the string containing your CSV data.
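As an alternative to the regex in CSVToArray, the same job can be done with a small character-by-character state machine. This is a swapped-in technique, not Ben Nadel's code: a sketch that assumes RFC 4180-style quoting (`""` inside a quoted field is an escaped quote) and a single-character delimiter. The name `parseCsv` is hypothetical.

```javascript
// Sketch: character-by-character CSV parser (alternative to the regex approach).
// Assumptions: '"' wraps a field, '""' inside quotes is an escaped quote,
// the delimiter is one character, and CRLF/LF/CR end a row only outside quotes.
function parseCsv(text, delimiter = ",") {
  const rows = [];
  let row = [];
  let field = "";
  let inQuotes = false;

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inQuotes) {
      if (ch === '"') {
        if (text[i + 1] === '"') { field += '"'; i++; } // escaped quote
        else inQuotes = false;                          // closing quote
      } else {
        field += ch; // line breaks inside quotes stay part of the field
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === delimiter) {
      row.push(field);
      field = "";
    } else if (ch === "\r" || ch === "\n") {
      if (ch === "\r" && text[i + 1] === "\n") i++; // treat CRLF as one break
      row.push(field);
      rows.push(row);
      row = [];
      field = "";
    } else {
      field += ch;
    }
  }
  // Flush the last row unless the input ended with a line break.
  if (field !== "" || row.length > 0) {
    row.push(field);
    rows.push(row);
  }
  return rows;
}

const data =
  "ID;Name;Country;ISO-2;Address;Latitude;Longitude\n" +
  '028gupn;Regal Riverside;Hong Kong;HK;"34-36 Tai Chung Kiu Road\nShatin\nHong Kong";"22,38260";"114,19600"\n';
const rows = parseCsv(data, ";");
console.log(rows.length);                  // 2
console.log(rows[1].length);               // 7
console.log(JSON.stringify(rows[1][4]));   // "34-36 Tai Chung Kiu Road\nShatin\nHong Kong"
```

The `inQuotes` flag is what keeps a line break inside `"..."` in the field instead of ending the row, and a trailing line break at the end of the input does not add an empty row.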
Answer 1 (score: 0)
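A bit late, but I hope this helps someone.
A while ago I ran into a similar problem and used the csvtojson library in my Angular project.
You can use the following code to read the CSV file as a string and then pass that string to the csvtojson library, which will give you a list of JSON objects.
Sample code: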
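const csv = require('csvtojson');

// Hypothetical handler name: call it with the FileList from an <input type="file">.
function onFilesSelected(files: FileList | null) {
  if (files && files.length > 0) {
    const file: File = files.item(0) as File;
    const reader: FileReader = new FileReader();
    // Attach the handler before starting the read.
    reader.onload = () => {
      const csvs: string = reader.result as string;
      csv({
        output: "json",
        noheader: false,
        delimiter: ";" // the question's data is semicolon-separated
      }).fromString(csvs)
        .preFileLine((fileLine: string, idx: number) => {
          // Convert the CSV header row to lowercase before parsing the file to JSON
          if (idx === 0) { return fileLine.toLowerCase(); }
          return fileLine;
        })
        .then((result: any[]) => {
          // result holds the list of JSON objects
        });
    };
    reader.readAsText(file);
  }
}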
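csvtojson's JSON output is a list of objects keyed by the header names, which the preFileLine hook lowercases. If you already have rows from a parser such as CSVToArray, a rough equivalent without the library might look like this. It is an approximation for simple header names, not csvtojson's implementation, and `rowsToObjects` is a hypothetical name.

```javascript
// Sketch: turn parsed rows (first row = header) into plain objects with
// lowercased keys. Approximates the shape of csvtojson's JSON output for
// simple headers; it is not the library's implementation.
function rowsToObjects(rows) {
  if (rows.length === 0) return [];
  const headers = rows[0].map((h) => h.toLowerCase());
  return rows.slice(1).map((row) => {
    const obj = {};
    headers.forEach((name, i) => {
      // In this sketch, missing trailing cells become "".
      obj[name] = row[i] !== undefined ? row[i] : "";
    });
    return obj;
  });
}

const parsed = [
  ["ID", "Name", "Country", "ISO-2", "Address", "Latitude", "Longitude"],
  ["022wje3", "Europa", "Italy", "IT", "Viale Kennedy 3\n34073 Grado", "45,67960", "13,40070"],
];
console.log(rowsToObjects(parsed)[0]["iso-2"]); // IT
```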