matlab - reading a large comma-separated file in which every field is enclosed in double quotes

Posted: 2015-11-05 23:56:33

Tags: regex matlab csv import scanf

I want to import data into MATLAB from a text file of about 800k lines that looks like this:


    "209","1000",".10500","N/A","36","116","2006-03-16 00:00:00","2519","431.400000","-6.760000","568.600000","142.620000",".000000",".000000",".000000",".000000","2","CHARGEOFF","","","2008-02-16 00:00:00","33.100000"
    "190","1000",".18750","N/A","36","116","2006-03-14 00:00:00","0",".000000","-5.230000","1000.000000","269.370000","20.000000","60.000000","4.910000",".000000","4","COMPLETED","","","2009-03-14 00:00:00",".000000"

However, in some entries (not shown above) a comma is part of the string inside the quotes, for example "N,A".
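A quick way to see the problem outside MATLAB: splitting on every comma breaks "N,A" into two fields, while splitting only on the three-character sequence "," that sits between quoted fields keeps it whole. This shell sketch assumes every field is wrapped in double quotes and that no value itself contains ","; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Print the number of fields on each input line, splitting on every comma.
# A quoted value such as "N,A" is (wrongly) counted as two fields.
naive_field_count() {
  awk -F',' '{ print NF }'
}

# Print the number of fields on each input line, splitting only on the
# "," sequence between two quoted fields. Assumes every field is
# double-quoted and no value contains "," itself.
quoted_field_count() {
  awk -F'","' '{ print NF }'
}

line='"209","1000","N,A","36"'
printf '%s\n' "$line" | naive_field_count    # prints 5
printf '%s\n' "$line" | quoted_field_count   # prints 4
```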

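To simplify things, I stripped all the double quotes from the file, but then some lines ended up with a different number of commas than others, which made importing the data into MATLAB even harder.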
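To locate the irregular lines, it can help to tally how many lines have each field count for a given separator. A minimal sketch, assuming a POSIX shell with awk and sort available; `field_count_tally` is a hypothetical helper name, not part of any tool.

```shell
#!/usr/bin/env bash
# Tally how many input lines have each field count for the separator given
# as the first argument. Output: one "<fields> <lines>" pair per line,
# sorted by field count. More than one output line means the file is ragged.
field_count_tally() {
  local sep="$1"
  awk -F "$sep" '{ tally[NF]++ } END { for (n in tally) print n, tally[n] }' | sort -n
}
```

For example, `sed 's/"//g' file.csv | field_count_tally ','` on the quote-stripped data would list several field counts if some values contain commas, and the extra count points at the lines to inspect.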

readtable can import it, but it takes too long and stores the values as text: for example, instead of storing 209 as a number, it imports it as a string containing '209'.

Thanks!

2 answers:

Answer 0 (score: 0)

First, I saved the following string in the file yourFile.txt. Note the comma between N and A:

    "209","1000",".10500","N,A","36","116","2006-03-16 00:00:00","2519","431.400000","-6.760000","568.600000","142.620000",".000000",".000000",".000000",".000000","2","CHARGEOFF","","","2008-02-16 00:00:00","33.100000"

I then used readtext (a function from the MATLAB File Exchange, not part of base MATLAB) to read the text file, as follows:

    % The delimiter is ," (a comma followed by a double quote). It only occurs
    % between fields, so the comma inside "N,A" does not split that field.
    fileContents = readtext('yourFile.txt', ',"');
    % Keep the entries as character vectors, with the leftover quotes removed.
    processedContentChar = cellfun(@(x) regexprep(x, '"', ''), fileContents, 'UniformOutput', false);
    % Convert the entries to numbers; non-numeric text such as 'N,A' becomes NaN.
    processedContentNum = cellfun(@(x) str2double(regexprep(x, '"', '')), fileContents, 'UniformOutput', false);
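The delimiter choice in this answer can be checked without MATLAB. The awk sketch below applies the same two steps, splitting each line on ," and then deleting the leftover quotes, and prints one field per line. It assumes no quoted value contains the sequence ," itself; `split_fields` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Split each line on the two-character delimiter ," (comma + double quote),
# strip the remaining double quotes from every field, and print the fields
# one per line. The comma inside "N,A" is followed by a letter rather than
# a quote, so it is not treated as a delimiter.
split_fields() {
  awk -F',"' '{
    for (i = 1; i <= NF; i++) {
      field = $i
      gsub(/"/, "", field)
      print field
    }
  }'
}

printf '%s\n' '"209","N,A","","33.100000"' | split_fields
# prints 209, N,A, an empty line, then 33.100000
```

Empty fields ("") survive as empty lines, which matches how the two consecutive empty columns in the sample data should be kept.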

Answer 1 (score: 0)

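What I did was prepare the data with sed: find any | characters and delete them, replace "," with |, and then find the remaining " characters and delete them. This is basically the idea from Parag's answer above.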


    time sed 's/|//g' file.csv | sed 's/","/|/g' | sed 's/"//g' > file_bar.csv

This took 3.5 minutes on the ~800k-line file with 540 columns.
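The three piped sed calls can also be written as one sed invocation with several -e expressions; sed applies them in order to each line, so the output is the same while two extra processes and pipes are avoided. This is only a sketch: `quoted_csv_to_bar` is a hypothetical name, and I have not timed it on a file of this size.

```shell
#!/usr/bin/env bash
# Convert quoted CSV on stdin to |-delimited text on stdout, in order:
#   1. delete any literal | already in the data (it becomes the delimiter)
#   2. turn each "," between quoted fields into |
#   3. delete the remaining double quotes (at the start and end of each line)
# A comma inside a value such as "N,A" is left untouched by step 2.
quoted_csv_to_bar() {
  sed -e 's/|//g' -e 's/","/|/g' -e 's/"//g'
}

# Usage with the file names from the pipeline above:
# quoted_csv_to_bar < file.csv > file_bar.csv
```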

Then, in MATLAB, I used readtable with | specified as the delimiter, and that took 10 minutes.