Question

我想知道我们应该如何读取由字符串，双精度和字符等组成的复杂csv文件。

例如，您能否提供一个可以在此csv文件中提取数值的成功命令？

点击here。

例如：

yield curve data 2013-10-04     
Yields in percentages per annum.        


Parameters - AAA-rated bonds        
Series key   Parameters  Description
YC.B.U2.EUR.4F.G_N_A.SV_C_YM.BETA0  2.03555 Euro area (changing composition) - Government bond, nominal, all issuers whose rating is triple A - Svensson model - continuous compounding - yield error minimisation - Yield curve parameters, Beta 0 - Euro, provided by ECB
YC.B.U2.EUR.4F.G_N_A.SV_C_YM.BETA1  -2.009068   Euro area (changing composition) - Government bond, nominal, all issuers whose rating is triple A - Svensson model - continuous compounding - yield error minimisation - Yield curve parameters, Beta 1 - Euro, provided by ECB
YC.B.U2.EUR.4F.G_N_A.SV_C_YM.BETA2  24.54184    Euro area (changing composition) - Government bond, nominal, all issuers whose rating is triple A - Svensson model - continuous compounding - yield error minimisation - Yield curve parameters, Beta 2 - Euro, provided by ECB
YC.B.U2.EUR.4F.G_N_A.SV_C_YM.BETA3  -21.80556   Euro area (changing composition) - Government bond, nominal, all issuers whose rating is triple A - Svensson model - continuous compounding - yield error minimisation - Yield curve parameters, Beta 3 - Euro, provided by ECB
YC.B.U2.EUR.4F.G_N_A.SV_C_YM.TAU1   5.351378    Euro area (changing composition) - Government bond, nominal, all issuers whose rating is triple A - Svensson model - continuous compounding - yield error minimisation - Yield curve parameters, Tau 1 - Euro, provided by ECB
YC.B.U2.EUR.4F.G_N_A.SV_C_YM.TAU2   4.321162    Euro area (changing composition) - Government bond, nominal, all issuers whose rating is triple A - Svensson model - continuous compounding - yield error minimisation - Yield curve parameters, Tau 2 - Euro, provided by ECB

这些是文件中信息的一部分。我尝试csvread('yc_latest.csv', 6, 1, [6,1,6,1])获取值2.03555，但它给了我以下错误：

   Error using dlmread (line 139)
    Mismatch between file and format string.
    Trouble reading number from file (row 1u, field 3u) ==> "Euro area (changing composition) -
    Government bond, nominal, all issuers whose rating is triple A - Svensson model - continuous
    compounding - yield error minimisation - Yield curve parameters, Beta 0

    Error in csvread (line 50)
        m=dlmread(filename, ',', r, c, rng);

Answer 1

我强烈建议您使用matlab中的“导入数据”功能（它位于“HOME”工具栏中）。

特别注意截图中它还可以为您生成代码，以便将来自动化。 enter image description here

Answer 2

这是一个非常糟糕的解决方案。不幸的是，Matlab在读取csv文件时非常震撼，这使得这种hackery成为一种不幸的必需品。从好的方面来说，你可能只需要编写一次这样的代码。

fid = fopen('yc_latest.csv');   %// open the file

%// parse as csv, skipping the first six lines
contents = textscan(fid, '%s %f %[^\n]', 'HeaderLines', 6); 

%// unpack the fields and give them meaningful names
[seriesKey, parameters, description]   = contents{:};

fclose(fid);                    %// don't forget this!

Answer 3

Chris解决方案的替代方案：

fid=fopen('yc_latest.csv');
Rows = textscan(fid,'%s', 'delimiter','\n'); %Creates a temporary cell array with the rows
fclose(fid);

%looks for the lines with a euro value:
value=strfind(Rows,'Euro'); 
Idx = find(~cellfun('isempty', value)); 

Columns= cellfun(@(x) textscan(x,'%f','delimiter','\t','CollectOutput',1), Rows);
Columns= cellfun(@transpose, Columns, 'UniformOutput', 0);

具有实际欧元值的所有行的索引存储在Idx中。

Answer 4

您可能希望以这种方式使用textscan。

每行都使用常规分隔符（制表符，空格）进行解析，使用的格式为%*s，带有星号以跳过第一个元素（YC.B.U2.EUR.4F.G_N_A.SV_C_YM.BETA0 ），然后%f获取感兴趣的值，最后%*[^\n]跳过剩余的行。

fid = fopen(filename);                                
C = textscan(fid, '%*s%f%*[^\n]', 'HeaderLines', 6); 
fclose(fid);

values   = C{1};

如何将复杂的csv文件导入数值向量到Matlab中

4 个答案: