
Question

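I'm having some trouble parsing CSV data that contains quotes. My main problem is quotes inside a field. In the following example, lines 1-4 work correctly, but lines 5, 6, and 7 do not.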

COLLOQ_TYPE,COLLOQ_NAME,COLLOQ_CODE,XDATA
S,"BELT,FAN",003541547,
S,"BELT V,FAN",000324244,
S,SHROUD SPRING SCREW,000868265,
S,"D" REL VALVE ASSY,000771881,
S,"YBELT,"V"",000323030,
S,"YBELT,'V'",000322933,

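I'd like to avoid Text::CSV because it isn't installed on the target server. Realizing that CSV is more complicated than it looks, I'm using a recipe from the Perl Cookbook.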

sub parse_csv {
  my $text = shift; # record containing comma-separated values
  my @columns = ();
  push(@columns ,$+) while $text =~ m{
    # The first part groups the phrase inside quotes
    "([^\"\\]*(?:\\.[^\"\\]*)*)",?
      | ([^,]+),?
      | ,
    }gx;
  push(@columns ,undef) if substr($text, -1,1) eq ',';
  return @columns ; # list of the fields that were comma-separated
}

Does anyone have suggestions for improving the regular expression so that it handles the cases above?

Answer 1

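Please, try using CPAN

There's no reason you can't download a copy of Text::CSV, or any other non-XS implementation of a CSV parser, and install it in your local directory or in a lib/ subdirectory of your project, so that it ships with your project's rollout.

If you can't store text files in your project, then I'm wondering how you are writing the project at all.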

http://novosial.org/perl/life-with-cpan/non-root/

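should be a good guide on how to get these modules working locally.

Not using CPAN is really a recipe for disaster.

Please consider this before trying to write your own CSV implementation.

Text::CSV is over a hundred lines of code, including fixed bugs and edge cases, and rewriting it from scratch will only teach you the hard way how awful CSV can be.

(Note: I learned this the hard way. It took me a full day to get a working CSV parser in PHP, and only afterwards did I discover that a built-in one had been added in a later version. It really is something awful.)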

Answer 2

You can parse CSV with Text::ParseWords, which ships with Perl.

use feature 'say';   # needed for say()
use Text::ParseWords;

while (<DATA>) {
    chomp;
    my @f = quotewords ',', 0, $_;
    say join ":" => @f;
}

__DATA__
COLLOQ_TYPE,COLLOQ_NAME,COLLOQ_CODE,XDATA
S,"BELT,FAN",003541547,
S,"BELT V,FAN",000324244,
S,SHROUD SPRING SCREW,000868265,
S,"D" REL VALVE ASSY,000771881,
S,"YBELT,"V"",000323030,
S,"YBELT,'V'",000322933,

This parses your CSV correctly:

# => COLLOQ_TYPE:COLLOQ_NAME:COLLOQ_CODE:XDATA
# => S:BELT,FAN:003541547:
# => S:BELT V,FAN:000324244:
# => S:SHROUD SPRING SCREW:000868265:
# => S:D REL VALVE ASSY:000771881:
# => S:YBELT,V:000323030:
# => S:YBELT,'V':000322933:

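The only issue I've had with Text::ParseWords is when nested quotes in the data aren't escaped correctly. However, that is badly built CSV data and would cause problems for most CSV parsers ;-)

So you may notice that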

# S,"YBELT,"V"",000323030,

comes out as (i.e. the quotes around "V" are dropped)

# S:YBELT,V:000323030:

However, if it is escaped like this

# S,"YBELT,\"V\"",000323030,

then the quotes are retained

# S:YBELT,"V":000323030:

Answer 3

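This works like a charm, assuming $line is comma-separated and any embedded commas sit inside quoted fields:

use Text::ParseWords;
my @columns = Text::ParseWords::parse_line(',', 0, $line);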
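To make the call above concrete, here is a minimal sketch that wraps parse_line in a small helper. The helper name split_record is hypothetical. It relies on the fact that parse_line returns undef for an empty field at the very end of the line, and maps that to an empty string so that joining the fields under warnings stays quiet.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Hypothetical helper: split one CSV record into fields.
# A trailing empty field comes back from parse_line as undef,
# so it is normalized to '' here.
sub split_record {
    my ($line) = @_;
    return map { defined $_ ? $_ : '' } parse_line(',', 0, $line);
}

my @columns = split_record('S,"BELT V,FAN",000324244,');
print join('|', @columns), "\n";
```

Note that parse_line also treats single quotes as quoting characters, so this sketch is only appropriate for data where an unquoted apostrophe cannot appear.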

Answer 4

Tested and working:

$_.=','; # fake an ending delimiter

while($_=~/"((?:""|[^"])*)",|([^,]*),/g) {
  $cell=defined($1) ? $1:$2; $cell=~s/""/"/g; 
  print "$cell\n";
}

# The regexp strategy is as follows:
# First - we attempt a match on any quoted part starting the CSV line:-
#  "((?:""|[^"])*)",
# It must start with a quote and end with a quote followed by a comma, and it may contain either doubled quotes ("") or anything except a double quote ([^"]); this goes into $1
# If we can't match that, we accept anything up to the next comma instead, & put it into $2
# Lastly, we convert "" to " and print out the cell.
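
The strategy described in these comments can be packaged as a function that returns the cells instead of printing them. This is only a sketch under assumptions: the name split_csv_line is hypothetical, embedded quotes are assumed to be doubled (""), and the record is assumed to contain no embedded newline.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the loop above.
sub split_csv_line {
    my ($line) = @_;
    $line .= ',';    # fake an ending delimiter, as in the original loop
    my @cells;
    while ($line =~ /"((?:""|[^"])*)",|([^,]*),/g) {
        my $cell = defined $1 ? $1 : $2;   # quoted cell, else plain cell
        $cell =~ s/""/"/g;                 # collapse doubled quotes
        push @cells, $cell;
    }
    return @cells;
}

print join('|', split_csv_line('a,"say ""hi""",b')), "\n";
# prints: a|say "hi"|b
```

Malformed fields such as "YBELT,"V"" from the question are still not split the way the asker wants, because the quotes inside them are neither doubled nor escaped.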

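Warning: CSV files can contain cells with newlines embedded inside the quotes, so if you read the data one line at a time you will need to do this: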

if("$pre$_"=~/,"[^,]*\z/) {
  $pre.=$_; next;
}
$_="$pre$_";
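
As an alternative illustration of the same idea, the sketch below joins physical lines into logical records by counting double quotes rather than using the pattern above. This is a swapped-in heuristic, not the answer's method: it assumes embedded quotes are doubled (""), so a complete record always contains an even number of quote characters. The name read_records is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: read logical CSV records from a filehandle,
# appending physical lines while a quoted field is still open.
sub read_records {
    my ($fh) = @_;
    my @records;
    my $pending = '';
    while (my $line = <$fh>) {
        $pending .= $line;
        my $quotes = () = $pending =~ /"/g;   # count double quotes seen so far
        next if $quotes % 2;                  # odd count: still inside a quoted field
        chomp $pending;
        push @records, $pending;
        $pending = '';
    }
    push @records, $pending if length $pending;   # unterminated trailing record
    return @records;
}

# Demonstrate with an in-memory file holding one two-line record.
my $data = qq{a,"multi\nline",b\nc,d,e\n};
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @records = read_records($fh);
close $fh;
print scalar(@records), " records\n";
```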

Answer 5

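Finding matching pairs with regular expressions is a non-trivial and, in general, unsolvable task. There are plenty of examples in Jeffrey Friedl's book Mastering Regular Expressions. I don't have it at hand right now, but I remember that he used CSV for some of the examples too.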

Answer 6

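You could (try to) use CPAN.pm to have your program simply install or update Text::CSV. As said before, you can even "install" it into a home or local directory and add that directory to @INC (or, if you prefer not to use BEGIN blocks, you can write use lib 'dir'; which is probably better).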
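
A minimal sketch of the use lib approach follows. The lib directory next to the script is an assumed location, not something the thread specifies, and the eval guard is there only so the script reports a missing module instead of dying.

```perl
use strict;
use warnings;
use FindBin qw($Bin);
use lib "$Bin/lib";    # assumption: modules are bundled under ./lib next to the script

# Load Text::CSV only if it can actually be found on @INC.
if (eval { require Text::CSV; 1 }) {
    my $csv = Text::CSV->new({ binary => 1 })
        or die "Cannot construct Text::CSV object";
    if ($csv->parse('S,"BELT,FAN",003541547,')) {
        print join(':', $csv->fields), "\n";
    }
} else {
    print "Text::CSV not found; copy it into $Bin/lib first\n";
}
```

Because use lib runs at compile time, the bundled directory is searched before the system directories without an explicit BEGIN block.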

Answer 7

Tested:

use Test::More tests => 2;

use strict;

sub splitCommaNotQuote {
    my ( $line ) = @_;

    my @fields = ();

    while ( $line =~ m/((\")([^\"]*)\"|[^,]*)(,|$)/g ) {
        if ( $2 ) {
            push( @fields, $3 );
        } else {
            push( @fields, $1 );
        }
        last if ( ! $4 );
    }

    return( @fields );
}

is_deeply(
    +[splitCommaNotQuote('S,"D" REL VALVE ASSY,000771881,')],
    +['S', '"D" REL VALVE ASSY', '000771881', ''],
    "Quote in value"
);
is_deeply(
    +[splitCommaNotQuote('S,"BELT V,FAN",000324244,')],
    +['S', 'BELT V,FAN', '000324244', ''],
    "Strip quotes from entire value"
);

How can I parse quoted CSV in Perl with a regular expression?
