How do I isolate the words that correspond to letters in a different column of a CSV file?

Asked: 2012-04-20 02:30:12

Tags: perl bash sed

I have a CSV file that looks like this:

ACDB,this is a sentence
BECD,this is another sentence
BCAB,this is yet another

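Each character in the first column corresponds to a word in the second column. For example, in the first row, A corresponds to "this", C to "is", D to "a", and B to "sentence".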

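Given the variable character, which can be set to any character that appears in the first column, I need to isolate the words that correspond to the selected letter. For example, if I set

character="B"

then the output for the file above would be:

sentence
this
this another

If I set character="C", the output would be:

is
another
is

How can I output only the words that correspond to the position of the selected letter?

  • The file contains many UTF-8 characters.
  • The number of words in column 2 always equals the number of characters in column 1.
  • The words in column 2 are separated by spaces.

Here is the code I have so far: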

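3 Answers:

Answer 0: (score: 1)

This is a mostly done, half-finished answer.

Since SO is not a "do my work for me" site, you will need to fill in some trivial blanks.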

sub get_index_of_char {
   my ($character, $charset) = @_;
   # Homework: read about index() function
   #http://perldoc.perl.org/functions/index.html
}

sub split_line {
    my ($line) = @_;
    # Separate the line into a charset (before the comma)
    # and a whitespace-separated word list.
    # split() is enough for that; no regex captures are needed.
    my ($charset, $rest) = split /,/, $line, 2;  # split on the first comma only
    my @words = split ' ', $rest;                # words separated by whitespace
    return ($charset, \@words);
}

sub process_line {
    my ($line, $character) = @_;
    chomp($line);
    my ($charset, $words) = split_line($line);
    my $index = get_index_of_char($character, $charset);
    print $words->[$index] . "\n"; # Could contain an off-by-one bug
}

# Here be the main loop calling process_line() for every line from input

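Answer 1: (score: 1)

This seems to do the trick. It uses the DATA filehandle to read the data from the source file, whereas you will have to fetch it from your own source. You may also have to cater for there being no word corresponding to a given letter (as with "A" in the second data line here).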

use strict;
use warnings;

my @data;

while (<DATA>) {
  my ($keys, $words) = split /,/;
  my @keys = split //, $keys;
  my @words = split ' ', $words;
  my %index;
  push @{ $index{shift @keys} }, shift @words while @keys;
  push @data, \%index;
}

for my $character (qw/ B C /) {
  print "character = $character\n";
  print join(' ', @{$_->{$character}}), "\n" for @data;
  print "\n";
}

__DATA__
ACDB,this is a sentence
BECD,this is another sentence
BCAB,this is yet another

Output

character = B
sentence
this
this another

character = C
is
another
is
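
The missing-letter case mentioned in this answer (there is no "A" in the second data line) can be handled in more than one way. The sketch below, written in awk rather than Perl, prints a placeholder for such lines so that the output stays aligned with the input rows. The function name `words_or_placeholder` and the "-" placeholder are assumptions made for illustration, not part of the answer.

```shell
# Hypothetical helper: one output line per input line; "-" when the letter is absent.
# Usage: words_or_placeholder LETTER < file.csv
words_or_placeholder() {
  awk -F, -v want="$1" '
    {
      split($2, word, " ")            # words of column 2, split on whitespace
      out = ""; sep = ""
      for (n = 1; n <= length($1); n++) {
        if (substr($1, n, 1) == want) {
          out = out sep word[n]       # collect every word at a matching position
          sep = " "
        }
      }
      if (out == "") out = "-"        # assumed placeholder for "no such letter"
      print out
    }
  '
}

printf '%s\n' 'ACDB,this is a sentence' \
              'BECD,this is another sentence' \
              'BCAB,this is yet another' | words_or_placeholder A
```

Dropping the placeholder line instead would give the more compact output, at the cost of no longer knowing which input row each output line came from.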

Answer 2: (score: 1)

This might work for you:

x=B                                                      # set wanted key variable
sed '
:a;s/^\([^,]\)\(.*,\)\([^ \n]*\) *\(.*\)/\2\4\n\1 \3/;ta # pair keys with values
s/,//                                                    # delete ,
s/\n[^'$x'] [^\n]*//g                                    # delete unwanted keys/values
s/\n.//g                                                 # delete wanted keys
s/ //                                                    # delete first space
/^$/d                                                    # delete empty lines
' file
sentence
this
this another

Or in awk:

awk -F, -vx=B '{i=split($1,a,"");split($2,b," ");c=s="";for(n=1;n<=i;n++)if(a[n]==x){c=c s b[n];s=" "} if(length(c))print c}' file
sentence
this
this another
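
For readability, the awk one-liner can be expanded roughly as follows. This is a sketch and not the answerer's exact code: the function name `words_for_letter` is made up for illustration, and it walks column 1 with `substr()` instead of `split($1, a, "")`, because POSIX leaves the effect of an empty separator unspecified even though gawk and several other awks split into single characters. Whether multi-byte UTF-8 letters count as single characters depends on the awk implementation and the locale, which is not tested here.

```shell
# Hypothetical helper: print the words whose position matches the given letter.
# Usage: words_for_letter LETTER < file.csv
words_for_letter() {
  awk -F, -v want="$1" '
    {
      nwords = split($2, word, " ")   # words of column 2, split on whitespace
      out = ""; sep = ""
      for (n = 1; n <= length($1); n++) {
        # compare the n-th key character with the wanted letter
        if (substr($1, n, 1) == want && n <= nwords) {
          out = out sep word[n]
          sep = " "
        }
      }
      if (out != "") print out        # skip lines with no matching letter
    }
  '
}

printf '%s\n' 'ACDB,this is a sentence' \
              'BECD,this is another sentence' \
              'BCAB,this is yet another' | words_for_letter B
```

Passing the letter with `-v want="$1"` keeps it out of the awk program text, so the quoting does not have to be broken open the way the sed version does with `'$x'`.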