Perl regex to abbreviate column names from a file

Date: 2014-12-15 15:31:03

Tags: regex perl

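I have a multi-line file that contains headers and values. Because the values will be inserted into a database, I want to use the headers as the column names. The sample data looks like this: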

Sales-Date
Item
Sale Price
Discount
Cost of Item
Profit (loss)

I have already put just the column headers into an array and removed the parentheses and dashes. The result is:

Sales Date
Item
Sale Price
Discount
Cost of Item
Profit loss
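The question doesn't show the cleanup code. Here is a minimal sketch of that step, assuming dashes should become spaces and parentheses should simply be deleted. `clean_header` is a hypothetical helper name:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn dashes into spaces and delete parentheses.
sub clean_header {
    my ($h) = @_;
    (my $clean = $h) =~ tr/-/ /;   # "Sales-Date"    -> "Sales Date"
    $clean =~ tr/()//d;            # "Profit (loss)" -> "Profit loss"
    return $clean;
}

# Raw header lines as they appear in the sample data above.
my @raw = ('Sales-Date', 'Item', 'Sale Price', 'Discount', 'Cost of Item', 'Profit (loss)');

my @columns = map { clean_header($_) } @raw;
print "$_\n" for @columns;
```

`tr///` is used instead of `s///` because this is a plain character-for-character mapping with no pattern matching involved.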

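So what I need is a regex that looks at each line and, if it contains only one word, returns, say, its first 4 letters; if it contains several words, returns the first letter of each word. Ideally the result is upper-case. The desired output would look like: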

SD
ITEM
SP
DISC
COI
PL

I haven't had much luck so far. Thanks.

3 answers:

Answer 0 (score: 2)

Something like this, perhaps:

#!/usr/bin/perl

use strict;
use warnings;
use 5.010;

while (<DATA>) {
  chomp;

  # If the line contains whitespace...
  if (/\s/) {
    # ... split the line into words ...
    # ... take the first letter of each word ...
    # ... join the letters together ...
    # ... and upper-case the resulting string.
    say uc join '', map { substr $_, 0, 1 } split /\s+/;
  } else {
    # ... otherwise, take the first four characters from the string ...
    # ... and upper-case them.
    say uc substr $_, 0, 4;
  }
}

__END__
Sales Date
Item
Sale Price
Discount
Cost of Item
Profit loss
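The same logic can be wrapped in a sub and used to build the header-to-column-name mapping the question is after. This is a sketch; `abbreviate` and `%column_for` are hypothetical names. Trimming first matters: with leading whitespace, or a trailing `\r` left by `chomp` on a CRLF file, `/\s/` is true even for a single word, so `Item` would come out as `I`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper implementing the rule from the answer above,
# with surrounding whitespace (including a stray "\r") stripped first.
sub abbreviate {
    my ($header) = @_;
    $header =~ s/^\s+|\s+\z//g;    # trim leading/trailing whitespace
    if ($header =~ /\s/) {
        # Several words: first letter of each word, upper-cased.
        return uc join '', map { substr $_, 0, 1 } split /\s+/, $header;
    }
    # One word: first four characters, upper-cased.
    return uc substr $header, 0, 4;
}

# Map each cleaned header to its abbreviated column name.
my @headers = ('Sales Date', 'Item', 'Sale Price', 'Discount', 'Cost of Item', 'Profit loss');
my %column_for = map { $_ => abbreviate($_) } @headers;

print "$_ => $column_for{$_}\n" for @headers;
```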

Answer 1 (score: 1)

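One possible solution is to split the line on whitespace into an array of words, then capture just the first letter of each word. Something like: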

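use strict;
use warnings;

my $line = "Sales Date";

# Split the line on whitespace into an array of words
my @words = split /\s+/, $line;

my $letters = '';
# Loop over each word in the array
for (@words) {
    # Append the first character, but only if the match succeeded
    # (otherwise $1 would still hold the previous word's capture)
    $letters .= $1 if m/(.)/;
}

print "$letters\n";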

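The above prints SD. You can change m/(.)/ to control how many characters are captured from each word; for example, m/(.{1,4})/ captures up to four.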
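Following that suggestion, one way to cover both cases is to choose the capture pattern based on the number of words. This is a sketch; `abbreviate_by_capture` is a hypothetical name, and it uses Perl's special `split ' '` form (rather than `split /\s+/`) so that leading whitespace does not produce an empty first word:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper extending the answer above: the capture pattern
# depends on how many words the line has.
sub abbreviate_by_capture {
    my ($line) = @_;
    my @words = split ' ', $line;   # awk-style split: skips leading whitespace
    # One word: capture up to four characters; several words: capture one.
    my $pattern = @words == 1 ? qr/^(.{1,4})/ : qr/^(.)/;

    my $letters = '';
    for my $word (@words) {
        $letters .= $1 if $word =~ $pattern;
    }
    return uc $letters;
}

print abbreviate_by_capture($_), "\n" for 'Sales Date', 'Discount', 'Cost of Item';
```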

Answer 2 (score: 1)

use strict;
use warnings;

my @arr = map {
  # make entire string upper case
  local $_ = uc;
  # remove trailing whitespace (chomp alone would leave a stray "\r" from CRLF line endings)
  s/\s+\z//;

  # more words?
  /\s/
      # take first letter of every word
      ? join("", /\b(\w)/g)
      # take first 1 to 4 letters (and be greedy at that)
      : /(\w{1,4})/;
}
<DATA>;

print $_, "\n" for @arr;

__DATA__
Sales Date
Item
Sale Price
Discount
Cost of Item
Profit loss

Output:

SD
ITEM
SP
DISC
COI
PL
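Since the question asks for a regex, the same rule can also be written as a single substitution with the `/e` modifier. This is a sketch, assuming the headers have already been cleaned as in the question (non-word characters such as parentheses are not consumed by the pattern). `abbreviate_re` is a hypothetical name, and Perl 5.10 or later is needed for the defined-or operator `//`:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;   # defined-or operator //

# Hypothetical helper: the whole rule as one s{}{}ge substitution.
#  - Alternative 1 matches only when the entire string is one word;
#    it keeps the first 1-4 word characters in $1.
#  - Alternative 2 matches each word plus trailing whitespace and keeps
#    its first character in $2.
sub abbreviate_re {
    my ($s) = @_;
    $s =~ s/^\s+|\s+\z//g;   # trim so ^ and \z line up with the words
    $s =~ s{^(\w{1,4})\w*\z|(\w)\w*\s*}{$1 // $2}ge;
    return uc $s;
}

say abbreviate_re($_) for 'Sales Date', 'Discount', 'Cost of Item';
```

Braces are used as the `s{}{}` delimiters so the `//` in the replacement does not end the substitution early.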