Count word occurrences and report their line numbers

Time: 2019-07-29 15:28:21

Tags: linux bash

I need to create a file that records how many times each word occurs in another file, along with the lines on which those words occur, in this format:

Word1: 35 [25, 50, 300, ...]
Word2: 15 [10, 25, 65, ...]

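1 answer:

Answer 0 (score: 0)

Unfortunately, your question lacks sample input files that demonstrate everything you need to handle, as well as the expected output for them, so I'm making some up.

Given the files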

wordlist.txt

cat
dog
fish
horse

input.txt

There are three fish.
Two red fish.
One blue fish and a brown dog.
There are no matching words on this line.
Also there is no cat, only the dog. Oh, there is a white dog too.
There are doggies.

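this Perl script prints each word from the list with its number of matches and the line numbers where it matched, repeating a line number when the word occurs more than once on that line: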

#!/usr/bin/env perl
use warnings;
use strict;
use autodie;
use feature qw/say/;
use English;

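die "Usage: $0 wordlist.txt input.txt\n" unless @ARGV == 2;

# Maps each word to an array ref of the line numbers where it matched.
my %words;

open my $wordlist, "<", $ARGV[0];
while (<$wordlist>) {
    chomp;
    next unless length; # Skip blank lines in the word list
    $words{$_} = [];
}

open my $text, "<", $ARGV[1];
while (<$text>) {
    for my $word (keys %words) {
        # \b...\b matches the word by itself; \Q...\E escapes regex metacharacters
        while (m/\b\Q$word\E\b/g) { # Match all occurrences on this line
            push @{ $words{$word} }, $NR; # $NR is the current line number of $text
        }
    }
}

$OFS = ' '; # Separator that say puts between its arguments
for my $word (sort keys %words) {
    my $positions = $words{$word};
    say "$word:", scalar(@$positions), join(',', @$positions);
}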

Example:

$ perl words.pl wordlist.txt input.txt
cat: 1 5
dog: 3 3,5,5
fish: 3 1,2,3
horse: 0
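
The question asked for the line numbers in square brackets ("Word: 15 [10, 25, 65, ...]"), while the output above lists them bare. Since the question is tagged bash, here is a rough awk-based sketch of that bracketed layout. It is not a port of the Perl script: the function name count_words is made up for illustration, and the sketch assumes the words contain only letters, digits and underscores and that the word list is not empty.

```shell
# count_words WORDLIST TEXT  (hypothetical helper name)
# Prints "word: count [line, line, ...]" for every word in WORDLIST.
count_words() {
    awk '
        # First file: remember each non-empty word from the list.
        NR == FNR { if ($0 != "") words[$0] = 1; next }

        # Second file: split the line into tokens on anything that is not a
        # letter, digit or underscore, then look each token up in the list.
        {
            n = split($0, tokens, /[^A-Za-z0-9_]+/)
            for (i = 1; i <= n; i++) {
                w = tokens[i]
                if (w in words) {
                    count[w]++
                    if (w in lines)
                        lines[w] = lines[w] ", " FNR
                    else
                        lines[w] = FNR
                }
            }
        }

        # Report every listed word, including those that never matched.
        END {
            for (w in words)
                printf "%s: %d [%s]\n", w, count[w], lines[w]
        }
    ' "$1" "$2" | sort
}
```

With the two sample files above, count_words wordlist.txt input.txt should print:

cat: 1 [5]
dog: 3 [3, 5, 5]
fish: 3 [1, 2, 3]
horse: 0 []

Unlike the Perl version, a list entry containing punctuation (for example a hyphenated word) would never match here, because the text is tokenized before the lookup.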