Question

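By the way, I'm new to Perl. I have a Perl script that needs to count the number of times a string appears in a file. The script takes the words to search for from the file itself.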

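I need it to take the first word in the file and then search the rest of the file to see whether it is repeated anywhere else. If it is repeated, I need it to return the number of times it is repeated. If it isn't repeated, it can return 0. Then I need it to take the next word in the file and check again.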

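In other words: take the first word from the file and search the file for repeats of it, take the second word from the file and search the file for repeats of it, take the third word from the file and search the file for repeats of it, and so on.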

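So far I have a while loop that grabs each word I need, but I don't know how to search for repeats without losing the position of the current line. How can I do that? Any ideas or suggestions are greatly appreciated! Thanks in advance!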

while (my $line1 = <theFile>) {
    chomp($line1);                                 # remove the trailing newline
    my $startHere = rindex($line1, ",");           # index of the last comma (-1 if none)
    my $theName = substr($line1, $startHere + 1);  # everything after the last comma
    #print "the name: ".$theName."\n";
}
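For the approach exactly as described in the question, one way to search the rest of the file without losing your place is to save the position with tell before the inner scan and restore it with seek afterwards. The following is a minimal sketch, assuming the value of interest is the text after the last comma (as in the loop above); later_repeat_counts is a hypothetical helper name, and an in-memory filehandle stands in for the real file. Because it rescans the remainder of the file for every line, its running time grows quadratically with the number of lines.

```perl
use strict;
use warnings;

# Hypothetical helper: for each line, take the text after the last comma and
# count how many *later* lines end with the same value.
sub later_repeat_counts {
    my ($fh) = @_;
    my @results;
    while (my $line = <$fh>) {
        chomp $line;
        my $name = substr($line, rindex($line, ',') + 1);
        my $pos  = tell($fh);                   # remember where the outer loop is
        my $repeats = 0;
        while (my $other = <$fh>) {             # scan the rest of the file
            chomp $other;
            $repeats++ if substr($other, rindex($other, ',') + 1) eq $name;
        }
        seek($fh, $pos, 0) or die "seek failed: $!";   # resume the outer loop
        push @results, [$name, $repeats];
    }
    return \@results;
}

# Assumed sample data; open() on a scalar reference gives an in-memory file.
my $text = "1,a,alice\n2,b,bob\n3,c,alice\n";
open(my $fh, '<', \$text) or die "cannot open in-memory file: $!";
for my $pair (@{ later_repeat_counts($fh) }) {
    print "$pair->[0]: $pair->[1]\n";
}
```

The seek call also clears the end-of-file condition left by the inner loop, which is what lets the outer while loop continue.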

Answer 1

Use a hash table:

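my %wordCount = ();    # word => number of times it occurs in the file

while (my $line = <theFile>)
{
    chomp($line);
    my @words = split(' ', $line);    # ' ' splits on runs of whitespace
    foreach my $word (@words)
    {
        $wordCount{$word} += 1;
    }
}

# output
foreach my $key (keys %wordCount)
{
    # subtract 1 so that a word seen only once reports 0 repeats
    print "Word: $key Repeat_Count: " . ($wordCount{$key} - 1) . "\n";
}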

The $wordCount{$key} - 1 in the output discounts the first time a word is seen, so a word that appears only once in the file is counted as 0.

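Unless this is actually homework and/or you have to get the result in the specific manner you described, this will be much more efficient, because it reads the file only once instead of rescanning it for every word.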

Edit: from the comment below:

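Each word I'm searching for isn't the "first word"; it's a particular word in a line. Basically I have a CSV file, and I skip to the third value and search for repeats of it.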

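I would still use this approach. What you want to do is:

1. Split on ',', since this is a CSV file.
2. In each line, pull the 3rd word out of the array and store the words you are interested in in their own hash table.
3. At the end, iterate over the "search word" hash table and pull the counts from the wordCount table.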

So:

my @words = split(',', $line);
$searchTable{$words[2]} = 1;    # third value on the line (index 2)

...

foreach my $key(keys %searchTable)
{
    print "Word: $key Repeat_Count: " . ($wordCount{$key} - 1) . "\n";
}

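You'll have to adjust this according to your rules for counting repeats of words in the third column. For example, you could remove them from @words before the loop that inserts into the wordCount hash.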
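Putting the steps of this edit together, a self-contained sketch might look like the following. It assumes plain comma-separated lines with no quoted fields containing commas (a real CSV parser such as the CPAN module Text::CSV would handle those), counts every field in every column, and uses a hypothetical helper name, third_column_repeats; an in-memory filehandle stands in for the real file.

```perl
use strict;
use warnings;

# Hypothetical helper combining the steps above: count every comma-separated
# field in the file, remember the third field of each line, and report how
# many times each remembered value occurs beyond its first appearance.
sub third_column_repeats {
    my ($fh) = @_;
    my (%wordCount, %searchTable);
    while (my $line = <$fh>) {
        chomp $line;
        my @words = split /,/, $line;
        $wordCount{$_}++ for @words;                         # every field counts
        $searchTable{ $words[2] } = 1 if defined $words[2];  # 3rd value only
    }
    my %repeats;
    $repeats{$_} = $wordCount{$_} - 1 for keys %searchTable;
    return \%repeats;
}

# Assumed sample data.
my $csv = "1,x,alice\n2,alice,bob\n3,y,alice\n";
open(my $fh, '<', \$csv) or die "cannot open in-memory file: $!";
my $repeats = third_column_repeats($fh);
for my $key (sort keys %$repeats) {
    print "Word: $key Repeat_Count: $repeats->{$key}\n";
}
```

Here "alice" is reported with 2 repeats because it also appears in the second column of line 2; if only third-column matches should count, build %wordCount from $words[2] alone instead of from every field.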

Answer 2

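my $word = <theFile>;
chomp($word);    # assuming the word is on a line by itself
my $wordcount = 0;
foreach my $line (<theFile>) {
    # /e runs the replacement as code, so each match bumps $wordcount;
    # \Q...\E treats the word literally even if it contains regex characters
    $line =~ s/\Q$word\E/$wordcount++/eg;
}
print $wordcount . "\n";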

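Look up the regex flag 'e' for more information. I haven't tested the code, but something like this should work. To clarify, the 'e' flag evaluates the second part of the regex (the replacement) as code before substituting, though there is more to it than that; with that flag you should be able to make this work.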
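To see the /e flag in isolation: with /e, the replacement part is run as Perl code for every match, so a replacement of $count++ counts the matches as a side effect. The sketch below (both sub names are hypothetical) runs the substitution on a copy so the caller's text is not changed, and for comparison shows a common alternative that does not use /e at all: a /g match in list context, whose number of matches is taken with the "() =" assignment.

```perl
use strict;
use warnings;

# With /e the replacement "$count++" is evaluated for every match,
# so $count ends up equal to the number of matches.
sub count_with_e {
    my ($text, $word) = @_;
    my $count = 0;
    (my $copy = $text) =~ s/\Q$word\E/$count++/eg;   # substitute on a copy
    return $count;
}

# Alternative without /e: assigning a list-context /g match to an empty
# list and then to a scalar yields the number of matches.
sub count_with_list_context {
    my ($text, $word) = @_;
    my $count = () = $text =~ /\Q$word\E/g;
    return $count;
}

my $text = "apple banana apple cherry apple";
print count_with_e($text, 'apple'), "\n";             # 3
print count_with_list_context($text, 'apple'), "\n";  # 3
```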

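Now that I understand what you are asking, the above solution won't work. What you can do is read the whole file into a buffer with sysread and then run the same substitution, but you will have to strip the first word off manually, or you can decrement afterwards. This is because sysread and the regular <theFile> read are handled differently (sysread bypasses Perl's buffered I/O), so try this: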

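my $word = <theFile>;
chomp($word);    # assuming the word is on a line by itself
my $wordcount = 0;
my $srline = '';
# Rewind first: <theFile> above has already filled Perl's read buffer, and
# sysread bypasses that buffer, so without this it may start past the data.
sysseek(theFile, 0, 0);
# some arbitrary very long length, longer than the file
# (looping is also possible)
sysread(theFile, $srline, 10000000);
$srline =~ s/\Q$word\E/$wordcount++/eg;
$wordcount--;    # the whole file was read, so the first word itself was counted once
print $wordcount . "\n";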
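Reading the whole file into one string does not strictly require sysread. A common Perl idiom is to seek back to the start and read with $/ locally undefined, which returns the rest of the file as a single string through the normal buffered filehandle. A minimal sketch, assuming as above that the first line holds the search word; repeats_of_first_word is a hypothetical name, and like the code above it counts substring matches (for example "apple" inside "pineapple").

```perl
use strict;
use warnings;

# Hypothetical helper: the first line is the search word; count how many
# more times it occurs anywhere in the file.
sub repeats_of_first_word {
    my ($fh) = @_;
    my $word = <$fh>;
    return 0 unless defined $word;
    chomp $word;
    return 0 if $word eq '';                   # an empty pattern has special meaning in Perl
    seek($fh, 0, 0) or die "seek failed: $!";  # back to the very start of the file
    my $contents = do { local $/; <$fh> };     # $/ undefined: read everything at once
    my $count = () = $contents =~ /\Q$word\E/g;
    return $count - 1;                         # the first line itself matched once
}

# Assumed sample data.
my $text = "apple\nbanana apple\npineapple\n";
open(my $fh, '<', \$text) or die "cannot open in-memory file: $!";
print repeats_of_first_word($fh), "\n";
```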

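Now, having read your comment after answering your question, I think your current algorithm is not optimal, and you may want a hash that stores the counts of all the words in the file. This is probably best done with something like the following: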

my %counts = ();
foreach my $line (<theFile>) {
    $line =~ s/(\w+)/$counts{$1}++/eg;
}
# now %counts contains key-value pair words for everything in the file.
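The substitution above rewrites $line only for its side effect. If you would rather leave the line untouched, the same counting can be done with a global match in a loop. This is a sketch with a hypothetical count_all_words helper; \w+ treats letters, digits, and underscores as word characters, so punctuation such as a trailing comma is not part of a word.

```perl
use strict;
use warnings;

# Hypothetical helper: count every run of word characters (\w+) in the file.
sub count_all_words {
    my ($fh) = @_;
    my %counts;
    while (my $line = <$fh>) {
        # In scalar context, //g resumes where the previous match ended,
        # so this loop visits each word on the line once.
        $counts{$1}++ while $line =~ /(\w+)/g;
    }
    return \%counts;
}

# Assumed sample data.
my $text = "the cat\nthe dog, the end\n";
open(my $fh, '<', \$text) or die "cannot open in-memory file: $!";
my $counts = count_all_words($fh);
print "$_: $counts->{$_}\n" for sort keys %$counts;
```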

Answer 3

To find the count of every word in a file, you can do the following:

#!/usr/bin/perl
use strict;
use warnings;

my %count_of;
while (my $line = <>) { #read from file or STDIN
  foreach my $word (split ' ', $line) { # ' ' skips leading whitespace; /\s+/ would yield an empty first field
     $count_of{$word}++;
  }
}
print "All words and their counts: \n";
for my $word (sort keys %count_of) {
  print "'$word': $count_of{$word}\n";
}
__END__

Count the number of times a string is repeated in a file in Perl