Question

I have an input file named input.txt that looks like this:

powerOf|creating new file|failure
creatEd|new file creating|failure
powerAp|powerof server|failureof file

I extract the text that precedes the first uppercase letter in the first field and store these prefixes in output.txt:

power
creat

I used a sed command to separate out the values, and that works fine.

Using the output file (output.txt), I need to grep for each prefix in the first field of input.txt, and the output should look like this:

power
power:powerOf|creating new file|failure,powerAp|powerof server|failureof file
creat
creat:creatEd|new file creating|failure

I tried several approaches but did not get the expected output.

I tried the following, but I get duplicate entries:

cat input.txt | cut -d '|' -f1 >> input1.txt
cat input1.txt | sed 's/\([a-z]\)\([A-Z]\)/\1 \2/g' >> output.txt
while read -r line; do
  echo $line
  cat input.txt | cut -d '|' -f1 | grep $line >> output1.txt
done < "output.txt"

I have 20,000 lines in the input file. I don't know why I am getting duplicate output. What am I doing wrong?

Answer 1

Bash solution:

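#!/bin/bash
# Reads the input on stdin and groups lines by the prefix of field 1.
keys=()          # prefixes, in order of first appearance
declare -A map   # prefix -> comma-joined lines
while IFS= read -r line; do
    # Field 1, truncated at its first uppercase letter
    key=$(printf '%s\n' "$line" | cut -d '|' -f1 | sed -e 's/[[:upper:]].*$//')
    if [[ -z "${map[$key]}" ]]; then
        keys+=("$key")
        map[$key]="$line"
    else
        map[$key]+=",$line"
    fi
done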

for key in "${keys[@]}"; do
    echo "$key"
    echo "$key:${map[$key]}"
done

exit 0

Perhaps a Perl solution would also work for the OP:

#!/usr/bin/perl
use strict;
use warnings;

my @keys;
my %map;
while (<>) {
    chomp;
    my($key) = /^([[:lower:]]+)/;
    if (not exists $map{$key}) {
        push(@keys, $key);
        $map{$key} = [];
    }
    push(@{ $map{$key} }, $_);
}

foreach my $key (@keys) {
    print "$key\n";
    print "$key:", join(",", @{ $map{$key} }), "\n";
}


exit 0;

Test with the given input:

$ perl dummy.pl <dummy.txt
power
power:powerOf|creating new file|failure,powerAp|powerof server|failureof file
creat
creat:creatEd|new file creating|failure

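Update, after the OP restated the original question: a version of the first loop's body that keeps only the second column of the input instead of the whole line: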

    # Field 2 only; quoting keeps the whitespace inside the message intact
    message=$(printf '%s\n' "$line" | cut -d '|' -f2)
    if [[ -z "${map[$key]}" ]]; then
        keys+=("$key")
        map[$key]="$message"
    else
        map[$key]+=",$message"
    fi

Test with the given input:

$ perl dummy.pl <dummy.txt
power
power:creating new file,powerof server
creat
creat:new file creating
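
For reference, the updated fragment could be combined with the first script roughly as follows. This is only a sketch: the function name group_messages is hypothetical, and skipping lines whose first field starts with an uppercase letter is an added assumption (bash rejects an empty associative-array subscript). It needs bash 4 or later for associative arrays.

```shell
#!/bin/bash
# Sketch: group field 2 of each input line (stdin) by the prefix of field 1.
# group_messages is a hypothetical name, not part of the original answer.
# Usage, after sourcing this file:  group_messages < input.txt
group_messages() {
    local line key message
    local -a keys=()     # prefixes, in order of first appearance
    local -A map=()      # prefix -> comma-joined messages
    while IFS= read -r line; do
        # Field 1, truncated at its first uppercase letter
        key=$(printf '%s\n' "$line" | cut -d '|' -f1 | sed -e 's/[[:upper:]].*$//')
        # Assumption: lines with no lowercase prefix are skipped
        [[ -n $key ]] || continue
        message=$(printf '%s\n' "$line" | cut -d '|' -f2)
        if [[ -z "${map[$key]}" ]]; then
            keys+=("$key")
            map[$key]="$message"
        else
            map[$key]+=",$message"
        fi
    done
    for key in "${keys[@]}"; do
        echo "$key"
        echo "$key:${map[$key]}"
    done
}
```

Note that this forks cut and sed once per line, which is fine for a quick job but slow compared with a single awk or Perl pass over 20,000 lines.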

Answer 2

Apart from the useless uses of cat and other antipatterns, you are basically doing

# XXX not a solution, just a refactoring of your code
sed 's/\([a-z]\)\([A-Z]\).*/\1/' input.txt | grep -f - input.txt

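which extracts the lines just fine, but does nothing to join them. If you want to merge the lines that share the same prefix, a simple Awk script will probably do what you need.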

awk -F'|' '{ key=$1; sub(/[A-Z].*/, "", key)
      b[key] = (key in b ? b[key] "," : key ":" ) $0 }
    END { for(k in b) print b[k] }' input.txt

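We extract the prefix into key. If it is a key we have seen before (in which case it already exists in the associative array b), we take the previous value and append a comma; otherwise we start the value with the key itself followed by a colon. In both cases the current line is then appended. When we are done, we loop over the accumulated keys and print the value we stored for each one.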
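
Two details may matter here: Awk visits the keys of `for (k in b)` in an unspecified order, and the script prints only the joined line, not the bare prefix line the question also asks for. A minimal sketch that handles both, assuming the keys should appear in order of first occurrence (group_lines, order, and n are hypothetical names):

```shell
# Sketch: same grouping as above, but keeps first-seen key order and
# prints each prefix on its own line before the joined line.
# Usage:  group_lines input.txt   (or pipe the data in on stdin)
group_lines() {
    awk -F'|' '
        {
            key = $1
            sub(/[A-Z].*/, "", key)     # truncate at the first uppercase letter
            if (!(key in b)) {
                order[++n] = key        # remember when the key first appeared
                b[key] = $0
            } else {
                b[key] = b[key] "," $0
            }
        }
        END {
            for (i = 1; i <= n; i++) {
                print order[i]
                print order[i] ":" b[order[i]]
            }
        }' "$@"
}
```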

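If the lines are very long, it might not be possible to hold 20,000 of them in memory at once, but if your sample is representative, this should be an unremarkable task even on modest hardware.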
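
If memory ever did become a concern, one possible alternative is to let sort do the grouping so that Awk only has to remember the previous key. The sketch below makes several assumptions: the function name group_streaming is hypothetical, the output comes out sorted by prefix rather than in input order, and sort supports the widely available but non-POSIX -s (stable) option.

```shell
# Sketch: prefix each line with its key, sort on the key, then stream the groups.
# Usage:  group_streaming < input.txt
group_streaming() {
    # Stage 1: emit "key<TAB>original line"
    awk -F'|' '{ key = $1; sub(/[A-Z].*/, "", key); print key "\t" $0 }' |
    # Stage 2: stable sort on the key only, so lines within a group keep their order
    LC_ALL=C sort -s -t "$(printf '\t')" -k1,1 |
    # Stage 3: start a new group whenever the key changes
    awk -F'\t' '
        NR == 1 || $1 != prev {
            if (NR > 1) printf "\n"          # finish the previous joined line
            printf "%s\n%s:", $1, $1         # bare key, then start of joined line
            prev = $1
            sep = ""
        }
        {
            # Everything after the first tab is the original line
            printf "%s%s", sep, substr($0, length($1) + 2)
            sep = ","
        }
        END { if (NR > 0) printf "\n" }'
}
```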
