Question

I need to print lines that have duplicate fields. I tried using sed, but it is not working.
The input file has two lines:

s1/s2/s3/s4/s5/u0 a1_b2_c3_d4_e5_f6_g7 s1/s2/s3/s4/s5/u1
s1/s2/s3/s4/s5/u0 a1_b2_c3_d4_e5_f6_g7 s1/s2/s3/s4/s5/u0

The output should be only the second line, because it contains an exactly duplicated string (field).
But the sed command I tried prints both lines.

Thanks
RKP

Answer 1

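Adding a GENERIC solution that uses only one loop. It checks whether any two fields anywhere in the line are identical, which is handy if you don't want to hard-code field numbers.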

awk '{delete a;for(i=1;i<=NF;i++){if(++a[$i]>1){print;next}}}'  Input_file

The output for the sample shown is as follows.

s1/s2/s3/s4/s5/u0 a1_b2_c3_d4_e5_f6_g7 s1/s2/s3/s4/s5/u0

Explanation: a detailed walk-through of the code above.

awk '                           ##Start the awk program.
{                               ##Start the main block, which runs for every line.
  delete a                      ##Clear array a so counts from the previous line do not carry over.
  for(i=1;i<=NF;i++){           ##Loop from i=1 up to NF, where NF is awk's built-in count of fields in the current line.
    if(++a[$i]>1){              ##Increment the count stored in array a under index $i; if it is now greater than 1, this field was seen before.
      print                     ##Print the current line, since the OP wants lines in which two fields are equal.
      next                      ##Skip the remaining fields and move to the next line; once a match is found there is no need to keep looping.
    }                           ##Close the if block.
  }                             ##Close the for loop.
}                               ##Close the main block.
'   Input_file                  ##Name of the input file.
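
Whole-array `delete a` is accepted by gawk, mawk and the BWK awk, but some older awk implementations only support deleting single elements. Below is a minimal sketch of the same one-loop idea for such implementations, assuming `split("", seen)` as the array reset. The function name `dup_fields` and the array name `seen` are only illustrative and do not come from the answer.

```shell
# Hypothetical helper: print the lines of the given file(s), or of stdin,
# in which any field occurs more than once.
dup_fields() {
  awk '{
    split("", seen)                 # portable way to empty the array for each line
    for (i = 1; i <= NF; i++)
      if (++seen[$i] > 1) {         # this field value was already counted on this line
        print
        next
      }
  }' "$@"
}

# Usage with the two lines from the question; only the second line is printed.
printf '%s\n' \
  's1/s2/s3/s4/s5/u0 a1_b2_c3_d4_e5_f6_g7 s1/s2/s3/s4/s5/u1' \
  's1/s2/s3/s4/s5/u0 a1_b2_c3_d4_e5_f6_g7 s1/s2/s3/s4/s5/u0' | dup_fields
```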

Answer 2

Input:

$ cat input
a b c
s1/s2/s3/s4/s5/u0 a1_b2_c3_d4_e5_f6_g7 s1/s2/s3/s4/s5/u0
1 2 3
a b c
a b b
a a
1

Command:

awk '{for(i=1;i<=NF-1;i++)for(j=i+1;j<=NF;j++)if($i == $j){print; next}}' input

Output:

s1/s2/s3/s4/s5/u0 a1_b2_c3_d4_e5_f6_g7 s1/s2/s3/s4/s5/u0
a b b
a a

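Explanation:

RavinderSingh13's solution is better in terms of complexity, but it uses more memory, because all the field values of a line have to be stored in an associative array.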

{
        for (i = 1; i <= NF - 1; i++) { #outer loop from 1 to NF-1
                for (j = i + 1; j <= NF; j++) { #inner loop from i+1
                        if ($i == $j) { #value comparison of the two elements selected
                                print $0 #print
                                next    #jump to next line
                        }
                }
        }
}
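
To make the complexity remark concrete: on a line of n fields with no duplicates, the nested loops perform n(n-1)/2 comparisons, whereas the array approach does one lookup per field. Here is a small sketch that counts the comparisons made by the nested loops. The function name `count_comparisons` and the variables `c` and `found` are made up for illustration.

```shell
# Hypothetical helper: for each input line, print how many field-to-field
# comparisons the nested loops needed and whether a duplicate was found.
count_comparisons() {
  awk '{
    c = 0
    found = 0
    for (i = 1; i <= NF - 1 && !found; i++)
      for (j = i + 1; j <= NF; j++) {
        c++                         # one comparison of $i against $j
        if ($i == $j) { found = 1; break }
      }
    print c, (found ? "duplicate" : "unique")
  }' "$@"
}

# Four distinct fields need 4*3/2 = 6 comparisons; an early duplicate stops after 1.
printf 'a b c d\na a c d\n' | count_comparisons
```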

Answer 3

With grep if -P is available, or with perl:

$ cat ip.txt
s1/s2/s3/s4/s5/u0 a1_b2_c3_d4_e5_f6_g7 s1/s2/s3/s4/s5/u1
s1/s2/s3/s4/s5/u0 a1_b2_c3_d4_e5_f6_g7 s1/s2/s3/s4/s5/u0
2.5 42 32.5 abc
3.14 3.14 123
part cop par

$ grep -P '(?<!\S)(\S++).*(?<!\S)\1(?!\S)' ip.txt
s1/s2/s3/s4/s5/u0 a1_b2_c3_d4_e5_f6_g7 s1/s2/s3/s4/s5/u0
3.14 3.14 123

$ perl -ne 'print if /(?<!\S)(\S++).*(?<!\S)\1(?!\S)/' ip.txt
s1/s2/s3/s4/s5/u0 a1_b2_c3_d4_e5_f6_g7 s1/s2/s3/s4/s5/u0
3.14 3.14 123

(?<!\S) asserts that there is no non-whitespace character before the current position
(\S++) captures all the non-whitespace characters; the possessive quantifier ensures that partial fields are not matched
.* matches any number of characters in between
(?<!\S)\1(?!\S) matches the whole field again, courtesy of the lookaround assertions for non-whitespace characters

Answer 4

Using Perl: regex and backreferences

perl -nle ' print if /(?:^|\s)(\S+)\s+.*?(?<=\s)\1(?:\s+|$)/ms ' file

Thanks to @Sundeep for spotting a subtle issue, and to @zdim for helping to fix it.

With the following input:

$ cat  input
a b c
s1/s2/s3/s4/s5/u0 a1_b2_c3_d4_e5_f6_g7 s1/s2/s3/s4/s5/u1
s1/s2/s3/s4/s5/u0 a1_b2_c3_d4_e5_f6_g7 s1/s2/s3/s4/s5/u0
1 2 3
a b c
a b b
a a
1
2.5 42 32.5 abc
part cop par
spar cop par

$ perl -nle ' print if /(?:^|\s)(\S+)\s+.*?(?<=\s)\1(?:\s+|$)/ms ' input
s1/s2/s3/s4/s5/u0 a1_b2_c3_d4_e5_f6_g7 s1/s2/s3/s4/s5/u0
a b b
a a

$

Another approach, using a hash and a lookbehind:

$ perl -lane ' %k=/(\S+)(?<=(.))/g ; print if scalar(@F) != scalar(keys %k) ' input
s1/s2/s3/s4/s5/u0 a1_b2_c3_d4_e5_f6_g7 s1/s2/s3/s4/s5/u0
a b b
a a

$

Answer 5

From what I can tell from your question, all you need is:

$ awk '$1==$3' file
s1/s2/s3/s4/s5/u0 a1_b2_c3_d4_e5_f6_g7 s1/s2/s3/s4/s5/u0

If that isn't all you need, please update your question with more realistic sample input/output.

Answer 6

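[@BenjaminW. correctly observes that I slightly misread the question. My answer remains below for reference, but I withdraw it as a candidate answer to this question.]

This is what you want: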

sort input_file | uniq -d

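The sort command sorts the contents of the input file so that, after sorting, identical lines appear next to each other. The uniq command normally collapses repeated lines, but when invoked with the -d option it prints only the lines that are repeated.

Of course, my solution is acceptable only if using sed is not a requirement.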
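
A short demonstration of why this pipeline answers a different question: it reports lines that occur more than once in the file, not lines that contain a repeated field. The sample data and the function name `dup_lines` are made up for illustration.

```shell
# Hypothetical helper: print each line that appears more than once in the input.
dup_lines() { sort "$@" | uniq -d; }

# "a b c" occurs twice as a whole line, so it is printed.
# "a b b" has a repeated field but occurs only once, so it is not printed.
printf 'a b c\na b b\na b c\n' | dup_lines
```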

Answer 7

This might work for you (GNU sed):

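Make a copy of the current line in the hold space.

Replace any spaces with newlines on either side of each non-space string.

If there are no duplicates, delete the adulterated line.

Otherwise, replace the pattern space with the copy of the original line from the hold space and print it.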
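
The steps above can be sketched roughly as follows. This is a reconstruction under assumptions, not necessarily the command the answer used: it assumes GNU sed (for `-E`, `\s`, `\S` and `\n` in the replacement), and the function name `dup_fields_sed` is made up.

```shell
# Hypothetical helper, GNU sed assumed.
#   h                    copy the original line to the hold space
#   s/\s+/\n\n/g         turn each run of whitespace into two newlines
#   s/^/\n/ and s/$/\n/  add a newline before the first and after the last field,
#                        so every field is now enclosed in newlines
#   /(\n\S+\n).*\1/!d    if no newline-enclosed field occurs twice, delete the line
#   g                    otherwise restore the original line, which is then auto-printed
dup_fields_sed() {
  sed -E 'h;s/\s+/\n\n/g;s/^/\n/;s/$/\n/;/(\n\S+\n).*\1/!d;g' "$@"
}

# Prints only the lines with a whole field repeated: "a b a" and "a a".
printf 'a b c\na b a\npart cop par\na a\n1\n' | dup_fields_sed
```

Enclosing every field in newlines is what keeps partial matches such as `par` inside `part` from counting as duplicates.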

Answer 8

You can do this with awk:

awk '{for(i=1;i<NF;i++)for(j=i+1;j<=NF;j++)if($i==$j){print;next}}' input_file

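This is not limited to 3 columns and works wherever the duplicate occurs.

If you want the reverse, printing the lines that have no duplicates: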

awk '{for(i=1;i<NF;i++)for(j=i+1;j<=NF;j++)if($i==$j)next; print}'

How to print lines with duplicate fields?

8 answers: