Question

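I have a file with several columns. I am trying to filter out the records that have the same value in the first two fields. Both fields contain text values. This is the command I am using: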

cat input_file | awk -F'\t' '{if($1==$2) print $1 $2}'

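When I run this command, I only get the lines where the field values are numeric. The file contains several lines where both fields hold the same non-numeric value. How can I force awk to do a string comparison?

Also, are there other ways to accomplish this? (I'm new to the Unix environment and don't know many tricks... suggestions would be appreciated.)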

Answer 1

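If you want to filter out all lines where the first two columns are identical, just run awk '$1!=$2' file. awk uses whitespace as its default field separator, and the default action is print.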

$ cat file
1       1        col3   line1
two     two      col3   line2
three   3        col3   line3           
four4   four4    col3   line4

$ awk '$1!=$2' file
three   3        col3   line3           

$ awk '$1==$2' file
1       1        col3   line1
two     two      col3   line2
four4   four4    col3   line4

The field type does not matter here, and there is no need for cat, since awk reads the file itself.
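
One caveat to the above: when both fields look like numbers, awk compares them numerically, so 1.0 and 1 count as equal. Below is a minimal sketch of forcing a string comparison by appending an empty string to each field. The sample input and the variable names are made up for illustration and are not from the question.

```shell
# Hypothetical sample: on the first line the two fields are
# numerically equal but textually different.
sample='1.0 1 col3
two two col3
three 3 col3'

# Default comparison: numeric when both fields look like numbers.
default_cmp=$(printf '%s\n' "$sample" | awk '$1 == $2')

# Concatenating "" makes each operand a string, so awk compares text.
string_cmp=$(printf '%s\n' "$sample" | awk '$1"" == $2""')

printf 'default comparison:\n%s\n\nstring comparison:\n%s\n' \
    "$default_cmp" "$string_cmp"
```

With this input, the default comparison should keep the 1.0/1 line as well as the two/two line, while the forced string comparison keeps only two/two.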

Answer 2

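You actually had it right, except for the -F'\t' you added, which is what causes the problem. In awk, the default value of the field separator FS is a string containing a single space, " ".

So you need to remove -F'\t'.

For example, see below: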

> cat temp
1       1 random text
some some random text
some more random text


> nawk '{if($1==$2){print}}' temp
1       1 random text
some some random text

> nawk -F'\t' '{if($1==$2){print}}' temp
>

I'm still not sure why the second command doesn't work. Either way, you need to remove -F.
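
A likely explanation, assuming the temp file is separated by spaces rather than tabs: with -F'\t' the whole line becomes $1 and $2 is empty, so $1==$2 is never true. The sketch below prints the field count under each separator for one line of the example. The variable names are illustrative only.

```shell
# One line from the example file, separated by spaces (an assumption).
line='some some random text'

# Default separator: runs of blanks split the line into four fields.
nf_default=$(printf '%s\n' "$line" | awk '{ print NF }')

# Tab separator: the line has no tabs, so it is a single field.
nf_tab=$(printf '%s\n' "$line" | awk -F'\t' '{ print NF }')

echo "default FS: $nf_default fields; tab FS: $nf_tab field"
```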

Answer 3

Taking sudo_O's example:

[sgeorge@sgeorge-ld ~]$ cat s
1       1        col3   line1
two     two      col3   line2
three   3        col3   line3           
four4   four4    col3   line4
[sgeorge@sgeorge-ld ~]$ cat s | perl -lane '$F[0] eq $F[1] && print'
1       1        col3   line1
two     two      col3   line2
four4   four4    col3   line4

Filtering based on string comparison
