How can I find consecutive duplicate lines in a folder using the Ubuntu command line?

Date: 2018-05-16 00:53:00

Tags: bash

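I am trying to find duplicate lines (one directly after the other) in the files of a folder and its subfolders, using the Ubuntu command line.

I recently ran the following command: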

sudo grep -liRZP --color '(src\=\"https\:\/\/).*(\/wp-content\/.*\.png\")' --exclude=\*{.sql,_log,.log,backup*,.*backup*,Backup*,.*Backup*,BACKUP*,.*BACKUP*,.png} . | xargs -0 sed -i[backup] -e 's_\(src\=\"\)https\:\/\/.*\(\/wp-content\/.*\.png\"\)_\1\2_gp'

I must have made a mistake at the start, though, because I now find that many files contain duplicated lines. For example, they look like this:

<li class="menu-item menu-item-type-post_type menu-item-object-page"><a href="'.get_permalink(2320).'"><span><img src="/wp-content/uploads/2017/06/info.png" class="menu-icon"></span>Information & FAQ</a></li>
<li class="menu-item menu-item-type-post_type menu-item-object-page"><a href="'.get_permalink(2320).'"><span><img src="/wp-content/uploads/2017/06/info.png" class="menu-icon"></span>Information & FAQ</a></li>
<li class="menu-item menu-item-type-post_type menu-item-object-page"><a href="'.getLinkab(array("sub" => "overview")).'"><span><img src="/wp-content/uploads/2017/06/preview.png" class="menu-icon"></span>Overview<$
<li class="menu-item menu-item-type-post_type menu-item-object-page"><a href="'.getLinkab(array("sub" => "overview")).'"><span><img src="/wp-content/uploads/2017/06/preview.png" class="menu-icon"></span>Overview<$
<li class="menu-item menu-item-type-post_type menu-item-object-page"><a href="'.getLinkab(array("sub" => "sales")).'"><span><img src="/wp-content/uploads/2017/06/sale.png" class="menu-icon"></span>Sales</a></li>
<li class="menu-item menu-item-type-post_type menu-item-object-page"><a href="'.getLinkab(array("sub" => "sales")).'"><span><img src="/wp-content/uploads/2017/06/sale.png" class="menu-icon"></span>Sales</a></li>
<li class="menu-item menu-item-type-post_type menu-item-object-page"><a href="'.getLinkab(array("sub" => "impressions")).'"><span><img src="/wp-content/uploads/2017/06/impression.png" class="menu-icon"></span>Imp$
<li class="menu-item menu-item-type-post_type menu-item-object-page"><a href="'.getLinkab(array("sub" => "impressions")).'"><span><img src="/wp-content/uploads/2017/06/impression.png" class="menu-icon"></span>Imp$
<li class="menu-item menu-item-type-post_type menu-item-object-page"><a href="'.getLinkab(array("sub" => "payments")).'"><span><img src="/wp-content/uploads/2017/06/payment-history.png" class="menu-icon"></span>P$
<li class="menu-item menu-item-type-post_type menu-item-object-page"><a href="'.getLinkab(array("sub" => "payments")).'"><span><img src="/wp-content/uploads/2017/06/payment-history.png" class="menu-icon"></span>P$
<li class="menu-item menu-item-type-post_type menu-item-object-page"><a href="'.getLinkab(array("sub" => "creatives")).'"><span><img src="/wp-content/uploads/2017/06/promotion.png" class="menu-icon"></span>Promot$
<li class="menu-item menu-item-type-post_type menu-item-object-page"><a href="'.getLinkab(array("sub" => "creatives")).'"><span><img src="/wp-content/uploads/2017/06/promotion.png" class="menu-icon"></span>Promot$
<li class="menu-item menu-item-type-post_type menu-item-object-page"><a href="'.getLinkab(array("sub" => "profile")).'"><span><img src="/wp-content/uploads/2017/06/edit-pro.png" class="menu-icon"></span>Edit Prof$
<li class="menu-item menu-item-type-post_type menu-item-object-page"><a href="'.getLinkab(array("sub" => "profile")).'"><span><img src="/wp-content/uploads/2017/06/edit-pro.png" class="menu-icon"></span>Edit Prof$

As a first step, I want to see a list of the affected lines. As a second step, I want to remove the duplicated lines.

I tried:

sudo grep -iRE --color '\(^.*$\)\1' .

But that raises an error:

grep: Invalid back reference

I also tried:

sudo grep -iRP --color '\(^.*$\)\1'

Error:

grep: reference to non-existent subpattern

Can anyone help me? What is the best way to remove the duplicated lines?

1 Answer:

Answer 0 (score: 0)

Quick and dirty identification of the affected files:

find . -type f | parallel --tag 'diff {} <(uniq {})'
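
If GNU parallel is not installed, roughly the same check can be sketched with find and uniq alone. `uniq -d` prints one copy of each line that is immediately repeated, which corresponds to the "list of affected lines" asked for in step one. The function name `list_dup_lines` is made up for this illustration.

```shell
# list_dup_lines is a hypothetical helper name, not an existing tool.
# Prints "file: line" for every line that is immediately repeated
# in a regular file under the given directory (default: current directory).
list_dup_lines() {
  find "${1:-.}" -type f -exec sh -c '
    for f do
      # uniq -d emits one copy of each consecutively repeated line
      uniq -d -- "$f" | while IFS= read -r line; do
        printf "%s: %s\n" "$f" "$line"
      done
    done' sh {} +
}
```

Run it as `list_dup_lines .` from the top of the tree. Note that runs of blank lines are reported as duplicates too, since uniq does not distinguish them from any other repeated line.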

Quick and dirty replacement:

find . -type f | parallel 'cp {} {}.old; cat {}.old | uniq > {}'
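
A slightly more cautious variant of the same idea, again without GNU parallel, might look like the sketch below. It assumes you want a backup only for files that actually change and that existing `*.old` backups should not be processed again on a second run. The name `dedup_in_place` is hypothetical.

```shell
# dedup_in_place is a hypothetical helper name, not an existing tool.
# Collapses consecutive duplicate lines in every regular file under the
# given directory, keeping a "<file>.old" backup of each changed file.
dedup_in_place() {
  find "${1:-.}" -type f ! -name '*.old' -exec sh -c '
    for f do
      # Skip files that uniq would leave unchanged, so they get no backup
      uniq -- "$f" | cmp -s - "$f" && continue
      # cp -p keeps mode and timestamps on the backup; redirecting into
      # the existing file keeps the permissions of the original
      cp -p -- "$f" "$f.old" && uniq -- "$f.old" > "$f"
    done' sh {} +
}
```

Like the one-liner above, this treats every file as text. If the tree also contains binary files such as images, it is safer to narrow the find expression first, for example with `-name '*.php'`.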

To consider only lines longer than 10 characters that contain the string 'a href':

myuniq() {
  perl -ne 'if($last eq $_ and /a href/ and length($_) > 10) {
    # Skip this line: it repeats the previous one
    1;
  } else {
    print;
  }
  $last=$_;' "$@"
}
export -f myuniq

find . -type f | parallel --tag 'diff {} <(myuniq {})'
# Copy with cp -a first so the backup keeps the original permissions; the file is then overwritten in place
find . -type f | parallel 'cp -a {} {}.old; cat {}.old | myuniq > {}'
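
As a side note on where the duplicates most likely came from: the sed expression in the question ends in the flags `gp`. Without `-n`, sed already prints every line once, and the `p` flag prints each line on which a substitution succeeded a second time. With `-i`, both copies are written back into the file. A minimal demonstration on made-up input (not the asker's files):

```shell
# With the p flag, the substituted line is written twice: z, z, y
printf 'x\ny\n' | sed -e 's/x/z/p'

# Without p, each line is written exactly once: z, y
printf 'x\ny\n' | sed -e 's/x/z/'
```

Dropping the `p` from the original command should avoid creating new duplicates if it is run again.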