Question

My text file contains the following lines:

this is the code ;rfc1234;rfc1234
this is the code ;rfc1234;rfc1234;rfc1234;rfc1234

How can I squeeze the repeated words in the file down to a single word, like this:

this is the code ;rfc1234
this is the code ;rfc1234

I tried the 'tr' command, but it can only squeeze repeated characters, not words.

Answer 1

With sed, for any repeated string prefixed with ;

$ sed -E 's/(;[^;]+)(\1)+/\1/g' file

Or, if you want to delete everything after the first token without checking whether the later tokens match it:

$ sed -E 's/(\S);.*/\1/' file

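Explanation

(;[^;]+) captures a string that starts with a semicolon and runs up to the next semicolon.
(\1)+ requires that same captured string to follow one or more times.
/\1/g replaces the whole chain with a single instance, and repeats this for every such chain on the line.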
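
As a quick check of the first command, the sketch below wraps it in a shell function and feeds it the two sample lines from the question. The function name squeeze_tokens and the third input line are made up for illustration; they are not part of the original answer.

```shell
# squeeze_tokens is a hypothetical wrapper name, not part of the answer.
squeeze_tokens() {
  # Collapse each run of identical ;-prefixed tokens into one token.
  sed -E 's/(;[^;]+)(\1)+/\1/g'
}

printf '%s\n' \
  'this is the code ;rfc1234;rfc1234' \
  'this is the code ;rfc1234;rfc1234;rfc1234;rfc1234' \
  'a ;x;x;y;y' | squeeze_tokens
```

Both question lines should come out as "this is the code ;rfc1234", and the third line as "a ;x;y". Only adjacent repeats are squeezed, so a line such as "a ;x;y;x" would be left unchanged.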

Answer 2

The following awk may help. It looks at all the items in the last field of each line of Input_file and keeps only the unique values among them.

awk '{num=split($NF,array,";");for(i=1;i<=num;i++){if(!array1[array[i]]++){val=val?val ";" array[i]:array[i]}};NF--;print $0" ;"val;val="";delete array;delete array1}'   Input_file

Adding a non-one-liner form of the solution as well.

awk '
{
  num=split($NF,array,";");
  for(i=1;i<=num;i++){
    if(!array1[array[i]]++){
      val=val?val ";" array[i]:array[i]
    }
  };
  NF--;
  print $0" ;"val;
  val="";
  delete array;
  delete array1
}'   Input_file

Explanation:

awk '
{
  num=split($NF,array,";");             ##Split the last field on ";" into array; num is the number of pieces.
  for(i=1;i<=num;i++){                  ##Loop over the pieces, from i=1 up to num.
    if(!array1[array[i]]++){            ##True only the first time a given piece is seen (array1 counts occurrences).
      val=val?val ";" array[i]:array[i] ##Append the piece to val, joined with ";" when val is already non-empty.
    }
  };
  NF--;                                 ##Decrement NF (number of fields) to drop the last field from the current line.
  print $0" ;"val;                      ##Print the line (without its last field), then " ;" and the unique values.
  val="";                               ##Reset val for the next line.
  delete array;                         ##Clear the array named array.
  delete array1                         ##Clear the array named array1.
}'  Input_file                          ##Input_file is the file to process.
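
Decrementing NF to drop a field is not handled the same way by every awk implementation. The sketch below is one possible variant of the same idea, not the answerer's code: it rewrites the last field in place instead of removing it. It assumes the last field always starts with ; as in the question, and the names dedupe_last_field, parts, seen and out are hypothetical.

```shell
# dedupe_last_field is a hypothetical helper; it reads lines on stdin.
dedupe_last_field() {
  awk '{
    n = split($NF, parts, ";")          # pieces of the last field
    out = ""
    for (i = 1; i <= n; i++) {
      # Skip the empty piece before the leading ";" and any piece already seen.
      if (parts[i] != "" && !seen[parts[i]]++) out = out ";" parts[i]
    }
    $NF = out                           # replace the last field; $0 is rebuilt
    print
    split("", seen)                     # clear seen for the next line
  }'
}

printf '%s\n' \
  'this is the code ;rfc1234;rfc1234' \
  'this is the code ;rfc1234;rfc1234;rfc1234;rfc1234' | dedupe_last_field
```

Because assigning to a field makes awk rebuild the line with single spaces between fields, any wider spacing in the input is not preserved. Unlike the sed approach, this also removes repeats that are not adjacent.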

Answer 3

I started playing with s/(.+)\1/\1/g. It seems to work with perl (it even caught the "is is " inside "this is"), but it didn't get me all the way there:

$ perl -pe 's/(.+)\1+/\1/g' file
this the code ;rfc1234
this the code ;rfc1234;rfc1234

Answer 4

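You can use the following commands to achieve this:

 echo "this is the code ;rfc1234;rfc1234" | sed 's/;rfc1234//2g'

 echo "this is the code ;rfc1234;rfc1234;rfc1234;rfc1234" | sed 's/;rfc1234//2g'

or

  sed 's/;rfc1234//2g' yourfile.txt

The 2g flag (a GNU sed extension) deletes the second and every later occurrence. The leading ; is part of the pattern so that no stray semicolons are left behind.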

Answer 5

sed 's/\(;[^;]*\).*/\1/'  file

Answer 6

This might work for you (GNU sed):

sed -r ':a;s/(\S+)\1+/\1/g;ta' file

The substitution is repeated until no repeated pattern is left, so only the first occurrence remains.
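
The sketch below runs the same script on a sample line and on a second, made-up line, to show a side effect worth knowing about: (\S+)\1+ matches any repeated run of non-space characters, including doubled letters inside ordinary words. It assumes GNU sed and uses -E in place of -r, which GNU sed also accepts. The helper name squeeze_runs is hypothetical.

```shell
# squeeze_runs is a hypothetical helper; it needs GNU sed (\S, ":a;...;ta").
squeeze_runs() {
  # Loop: squeeze repeated non-space runs, and retry while something changed.
  sed -E ':a;s/(\S+)\1+/\1/g;ta'
}

printf '%s\n' \
  'this is the code ;rfc1234;rfc1234;rfc1234;rfc1234' \
  'see ;rfc1234;rfc1234' | squeeze_runs
```

The first line should come out as "this is the code ;rfc1234". The second should come out as "se ;rfc1234", because the doubled "e" in "see" is squeezed as well. If that matters for your data, a pattern tied to the ; delimiter is the safer choice.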

sed: squeeze multiple occurrences of a word
