Question

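I have a very large text file. The file contains words and a number of definitions given for each word. There are 60 words, repeated 17 times. The word is always in the first field, and its definitions follow in the adjacent fields.

Example: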

hand;extremity of the body;that which is commonly used to write with

paper;thin sheet made of wood pulp;material used to write things on;some other def's

book;collection of pages on a topic;publication of knowledge;concatenated paper with text

ham;that which comes from pork;a tasty meat;a type of food

anotherword;defs;defs;defs;defs

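This continues until the 60th word and then starts over, using the same 60 words with different definitions. The order is never the same, so the next 60 might be: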

book;defs;defs;defs

television;defs;defs;defs

ham;defs;defs;defs;defs;defs

paper;defs;defs

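The field separator in this file is ";", and there is an empty record between each pair of records, as shown in the example.

What I want to do is look at the first field and output the records that share the same first field together. Example: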

ham;defs;defs;defs;defs;defs
ham;defs;defs;defs
ham;defs;defs;defs;defs
ham;defs;defs;
ham;defs;defs;defs
ham;defs;
ham;defs;defs
ham;defs;defs;defs;defs
paper;defs;defs;defs;defs
paper;defs;defs;defs
paper;defs;defs;

and so on.

Apologies if this is unclear. Please help!

Answer 1

A simple combination of grep and sort will do this for you. Try the following:

# ^$ matches blank lines, and -v inverts the match, so grep prints only the lines that contain data.
# Piping that output to sort orders the records.
# sort's -t option sets the field delimiter, and -k1 starts the sort key at the first field.

grep -v ^$ yourfile.txt | sort -t";" -k1

# If the file may also contain exact duplicate lines and you want each one only once, pipe the result to uniq as below.

grep -v ^$ yourfile.txt | sort -t";" -k1 | uniq

For your sample data, I get the following output:

$ grep -v ^$ mysamplefile.txt | sort -t";" -k1 | uniq
anotherword;defs;defs;defs;defs
book;collection of pages on a topic;publication of knowledge;concatenated paper with text
book;defs;defs;defs
ham;defs;defs;defs;defs;defs
ham;that which comes from pork;a tasty meat;a type of food
hand;extremity of the body;that which is commonly used to write with
paper;defs;defs
paper;thin sheet made of wood pulp;material used to write things on;some other def's
television;defs;defs;defs
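
As a variation on the pipeline above, here is a minimal sketch that restricts the sort key to the first field only and keeps records for the same word in their original file order. The function name group_by_first_field is hypothetical, and the -s (stable) option is an assumption about your sort: GNU and BSD sort provide it, but POSIX does not require it.

```shell
# Hypothetical helper: group ';'-separated records by their first field.
group_by_first_field() {
    # grep -v '^$' drops the blank separator lines.
    # -t ';'  sets the field delimiter.
    # -k1,1   limits the sort key to the first field alone.
    # -s      (assumed available; GNU/BSD extension) keeps equal keys
    #         in the order they appeared in the input.
    grep -v '^$' "$1" | sort -s -t ';' -k1,1
}
```

Calling group_by_first_field yourfile.txt prints all records for each word together. Without -s, records with the same word are ordered by comparing the whole line instead, which is what the output above shows.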

This sorts on the first field, so all records with the same first field are output together.
