Alternative to cut -d <string>?

Date: 2014-04-01 12:00:41

Tags: bash shell parsing text cut

When I type ls, I get:

aedes_aegypti_upstream_dremeready_all_simpleMasked_random.fasta
anopheles_albimanus_upstream_dremeready_all_simpleMasked_random.fasta
anopheles_arabiensis_upstream_dremeready_all_simpleMasked_random.fasta
anopheles_stephensi_upstream_dremeready_all_simpleMasked_random.fasta
culex_quinquefasciatus_upstream_dremeready_all_simpleMasked_random.fasta

I want to pipe this (or use some alternative approach) so that I get:

aedes_aegypti
anopheles_albimanus
anopheles_arabiensis
anopheles_stephensi
culex_quinquefasciatus

If cut accepted a string (multiple characters) as its delimiter, I could use:

cut -d "_upstream_" -f1

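But that is not allowed, because cut only accepts a single character as its delimiter.

4 Answers:

Answer 0: (score: 4)

awk allows a string as the field separator: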

$ awk -F"_upstream_" '{print $1}' file
aedes_aegypti
anopheles_albimanus
anopheles_arabiensis
anopheles_stephensi
culex_quinquefasciatus
drosophila_melanogaster
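
To try this without an input file, here is a minimal sketch that feeds two of the names from the question through the same awk command on standard input. The function name prefix_with_awk is made up for illustration. Note that awk treats a field separator longer than one character as a regular expression; _upstream_ contains no special characters, so it matches literally.

```shell
# Hypothetical helper: print everything before the first "_upstream_" on each input line.
prefix_with_awk() {
  awk -F'_upstream_' '{ print $1 }'
}

printf '%s\n' \
  'aedes_aegypti_upstream_dremeready_all_simpleMasked_random.fasta' \
  'culex_quinquefasciatus_upstream_dremeready_all_simpleMasked_random.fasta' |
  prefix_with_awk
# prints:
# aedes_aegypti
# culex_quinquefasciatus
```

With real files, ls | prefix_with_awk should give the same kind of output.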

Note that for the given input, you can also use cut with _ as the delimiter and print the first two fields:

$ cut -d'_' -f-2 file
aedes_aegypti
anopheles_albimanus
anopheles_arabiensis
anopheles_stephensi
culex_quinquefasciatus
drosophila_melanogaster
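
The -f-2 version relies on every prefix having exactly two underscore-separated parts. If you would rather keep cut without that restriction, one possible workaround is to first replace the multi-character separator with a single character that cut can split on. This is only a sketch: it assumes the character | never appears in the file names, and the function name prefix_with_cut and the three-part sample name are made up for illustration.

```shell
# Hypothetical helper: turn the first "_upstream_" into "|", then let cut split on "|".
# Assumption: "|" does not occur anywhere in the input lines.
prefix_with_cut() {
  sed 's/_upstream_/|/' | cut -d'|' -f1
}

printf '%s\n' \
  'aedes_aegypti_upstream_dremeready_all_simpleMasked_random.fasta' \
  'genus_species_strain_upstream_dremeready_all_simpleMasked_random.fasta' |
  prefix_with_cut
# prints:
# aedes_aegypti
# genus_species_strain
```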

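sed and grep can do the job too. For example, this grep uses a lookahead (-P enables Perl-compatible regular expressions in GNU grep) to print everything from the start of the line up to _upstream: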

$ grep -Po '^\w*(?=_upstream)' file
aedes_aegypti
anopheles_albimanus
anopheles_arabiensis
anopheles_stephensi
culex_quinquefasciatus
drosophila_melanogaster

Answer 1: (score: 3)

If you only want the first field, you can do it in pure bash:

ls | while IFS= read -r line; do printf '%s\n' "${line%%_upstream_*}"; done
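
The %% operator removes the longest suffix that matches the pattern, while a single % removes the shortest one. The difference only shows when _upstream_ occurs more than once in a name. A small sketch of both, using a made-up file name that contains the marker twice (the function names are hypothetical as well):

```shell
# Hypothetical helpers contrasting the two suffix-removal operators.
strip_longest()  { printf '%s\n' "${1%%_upstream_*}"; }  # cut at the FIRST "_upstream_"
strip_shortest() { printf '%s\n' "${1%_upstream_*}"; }   # cut at the LAST "_upstream_"

name='aedes_aegypti_upstream_dremeready_upstream_all.fasta'  # made-up example

strip_longest "$name"    # prints: aedes_aegypti
strip_shortest "$name"   # prints: aedes_aegypti_upstream_dremeready
```

For the file names in the question, where the marker appears once, both operators give the same result.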

Answer 2: (score: 3)

You can also use sed:

sed -i.bak 's/_upstream.*//' file

Result (the new contents of file):

aedes_aegypti
anopheles_albimanus
anopheles_arabiensis
anopheles_stephensi
culex_quinquefasciatus
drosophila_melanogaster

Note: this also creates a backup of the original file as file.bak.
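
Because -i edits a file in place, it does not fit directly into a pipeline such as the ls one in the question. A minimal sketch of the same substitution used as a filter on standard input, with no file modified and no backup written (the function name strip_suffix_sed is made up for illustration):

```shell
# Hypothetical helper: delete "_upstream_" and everything after it on each input line.
strip_suffix_sed() {
  sed 's/_upstream_.*//'
}

printf '%s\n' \
  'anopheles_stephensi_upstream_dremeready_all_simpleMasked_random.fasta' \
  'culex_quinquefasciatus_upstream_dremeready_all_simpleMasked_random.fasta' |
  strip_suffix_sed
# prints:
# anopheles_stephensi
# culex_quinquefasciatus
```

In the directory from the question this would be used as ls | strip_suffix_sed.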

Answer 3: (score: 3)

Similar to @Tom Fenech's answer, using bash parameter expansion/substring removal, but with a for loop:

$ ls
aedes_aegypti_upstream_dremeready_all_simpleMasked_random.fasta
anopheles_albimanus_upstream_dremeready_all_simpleMasked_random.fasta
anopheles_arabiensis_upstream_dremeready_all_simpleMasked_random.fasta
anopheles_stephensi_upstream_dremeready_all_simpleMasked_random.fasta
culex_quinquefasciatus_upstream_dremeready_all_simpleMasked_random.fasta
drosophila_melanogaster_upstream_dremeready_all_simpleMasked_random.fasta

$ for file in *; do
> echo "${file%%_upstream_*}"
> done
aedes_aegypti
anopheles_albimanus
anopheles_arabiensis
anopheles_stephensi
culex_quinquefasciatus
drosophila_melanogaster
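
If the directory may also hold files without _upstream_ in their names, or may be empty, the loop can be narrowed a little. The following is only a sketch under those assumptions: the function name list_prefixes and its optional directory argument are made up for illustration.

```shell
# Hypothetical helper: print the prefix of every "*_upstream_*" file in a directory.
list_prefixes() {
  dir=${1:-.}                      # default to the current directory
  for path in "$dir"/*_upstream_*; do
    [ -e "$path" ] || continue     # the glob matched nothing: skip the literal pattern
    file=${path##*/}               # drop the directory part
    printf '%s\n' "${file%%_upstream_*}"
  done
}

list_prefixes .
```

Restricting the glob to *_upstream_* keeps unrelated files out of the output, and the -e check avoids printing the unexpanded pattern when nothing matches.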