For each line in my file, I want to print everything on that line before the 4th dash.
Input:
TCGA-HC-8216-10A-11D-A323-01
TCGA-J4-8200-10A-11D-A323-01
TCGA-EJ-A65E-10A-11D-A323-01
I want to split each line at the fourth dash ("-").
Output:
TCGA-HC-8216-10A
TCGA-J4-8200-10A
TCGA-EJ-A65E-10A
I know I can split on every dash like this:
#!/usr/bin/env bash
IN="TCGA-HC-8216-01A-11D-A323-01
TCGA-J4-8200-10A-11D-A323-01
TCGA-EJ-A65E-10A-11D-A323-01"
arr=$(echo $IN | tr "-" "\n")
for x in $arr
do
    echo "> [$x]"
done
But this splits on every dash and prints each part of the string separately.
Answer 0 (score: 4)
Use cut:
cut -d- -f1-4 <<'EOF'
TCGA-HC-8216-01A-11D-A323-01
TCGA-J4-8200-10A-11D-A323-01
TCGA-EJ-A65E-10A-11D-A323-01
EOF
You are cutting the input on a -d (delimiter) of - and returning -f (fields) one through four.
Answer 1 (score: 1)
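To run the same command on a file instead of a here-document, a minimal sketch could look like the following. The function name first_four and the temporary file contents are assumptions made for illustration, not part of the answer above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print everything before the 4th dash of each line.
# Reads the files given as arguments, or stdin when no file is given.
first_four() {
    cut -d- -f1-4 "$@"
}

# Sample data written to a temporary file (contents assumed for illustration).
tmp=$(mktemp)
printf '%s\n' \
    'TCGA-HC-8216-01A-11D-A323-01' \
    'TCGA-J4-8200-10A-11D-A323-01' > "$tmp"

first_four "$tmp"
rm -f "$tmp"
```

Lines with fewer than four fields are printed unchanged. If lines that contain no dash at all should be dropped, cut's -s option suppresses them.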
#!/bin/bash
IN="TCGA-HC-8216-01A-11D-A323-01
TCGA-J4-8200-10A-11D-A323-01
TCGA-EJ-A65E-10A-11D-A323-01"
arr=$(echo "$IN" | cut -d '-' -f1-4)
echo "$arr"
Prints:
TCGA-HC-8216-01A
TCGA-J4-8200-10A
TCGA-EJ-A65E-10A
Answer 2 (score: 0)
Using grep with ERE:
arr=$(echo "$IN" | grep -oE "^([^-]*-){3}[^-]*")
Using BRE:
arr=$(echo "$IN" | grep -o "^\([^-]*-\)\{3\}[^-]*")
Example:
#!/bin/bash
IN="TCGA-HC-8216-01A-11D-A323-01
TCGA-J4-8200-10A-11D-A323-01
TCGA-EJ-A65E-10A-11D-A323-01"
arr=$(echo "$IN" | grep -oE "^([^-]*-){3}[^-]*")
for x in $arr
do
    echo "> [$x]"
done
Output:
> [TCGA-HC-8216-01A]
> [TCGA-J4-8200-10A]
> [TCGA-EJ-A65E-10A]
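One practical difference from cut is worth noting: grep -o prints only the matched text, so a line with fewer than three dashes produces no output at all, whereas cut prints whatever fields the line has. The sketch below compares the two on an assumed input whose second line is deliberately short.

```shell
#!/usr/bin/env bash
# Assumed sample input: the second line has only two dashes.
IN="TCGA-HC-8216-01A-11D-A323-01
TCGA-J4-8200"

# grep -o prints only the matched part and nothing for non-matching lines.
with_grep=$(printf '%s\n' "$IN" | grep -oE '^([^-]*-){3}[^-]*')

# cut prints as many of fields 1-4 as each line actually has.
with_cut=$(printf '%s\n' "$IN" | cut -d- -f1-4)

echo "grep:"
echo "$with_grep"
echo "cut:"
echo "$with_cut"
```

Which behavior is preferable depends on whether malformed lines should be filtered out or passed through.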
Answer 3 (score: 0)
Using pure bash and pattern matching:
#!/bin/bash
IN="TCGA-HC-8216-01A-11D-A323-01
TCGA-J4-8200-10A-11D-A323-01
TCGA-EJ-A65E-10A-11D-A323-01"
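# Anchored at the start: three "field-" groups followed by a fourth field.
re='^([^-]+-){3}[^-]+'
for line in $IN
do
    if [[ $line =~ $re ]]; then
        # BASH_REMATCH[0] holds the whole matched prefix.
        trunc=${BASH_REMATCH[0]}
        # Print only on a match, so a stale value is never repeated.
        echo "$trunc"
    fi
done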
Output:
TCGA-HC-8216-01A
TCGA-J4-8200-10A
TCGA-EJ-A65E-10A
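A different pure-bash approach, not taken from the answer above, is to split each line on dashes with the read builtin and rejoin the first four fields. This is only a sketch; the function name first_four_fields is hypothetical, and it relies on bash arrays, so it will not run under plain sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read lines from stdin and print the first four
# dash-separated fields of each line, using only bash builtins.
first_four_fields() {
    local line
    local -a parts
    # The "|| [[ -n $line ]]" part keeps a final line that lacks a newline.
    while IFS= read -r line || [[ -n $line ]]; do
        # Split the line on '-' into the array "parts".
        IFS=- read -r -a parts <<< "$line"
        # In a subshell, set IFS to '-' so "${parts[*]}" joins with dashes,
        # then print the first four elements.
        (IFS=-; printf '%s\n' "${parts[*]:0:4}")
    done
}

IN="TCGA-HC-8216-01A-11D-A323-01
TCGA-J4-8200-10A-11D-A323-01
TCGA-EJ-A65E-10A-11D-A323-01"

first_four_fields <<< "$IN"
```

Unlike the regex loop, this version passes through lines with fewer than four fields instead of skipping them.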