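I'm new to Linux and the command line. I'm trying to find a command that replaces the whitespace between fields with semicolons for every field except the first (in a .csv text file).
Please see the example below. Any help is appreciated; I have spent a long time looking for a solution. If you do have an answer, please explain the command so I can try to learn how and why it works. Many thanks.
Sample input: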
0 k__Bacteria p__Firmicutes c__Bacilli
1 k__Bacteria p__Firmicutes c__Clostridia
2 k__Bacteria p__Bacteroidetes c__Bacteroidia
3 k__Bacteria p__Bacteroidetes c__Bacteroidia
What I need is:
0 k__Bacteria;p__Firmicutes;c__Bacilli
1 k__Bacteria;p__Firmicutes;c__Clostridia
2 k__Bacteria;p__Bacteroidetes;c__Bacteroidia
3 k__Bacteria;p__Bacteroidetes;c__Bacteroidia
Answer 0 (score: 1)
$ cat file
0 k__Bacteria p__Firmicutes c__Bacilli foo bar
1 k__Bacteria p__Firmicutes c__Clostridia the quick brown
2 k__Bacteria p__Bacteroidetes c__Bacteroidia fox jumped over
3 k__Bacteria p__Bacteroidetes c__Bacteroidia the lazy dogs back
$ awk -v skip=1 '{match($0,"([^[:space:]]+[[:space:]]+){"skip"}"); head=substr($0,1,RSTART+RLENGTH); tail=substr($0,RSTART+RLENGTH+1); gsub(/[[:space:]]+/,";",tail); print head tail}' file
0 k__Bacteria;p__Firmicutes;c__Bacilli;foo;bar
1 k__Bacteria;p__Firmicutes;c__Clostridia;the;quick;brown
2 k__Bacteria;p__Bacteroidetes;c__Bacteroidia;fox;jumped;over
3 k__Bacteria;p__Bacteroidetes;c__Bacteroidia;the;lazy;dogs;back
$ awk -v skip=2 '{match($0,"([^[:space:]]+[[:space:]]+){"skip"}"); head=substr($0,1,RSTART+RLENGTH); tail=substr($0,RSTART+RLENGTH+1); gsub(/[[:space:]]+/,";",tail); print head tail}' file
0 k__Bacteria p__Firmicutes;c__Bacilli;foo;bar
1 k__Bacteria p__Firmicutes;c__Clostridia;the;quick;brown
2 k__Bacteria p__Bacteroidetes;c__Bacteroidia;fox;jumped;over
3 k__Bacteria p__Bacteroidetes;c__Bacteroidia;the;lazy;dogs;back
$ awk -v skip=3 '{match($0,"([^[:space:]]+[[:space:]]+){"skip"}"); head=substr($0,1,RSTART+RLENGTH); tail=substr($0,RSTART+RLENGTH+1); gsub(/[[:space:]]+/,";",tail); print head tail}' file
0 k__Bacteria p__Firmicutes c__Bacilli;foo;bar
1 k__Bacteria p__Firmicutes c__Clostridia;the;quick;brown
2 k__Bacteria p__Bacteroidetes c__Bacteroidia;fox;jumped;over
3 k__Bacteria p__Bacteroidetes c__Bacteroidia;the;lazy;dogs;back
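Since the question asks for an explanation: the awk command above uses match() to find the first `skip` fields together with the whitespace that follows them, cuts the line into a head and a tail with substr(), and then gsub() turns every run of whitespace in the tail into a single ";". The same head/tail idea can be sketched in Python. This is only an illustrative equivalent, not part of the answer: the name join_tail and its parameters are made up here, and unlike the awk version it trims trailing whitespace before joining.

```python
import re


def join_tail(line, skip=1, sep=";"):
    """Leave the first `skip` fields untouched; join the remaining fields with `sep`."""
    line = line.rstrip("\n")
    # Match optional leading whitespace, then `skip` fields, each followed by whitespace.
    m = re.match(r"\s*(?:\S+\s+){%d}" % skip, line)
    if m is None:
        # Fewer than skip+1 fields: nothing to convert.
        return line
    head, tail = line[:m.end()], line[m.end():]
    # Collapse each run of whitespace in the tail into one separator.
    return head + re.sub(r"\s+", sep, tail.strip())


sample = "0 k__Bacteria p__Firmicutes c__Bacilli foo bar"
print(join_tail(sample))          # 0 k__Bacteria;p__Firmicutes;c__Bacilli;foo;bar
print(join_tail(sample, skip=3))  # 0 k__Bacteria p__Firmicutes c__Bacilli;foo;bar
```

Because the head is copied verbatim, any spacing between the first `skip` fields is preserved exactly as it appears in the input.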
Answer 1 (score: 0)
You can do this in Python:
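#!/usr/bin/env python3
import sys

if __name__ == '__main__':
    for line in sys.stdin:
        # Split on runs of whitespace; this also drops the trailing newline.
        cols = line.split()
        if not cols:
            continue  # skip blank lines
        # Keep a space after the first field, join the rest with semicolons.
        print(' '.join([cols[0], ';'.join(cols[1:])]))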
Just chmod +x script and run ./script < input.
Note that line.split() splits on runs of whitespace, so 'a b\tc' yields ['a', 'b', 'c'].
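To make the difference concrete, here is a small sketch comparing split() with no argument against split(" "). The sample string and variable names are made up for illustration.

```python
sample = "a  b\tc\n"  # two spaces, then a tab, then a trailing newline

# No argument: split on any run of whitespace and ignore leading/trailing whitespace.
whitespace_split = sample.split()

# Explicit single-space separator: every space is a boundary; tabs and newlines are not.
single_space_split = sample.split(" ")

print(whitespace_split)    # ['a', 'b', 'c']
print(single_space_split)  # ['a', '', 'b\tc\n']
```

The argument-free form is the one that suits this problem, since the input fields may be separated by a mix of spaces and tabs.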
Answer 2 (score: 0)
Here is a solution using awk. It may be a bit dirty and someone could probably improve it, but it works:
awk 'OFS=";"{a=$1;$1="";$0=a";"$0}sub(/;;/," ",$0) ' temp.txt
Output:
0 k_Bacteria;p_Firmicutes;c_Bacilli
1 k_Bacteria;p_Firmicutes;c_Clostridia
2 k_Bacteria;p_Bacteroidetes;c_Bacteroidia
3 k_Bacteria;p_Bacteroidetes;c_Bacteroidia
cat temp.txt
0 k_Bacteria p_Firmicutes c_Bacilli
1 k_Bacteria p_Firmicutes c_Clostridia
2 k_Bacteria p_Bacteroidetes c_Bacteroidia
3 k_Bacteria p_Bacteroidetes c_Bacteroidia
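How the one-liner works, as far as can be read from the command itself: OFS=";" sits in the pattern position, where the assignment always evaluates as true, so the action runs for every line. Assigning to $1 makes awk rebuild $0 with ";" between fields, the saved first field is glued back on with another ";", and the final sub() swaps the resulting ";;" for a space; because sub() returns 1, awk prints the line. The steps can be mimicked in Python as a rough sketch. The function name awk_style is invented here and the code is an illustration, not something from the answer.

```python
def awk_style(line):
    """Mimic the awk one-liner step by step, using ';' as the output separator."""
    fields = line.split()           # awk's default field splitting
    if not fields:
        return ""
    first = fields[0]               # a = $1
    fields[0] = ""                  # $1 = ""  (awk rebuilds $0 using OFS)
    record = ";".join(fields)       # e.g. ";k_Bacteria;p_Firmicutes;c_Bacilli"
    record = first + ";" + record   # $0 = a ";" $0  -> "0;;k_Bacteria;..."
    # sub(/;;/, " ") replaces only the first occurrence of ";;".
    return record.replace(";;", " ", 1)


print(awk_style("0 k_Bacteria p_Firmicutes c_Bacilli"))
# 0 k_Bacteria;p_Firmicutes;c_Bacilli
```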
Edit based on the comments (update):
Try this awk script, myawk.sh:
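BEGIN { print "Begin Processing " }
OFS=";" {
    $9 = $9 "%%"            # mark the end of field 9; this rebuilds $0 with ";" separators
    split($0, a, "%%")      # a[1] = fields 1 to 9, a[2] = everything after them
    gsub(/;/, " ", a[1])    # put the spaces back between the first 9 fields
    print a[1] a[2]
}
END { print "Process Complete" }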
Run it with awk -f myawk.sh temp.txt, where $9 is the last field up to which you want to keep the spaces.
Answer 3 (score: 0)
awk -v OFS=";" '{$1=$1" "$2;$2="";gsub(/;;/,";",$0);print}' your_file
Or perhaps perl:
perl -lane 'print join ";",@F' your_file | perl -pe 's/;/ /'
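The perl pipeline takes a two-pass approach: first join every field with ";", then turn the first ";" on each line back into a space. For comparison, the same idea in Python might look like the sketch below. The function name is hypothetical, and the approach assumes the first field never contains a ";" itself.

```python
def join_then_fix_first(line):
    """Join all fields with ';', then restore a space after the first field."""
    joined = ";".join(line.split())     # "0;k__Bacteria;p__Firmicutes;c__Bacilli"
    # Replace only the first ';' so the first field stays space-separated.
    return joined.replace(";", " ", 1)


print(join_then_fix_first("0 k__Bacteria p__Firmicutes c__Bacilli"))
# 0 k__Bacteria;p__Firmicutes;c__Bacilli
```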