Given the input file below, is there a command or technique in Linux that converts it into the desired output file shown further down?
Input file:
Column_1 Column_2
scaffold_A SNP_marker1
scaffold_A SNP_marker2
scaffold_A SNP_marker3
scaffold_A SNP_marker4
scaffold_B SNP_marker5
scaffold_B SNP_marker6
scaffold_B SNP_marker7
scaffold_C SNP_marker8
scaffold_A SNP_marker9
scaffold_A SNP_marker10
Desired output file:
Column_1 Column_2
scaffold_A SNP_marker1;SNP_marker2;SNP_marker3;SNP_marker4
scaffold_B SNP_marker5;SNP_marker6;SNP_marker7
scaffold_C SNP_marker8
scaffold_A SNP_marker9;SNP_marker10
I was thinking of using grep, uniq, and so on, but I still cannot figure out how to get this done.
Answer 0 (score: 2)
Perl solution:
perl -lane 'sub output {
        print "$last\t", join ";", @buff;
    }
    $last //= $F[0];
    if ($F[0] ne $last) {
        output();
        undef @buff;
        $last = $F[0];
    }
    push @buff, $F[1];
    }{ output();'
Answer 1 (score: 2)
Python solution (assuming the input and output file names are passed on the command line):
from __future__ import print_function  # not needed with Python 3
import sys

# usage: python script.py INFILE OUTFILE
with open(sys.argv[1]) as infile, open(sys.argv[2], 'w') as outfile:
    outfile.write(infile.readline())  # transfer the header
    col_one, col_two = infile.readline().split()
    col_two = [col_two]  # make it a list
    for line in infile:
        data = line.split()
        if col_one != data[0]:
            # the key changed: flush the finished group
            print("{}\t{}".format(col_one, ';'.join(col_two)), file=outfile)
            col_one = data[0]
            col_two = [data[1]]
        else:
            col_two.append(data[1])
    # flush the last group
    print("{}\t{}".format(col_one, ';'.join(col_two)), file=outfile)
Answer 2 (score: 0)
#!/bin/bash
awk '
BEGIN {
    str = ""
}
{
    if ( str != $1 ) {
        if ( NR != 1 ) {
            printf("\n")
        }
        str = $1
        printf("%s\t%s", $1, $2)
    } else if ( str == $1 ) {
        printf(";%s", $2)
    }
}
END {
    printf("\n")
}' your_file.txt
Answer 3 (score: 0)
You can also try the following solution in bash:
cat input.txt | while read L; do y=`echo $L | cut -f1 -d' '`; { test "$x" = "$y" && echo -n ";`echo $L | cut -f2 -d' '`"; } || { x="$y";echo -en "\n$L"; }; done
Or, in a more human-readable form:
cat input.txt | while read L;
do
    y=`echo $L | cut -f1 -d' '`;
    {
        test "$x" = "$y" && echo -n ";`echo $L | cut -f2 -d' '`";
    } ||
    {
        x="$y";echo -en "\n$L";
    };
done
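Note that the formatting of the script's output relies on bash's built-in echo command (its -n and -e options). As written, the output starts with an empty line and has no trailing newline.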
Answer 4 (score: 0)
If you don't mind using Python, it has itertools.groupby, which is made for exactly this purpose:
# file: combine.py
import itertools

with open('data.txt') as f:
    data = [row.split() for row in f]

# groupby yields one (key, group) pair per run of consecutive rows sharing column 1
for column1, rows_group in itertools.groupby(data, key=lambda row: row[0]):
    print(column1, ';'.join(column2 for _, column2 in rows_group))
Save this script as combine.py. Assuming your input file is data.txt, run it to get the desired output:
python combine.py
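The result of the with open(...) block is data, a list of rows, where each row is itself a list of columns. The itertools.groupby function takes an iterable, in this case that list, and you tell it how to group the rows by supplying a key function, here one that returns the first column (column1). Note that groupby only groups consecutive rows with equal keys, which is why scaffold_A appears twice in the output, exactly as the desired output requires.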
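To see the grouping on its own, here is a minimal sketch that runs itertools.groupby on a small in-memory sample instead of data.txt. The helper name collapse_runs and the sample rows are assumptions made for illustration; they are not part of the answer above.

```python
import itertools

# Hypothetical sample: a few rows from the question, already split into columns.
rows = [
    ["scaffold_A", "SNP_marker1"],
    ["scaffold_A", "SNP_marker2"],
    ["scaffold_B", "SNP_marker5"],
    ["scaffold_A", "SNP_marker9"],
]


def collapse_runs(rows):
    """Join the column-2 values of each run of consecutive rows sharing column 1."""
    return [
        (key, ";".join(col2 for _, col2 in group))
        for key, group in itertools.groupby(rows, key=lambda row: row[0])
    ]


for key, joined in collapse_runs(rows):
    print(key, joined)
```

Since groupby starts a new group whenever the key changes, the final scaffold_A row forms its own group rather than being merged with the first two. If all rows for a key had to be merged regardless of position, the data would need to be sorted by that key first, which is not what the question asks for.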