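Given the following file, I would like to compute, for each column, the frequency of each pattern whose two characters differ, i.e.: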
A/A C/G C/G
A/T C/C G/G
A/A C/G C/C
A/T C/G C/G
T/T C/G C/G
Expected output:
A/T = 2/5
C/G = 4/5
C/G = 3/5
I tried some code in AWK, but it doesn't seem to work. Any help would be appreciated, thanks!
Edit:
I recreated my file as follows:
A A C G C G
A T C C G G
A A C G C C
A T C G C G
T T C G C G
awk '$1 != $2 {n++}; END {print n}' file
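This gives me the number of rows in which the first two columns differ. I now want to loop over the columns and check whether each pair of columns is equal, i.e. 1 against 2, 3 against 4, and so on.
How can I implement a loop over the odd-numbered columns?
Answer 0: (score: 1)
I would do it like this: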
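from collections import Counter

# Read each non-empty line into a list of whitespace-separated fields
with open('file.txt', 'r') as raw_data:
    data = [line.split() for line in raw_data if line.strip()]

# Collect the values of each of the three columns
a = [record[0] for record in data]
b = [record[1] for record in data]
c = [record[2] for record in data]

# Count how often each pattern occurs in each column
print(Counter(a))
print(Counter(b))
print(Counter(c))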
It prints the counts as dictionary-like Counter objects, but you can take it from there, right?
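The Counter approach lists every pattern, including ones such as A/A where both characters match. Below is a minimal sketch of how the counts could be narrowed to the patterns the question asks about, assuming every field has the form X/Y. The function name `mismatch_frequencies` and the inline sample are illustrative and not part of the original answer.

```python
from collections import Counter


def mismatch_frequencies(lines):
    """Count, per column, the patterns whose two sides differ (e.g. A/T)."""
    counters = []  # one Counter per column, created as columns are seen
    total = 0      # number of non-empty rows
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        total += 1
        while len(counters) < len(fields):
            counters.append(Counter())
        for counter, field in zip(counters, fields):
            left, _, right = field.partition("/")
            if left != right:
                counter[field] += 1
    return counters, total


# Sample data copied from the question (assumption: fields look like X/Y)
sample = """A/A C/G C/G
A/T C/C G/G
A/A C/G C/C
A/T C/G C/G
T/T C/G C/G""".splitlines()

counters, total = mismatch_frequencies(sample)
for counter in counters:
    for pattern, count in counter.items():
        print("{} = {}/{}".format(pattern, count, total))
```

With the sample from the question this prints A/T = 2/5, C/G = 4/5 and C/G = 3/5. Passing an open file object instead of `sample` should work the same way, since the function only iterates over lines.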
Answer 1: (score: 0)
This might help. There may, however, be a better way to do it:
text = """A/A C/G C/G
A/T C/C G/G
A/A C/G C/C
A/T C/G C/G
T/T C/G C/G"""

first_column = list()
second_column = list()
third_column = list()

# Split every row into its three columns
for row in text.strip().split('\n'):
    columns = row.split()
    first_column.append(columns[0])
    second_column.append(columns[1])
    third_column.append(columns[2])

# Map each pattern to a "count/total" string for its column
first_column_occurrences = dict((i, "{}/{}".format(first_column.count(i), len(first_column))) for i in first_column)
second_column_occurrences = dict((i, "{}/{}".format(second_column.count(i), len(second_column))) for i in second_column)
third_column_occurrences = dict((i, "{}/{}".format(third_column.count(i), len(third_column))) for i in third_column)

print("First column:")
print("-------------")
for k, v in first_column_occurrences.items():
    print("{} = {}".format(k, v))
print("\nSecond column:")
print("-------------")
for k, v in second_column_occurrences.items():
    print("{} = {}".format(k, v))
print("\nThird column:")
print("-------------")
for k, v in third_column_occurrences.items():
    print("{} = {}".format(k, v))
Answer 2: (score: 0)
awk to the rescue!
This works for any even number of columns.
awk '{for(i=1;i<=NF;i+=2)
        if($i!=$(i+1))
          a["column "i": "$i"/"$(i+1)]++}
     END{for(k in a) print k,a[k]"/"NR}' file
column 1: A/T 2/5
column 3: C/G 4/5
column 5: C/G 3/5
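For comparison with the Python answers, the same pairwise loop can be sketched in Python. This is only an illustration under the assumption that the file uses the space-separated layout from the question's edit (an even number of single-character columns). The name `paired_column_mismatches` is hypothetical. Like the awk version, it treats C/G and G/C as different patterns.

```python
from collections import Counter


def paired_column_mismatches(lines):
    """Compare columns 1-2, 3-4, ... and count the pairs that differ."""
    counts = Counter()  # key: (1-based column number, "X/Y" pattern)
    rows = 0            # number of non-empty rows, like NR in awk
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        rows += 1
        # Step by 2: indices 0, 2, 4 correspond to awk columns 1, 3, 5
        for i in range(0, len(fields) - 1, 2):
            if fields[i] != fields[i + 1]:
                counts[(i + 1, fields[i] + "/" + fields[i + 1])] += 1
    return counts, rows


# Sample data copied from the question's edit
paired_sample = """A A C G C G
A T C C G G
A A C G C C
A T C G C G
T T C G C G""".splitlines()

counts, rows = paired_column_mismatches(paired_sample)
for (column, pattern), n in sorted(counts.items()):
    print("column {}: {} {}/{}".format(column, pattern, n, rows))
```

Sorting the keys gives a stable column order, whereas the order of `for(k in a)` in awk is not guaranteed.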
Answer 3: (score: 0)
You don't need to store the lines in memory at all, and you can also use the csv lib to do the parsing:
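from collections import Counter
import csv

# newline='' is what the csv module recommends when opening files
with open('file.txt', 'r', newline='') as raw_data:
    cn_a, cn_b, cn_c = Counter(), Counter(), Counter()
    # Each row is unpacked into its three space-separated fields
    for a, b, c in csv.reader(raw_data, delimiter=" "):
        cn_a[a] += 1
        cn_b[b] += 1
        cn_c[c] += 1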