I have a large file with 7 columns, and I want to compare 2 of them, col 1 and col 7:
chr_locations(col 1) gene_name(col 7)
chr1:66997989-67000678 geneA
chr1:66997824-67000456 geneA
chr2:33544389-33548489 geneB
chr2:33546285-33547055 geneB
chr2:44567890-44568980 geneB
For each gene, I want to count how many chromosome locations it has and append that count to every row:
chr1:66997989-67000678 geneA 2
chr1:66997824-67000456 geneA 2
chr2:33544389-33548489 geneB 3
chr2:33546285-33547055 geneB 3
chr2:44567890-44568980 geneB 3
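I'm sure there is a simpler way to do this in awk than writing a Python script. Can anyone help? Thanks.
Answer 0 (score: 2)
You need an array to hold the counts, with array keys built from the 2 columns.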
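A minimal Python sketch of this counting-array idea, assuming whitespace-separated columns: a `Counter` plays the role of the awk array, and `count_by_columns` is a hypothetical helper whose `key_cols` argument chooses which columns form the key. On the sample data every (location, gene) pair is distinct, so a key built from columns 1 and 7 together gives a count of 1 per row; keying on column 7 alone reproduces the counts in the expected output.

```python
from collections import Counter

def count_by_columns(lines, key_cols):
    """Return (column 1, column 7, count) per row, counting rows per key.

    The key is built from the 0-based column indexes in key_cols:
    (0, 6) combines column 1 and column 7, (6,) uses the gene name alone.
    """
    rows = [line.split() for line in lines if line.strip()]
    # One dictionary entry per distinct key, like an awk associative array
    counts = Counter(tuple(r[i] for i in key_cols) for r in rows)
    return [(r[0], r[6], counts[tuple(r[i] for i in key_cols)]) for r in rows]

# Sample rows from the question, padded with placeholder columns c2..c6
# (hypothetical filler) so that the gene name sits in column 7.
pairs = [
    ("chr1:66997989-67000678", "geneA"),
    ("chr1:66997824-67000456", "geneA"),
    ("chr2:33544389-33548489", "geneB"),
    ("chr2:33546285-33547055", "geneB"),
    ("chr2:44567890-44568980", "geneB"),
]
sample = [" ".join([loc, "c2", "c3", "c4", "c5", "c6", gene]) for loc, gene in pairs]

# Key on the gene name only, which matches the expected output
for row in count_by_columns(sample, (6,)):
    print(*row)
```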
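If you want us to test our answers, you need to provide some real data.
Answer 1 (score: 2)
This is easy in either language (or really any language); it all depends on what you know.
AWK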
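awk '{
    count[$7]++          # number of rows seen so far for this gene (column 7)
    memory_1[NR] = $1    # remember column 1 of every row, in input order
    memory_7[NR] = $7    # remember column 7 of every row
}
END {
    # replay the stored rows: location, gene, total count for that gene
    for (i = 1; i <= NR; ++i) print memory_1[i] OFS memory_7[i] OFS count[memory_7[i]]
}' file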
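Python
from collections import Counter
# Split each line into whitespace-separated columns
with open("file") as fh:
    records = [line.split() for line in fh]
# Count rows per gene name (column 7 -> index 6)
count = Counter(r[6] for r in records)
# Print column 1, column 7 and the total count for that gene, tab-separated
print("\n".join("\t".join((r[0], r[6], str(count[r[6]]))) for r in records))
Output: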
chr1:66997989-67000678 geneA 2
chr1:66997824-67000456 geneA 2
chr2:33544389-33548489 geneB 3
chr2:33546285-33547055 geneB 3
chr2:44567890-44568980 geneB 3
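Both versions above keep every row in memory until the end. For a large file, one alternative is to read the file twice: first count the rows per gene, then stream the rows again and print each one with its gene's count, so only the per-gene counts are held in memory. A minimal Python sketch, assuming whitespace-separated columns with the gene name in column 7; `annotate_gene_counts` is a hypothetical name:

```python
from collections import Counter

def annotate_gene_counts(path):
    """Yield (column 1, column 7, count) for each row, reading the file twice.

    Only the per-gene counts are kept in memory, not the rows themselves.
    Blank lines and lines with fewer than 7 columns are skipped.
    """
    counts = Counter()
    with open(path) as fh:  # pass 1: count rows per gene (column 7)
        for line in fh:
            cols = line.split()
            if len(cols) >= 7:
                counts[cols[6]] += 1
    with open(path) as fh:  # pass 2: re-read and emit location, gene, count
        for line in fh:
            cols = line.split()
            if len(cols) >= 7:
                yield cols[0], cols[6], counts[cols[6]]
```

Used as `for loc, gene, n in annotate_gene_counts("file"): print(loc, gene, n, sep="\t")`, it prints the same tab-separated lines as the Python answer.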