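I'm a pandas beginner. I have two columns that I merged into one DataFrame.
I'm trying to count the rows for each state and for each ID. I have thousands of IDs and states, so could someone help me solve this? Thanks.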
draft = df[["ID", "STATE"]]
draft
Out[5]:
ID STATE
0 11 chr1:100154376:G:A
1 2 chr1:100177723:C:T
2 9 chr1:100177723:C:T
3 1 chr1:100194200:-:AA
4 8 chr1:10032249:A:G
5 2 chr1:100340787:G:A
6 1 chr1:100349757:A:G
7 3 chr1:10041186:C:A
8 10 chr1:100476986:G:C
9 4 chr1:100572459:C:T
10 5 chr1:100572459:C:T
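chars = "TGC-"
number = {}
for item in chars:
    # flag the STATE values that contain the substitution "A:<item>"
    At = draft["STATE"].str.contains("A:" + item, regex=False)
    # count how many rows matched (True) and how many did not (False)
    num = At.value_counts(sort=True)
    number[item] = num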
# count the rows per ID, then sort the counts in ascending order
id_num1 = draft["ID"].value_counts()
id_values1 = id_num1.sort_values()
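Answer 0 (score: 1)
This is my first answer on Stack Overflow. If it doesn't make sense, please ignore it. I'm not an experienced coder, but I love pandas. I think you want to do something like this.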
import pandas as pd
import numpy as np
ids = [21,2,9,1,8,2,1,3,10,4,4]
states = ['GA','CT','AA','AG','CA','GC','CT','CT','CA','AG','AG']
draft = pd.DataFrame({'ids':ids,'state':states})
draft
d = dict()
for dex, row in draft.iterrows():
    x = row['ids']
    y = row['state']
    if y in d:
        # append this id to the list already stored for this state
        d[y].append(x)
    else:
        # first time this state is seen: start a new list
        d[y] = [x]
The new dictionary, mapping each state to the list of ids found for it:
d
{'AA': [9],
'AG': [1, 4, 4],
'CA': [8, 10],
'CT': [2, 1, 3],
'GA': [21],
'GC': [2]}
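The same grouping can also be built without an explicit loop. This is only a minimal sketch, assuming the sample `draft` frame from this answer with its `ids` and `state` columns; `grouped` is just an illustrative name.

```python
import pandas as pd

ids = [21, 2, 9, 1, 8, 2, 1, 3, 10, 4, 4]
states = ['GA', 'CT', 'AA', 'AG', 'CA', 'GC', 'CT', 'CT', 'CA', 'AG', 'AG']
draft = pd.DataFrame({'ids': ids, 'state': states})

# group the rows by state and collect the ids of each group into a list
grouped = draft.groupby('state')['ids'].apply(list).to_dict()
print(grouped)
```

On thousands of rows this should generally be faster than `iterrows`, since the grouping is done inside pandas.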
Show the results:
for key, value in d.items():
    print(key, len(value))
GA 1
CT 3
AA 1
AG 3
CA 2
GC 1
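If only the counts are needed, pandas can probably compute them directly. This is a sketch under the same assumptions (the sample `draft` frame from this answer; `per_state` and `per_state_id` are illustrative names). It uses `value_counts` for the rows per state and `groupby(...).size()` for the rows per (state, id) pair.

```python
import pandas as pd

ids = [21, 2, 9, 1, 8, 2, 1, 3, 10, 4, 4]
states = ['GA', 'CT', 'AA', 'AG', 'CA', 'GC', 'CT', 'CT', 'CA', 'AG', 'AG']
draft = pd.DataFrame({'ids': ids, 'state': states})

# number of rows for each state
per_state = draft['state'].value_counts()

# number of rows for each (state, id) combination
per_state_id = draft.groupby(['state', 'ids']).size()

print(per_state)
print(per_state_id)
```

For the original data, the column names would be `STATE` and `ID` instead.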