In pandas

Time: 2015-08-10 15:38:45

Tags: python string pandas count integer

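I'm a beginner with pandas. I have two columns that I combined into one DataFrame.

I'm trying to count the rows for each state and for each ID number. I have thousands of IDs and states, so can someone help me with this? Thanks.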

draft = df[["ID", "STATE"]]

draft
Out[5]: 
           ID                                         STATE
0          11                                 chr1:100154376:G:A
1           2                                 chr1:100177723:C:T
2           9                                 chr1:100177723:C:T
3           1                                chr1:100194200:-:AA
4           8                                  chr1:10032249:A:G
5           2                                 chr1:100340787:G:A
6           1                                 chr1:100349757:A:G
7           3                                  chr1:10041186:C:A
8          10                                 chr1:100476986:G:C
9           4                                 chr1:100572459:C:T
10          5                                 chr1:100572459:C:T
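A minimal sketch of the usual pandas idiom for "how many rows per value", using only the eleven sample rows shown above (the variable names `rows_per_state`, `rows_per_id` and `rows_per_pair` are illustrative). `value_counts()` counts rows per value of one column, and `groupby(...).size()` counts rows per combination of columns:

```python
import pandas as pd

# The eleven rows printed in the question's output above
draft = pd.DataFrame({
    "ID": [11, 2, 9, 1, 8, 2, 1, 3, 10, 4, 5],
    "STATE": [
        "chr1:100154376:G:A", "chr1:100177723:C:T", "chr1:100177723:C:T",
        "chr1:100194200:-:AA", "chr1:10032249:A:G", "chr1:100340787:G:A",
        "chr1:100349757:A:G", "chr1:10041186:C:A", "chr1:100476986:G:C",
        "chr1:100572459:C:T", "chr1:100572459:C:T",
    ],
})

# Number of rows for each STATE value (most frequent first)
rows_per_state = draft["STATE"].value_counts()

# Number of rows for each ID value
rows_per_id = draft["ID"].value_counts()

# Number of rows for each (ID, STATE) combination
rows_per_pair = draft.groupby(["ID", "STATE"]).size()

print(rows_per_state)
print(rows_per_id)
```

Both calls are vectorized, so they should scale to thousands of IDs and states without an explicit Python loop.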


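chars = "TGC-"
number = {}

for item in chars:
    # Boolean Series: True where STATE contains e.g. "A:T"
    at = draft["STATE"].str.contains("A:" + item, regex=False)
    # How many rows match (True) and do not match (False)
    number[item] = at.value_counts(sort=True)

# Rows per ID; Series.order() was removed from later pandas, sort_values() replaces it
id_values1 = draft["ID"].value_counts().sort_values()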
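The loop searches for the substring "A:" + item once per character. A vectorized sketch of the same idea, under the assumption (suggested by the sample data, not stated in the question) that STATE always has the form `chrom:pos:ref:alt`: split it into columns, keep the rows whose ref allele is "A", and count each alt allele. The column names `chrom`, `pos`, `ref`, `alt` are hypothetical labels chosen here:

```python
import pandas as pd

# The eleven rows printed in the question's output above
draft = pd.DataFrame({
    "ID": [11, 2, 9, 1, 8, 2, 1, 3, 10, 4, 5],
    "STATE": [
        "chr1:100154376:G:A", "chr1:100177723:C:T", "chr1:100177723:C:T",
        "chr1:100194200:-:AA", "chr1:10032249:A:G", "chr1:100340787:G:A",
        "chr1:100349757:A:G", "chr1:10041186:C:A", "chr1:100476986:G:C",
        "chr1:100572459:C:T", "chr1:100572459:C:T",
    ],
})

# Assumption: every STATE is "chrom:pos:ref:alt" (exactly four fields)
parts = draft["STATE"].str.split(":", expand=True)
parts.columns = ["chrom", "pos", "ref", "alt"]

# Rows with ref allele "A", counted per alt allele; alleles that never
# occur get an explicit 0 instead of being missing
alt_counts = (
    parts.loc[parts["ref"] == "A", "alt"]
    .value_counts()
    .reindex(list("TGC-"), fill_value=0)
)
print(alt_counts)
```

Comparing whole fields also avoids a pitfall of substring matching: a multi-base ref such as "AA" followed by alt "G" would also contain "A:G".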

1 Answer:

Answer 0 (score: 1)

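This is my first answer on Stack Overflow, so if it doesn't make sense, please ignore it. I'm not an experienced coder, but I like pandas. I think you want to do something like this: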

import pandas as pd
import numpy as np
ids = [21,2,9,1,8,2,1,3,10,4,4]
states = ['GA','CT','AA','AG','CA','GC','CT','CT','CA','AG','AG']
draft = pd.DataFrame({'ids':ids,'state':states})
draft

d = dict()
for dex, row in draft.iterrows():
    x = row['ids']
    y = row['state']

    if y in d:
        # state seen before: append this ID to its list
        d[y].append(x)
    else:
        # first time this state appears: start a new list with this ID
        d[y] = [x]
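The same dictionary of ID lists can likely be built without `iterrows()`. A minimal sketch using `groupby` on the answer's sample data; the name `ids_per_state` is only illustrative:

```python
import pandas as pd

ids = [21, 2, 9, 1, 8, 2, 1, 3, 10, 4, 4]
states = ['GA', 'CT', 'AA', 'AG', 'CA', 'GC', 'CT', 'CT', 'CA', 'AG', 'AG']
draft = pd.DataFrame({'ids': ids, 'state': states})

# Collect the IDs of each state into a list (row order is kept within a group),
# then turn the resulting Series into a plain dict: state -> [ids]
ids_per_state = draft.groupby('state')['ids'].apply(list).to_dict()
print(ids_per_state)
```

`iterrows()` builds a Series for every row, so on thousands of rows the grouped version is usually much faster.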

The resulting dictionary maps each state to the list of its IDs:

d
{'AA': [9],
 'AG': [1, 4, 4],
 'CA': [8, 10],
 'CT': [2, 1, 3],
 'GA': [21],
 'GC': [2]}

Print the number of rows per state:

for key, value in sorted(d.items()):
    print(key, len(value))

AA 1
AG 3
CA 2
CT 3
GA 1
GC 1
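If only the counts are needed, a sketch that skips the intermediate dictionary: `groupby(...).size()` gives the same numbers as `len(value)` above. Whether a repeated ID should count once or twice is not clear from the question, so `nunique()` is shown as well (state 'AG' has ID 4 twice). Variable names are illustrative:

```python
import pandas as pd

ids = [21, 2, 9, 1, 8, 2, 1, 3, 10, 4, 4]
states = ['GA', 'CT', 'AA', 'AG', 'CA', 'GC', 'CT', 'CT', 'CA', 'AG', 'AG']
draft = pd.DataFrame({'ids': ids, 'state': states})

# Rows per state (groupby sorts the state keys alphabetically)
rows_per_state = draft.groupby('state').size()

# Distinct IDs per state: the duplicated ID 4 in 'AG' is counted once
unique_ids_per_state = draft.groupby('state')['ids'].nunique()

for state, n in rows_per_state.items():
    print(state, n)
```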