How to count row by row with a condition

Time: 2015-08-10 17:43:59

Tags: python pandas count

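I want to count values in the STATE and ID columns of my DataFrame row by row, but I get a KeyError. My final code is below. For each ID number (1 to 12), I want to count the state changes. Here is my data; I have thousands of rows like these.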

# this code works for the STATE column
chars = "TGC-"
nums = {}

for char in chars:
    s = df["STATE"]
    A = s.str.contains("A:" + char)
    num = A.value_counts(sort=True)
    nums[char] = num
ATnum = nums["T"]
AGnum = nums["G"]
ACnum = nums["C"]
A_num = nums["-"]

ATnum
Out[26]: 
False    51919
True      1248
dtype: int64

# and this one works for the ID column
pt = df.sort("ID")["ID"]
pt_num=pt.value_counts()
pt_values= pt_num.order()
pt_index= pt_num.sort_index()
# these are the total numbers of rows for each ID
pt_num
Out[27]: 
10    5241
6     5144
11    4561
2     4439
3     4346
5     4284
9     4244
12    4218
7     4217
1     4210
4     4199
8     4064
dtype: int64
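
For readers on a newer pandas: `DataFrame.sort` and `Series.order` were removed in later releases, so the ID snippet above may fail there. Below is a minimal sketch of the same counts using `sort_values` and `sort_index`. The small `df` is made up purely for illustration. Sorting by ID before `value_counts` is not needed, because `value_counts` does not depend on row order.

```python
import pandas as pd

# Illustrative IDs only, not the real data.
df = pd.DataFrame({"ID": [10, 6, 10, 2, 6, 10]})

pt_num = df["ID"].value_counts()   # rows per ID, most frequent first
pt_values = pt_num.sort_values()   # ascending by count (replaces Series.order)
pt_index = pt_num.sort_index()     # ascending by ID

print(pt_index)
```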

# I combine the ID and STATE columns and try to read them row by row
draft
Out[21]: 
           ID                                          STATE
0          11                                 chr1:100154376:G:A
1           2                                 chr1:100177723:C:T
2           9                                 chr1:100177723:C:T
3           1                                chr1:100194200:-:AA
4           8                                  chr1:10032249:A:G
5           2                                 chr1:100340787:G:A
6           1                                 chr1:100349757:A:G
7           3                                  chr1:10041186:C:A
8          10                                 chr1:100476986:G:C
9           4                                 chr1:100572459:C:T
10          5                                 chr1:100572459:C:T
11          2                                 chr1:100671861:T:-
12          4                                   chr1:1021390:C:A
13          5                                  chr1:10228220:G:C
14          3                                   chr1:1026913:C:T
15          4                                   chr1:1026913:C:T
...       ...                                                ...
53152       6                                  chrY:21618583:G:C
53153       5                                  chrY:24443836:T:G
53154       6                                  chrY:24443836:T:G
53155       8                                  chrY:24443836:T:G
53156      10                                  chrY:24443836:T:G
53157      12                                  chrY:24443836:T:G
53158       3                                   chrY:5605924:C:T
53159       2                                   chrY:6932151:G:A
53160      10                                   chrY:7224175:G:T
53161       2                                   chrY:9197998:C:T
53162       3                                   chrY:9197998:C:T
53163       4                                   chrY:9197998:C:T
53164      11                                   chrY:9197998:C:T
53165      12                                   chrY:9197998:C:T
53166      11                                   chrY:9304866:G:A

[53167 rows x 2 columns]

draft= df[["ID", "STATE" ]]
chars = "TGC-"
number = {}
d = draft
for i in d["ID"]:
   if i==1:
        for item in chars:
            At = d["STATE"].str.contains("A:" + item)
            num = At.value_counts(sort=True)
            number[item] = num
            ATn=number["T"]
            AGn=number["G"]
            ACn=number["C"]
            A_n=number["-"]

KeyError: 'G'
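
The traceback is consistent with the lookups `number["T"]`, `number["G"]` and so on sitting inside the inner `for item in chars` loop. On the first pass only `number["T"]` has been stored, so `number["G"]` raises `KeyError: 'G'`. Below is a minimal sketch with the lookups moved after the loop. The rows are made up for illustration, and `regex=False` is an added assumption so that the pattern is matched literally. Like the loop above, this still counts over all IDs, not just ID 1.

```python
import pandas as pd

# Illustrative rows only; the real frame has 53167 rows.
d = pd.DataFrame({
    "ID": [1, 1, 1, 2, 1],
    "STATE": [
        "chr1:10:A:G", "chr1:20:A:T", "chr1:30:C:T",
        "chr1:40:A:G", "chr1:50:A:-",
    ],
})

chars = "TGC-"
number = {}
for item in chars:
    # Boolean mask of rows whose STATE contains e.g. "A:T"
    mask = d["STATE"].str.contains("A:" + item, regex=False)
    number[item] = int(mask.sum())  # number of True values

# Read the dictionary only after every key has been filled in.
ATn, AGn, ACn, A_n = number["T"], number["G"], number["C"], number["-"]
print(number)
```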

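Overall, what I want is this: ID 1, for example, has 4210 rows, and I want to find out how many of those rows have the state A:T, A:G, A:- or A:C.
Where am I going wrong?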
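
One possible way to get these per-ID counts, sketched under the assumption that every STATE value has exactly four colon-separated fields (chromosome, position, original base, new base), as the sample rows suggest. The frame below is made up for illustration, and `parts`, `ref`, `alt`, `is_a` and `counts` are hypothetical names. Note that splitting matches the fields exactly, whereas `str.contains("A:T")` would also match a value such as `A:TT`.

```python
import pandas as pd

# Illustrative rows only, not the real data.
draft = pd.DataFrame({
    "ID": [1, 1, 2, 1, 2, 1],
    "STATE": [
        "chr1:100:A:G", "chr1:200:A:T", "chr1:300:A:G",
        "chr1:400:C:T", "chr1:500:A:-", "chr1:600:A:G",
    ],
})

# Split "chrom:pos:ref:alt" into separate columns 0..3.
parts = draft["STATE"].str.split(":", expand=True)
ref = parts[2]
alt = parts[3].rename("ALT")

# Keep rows that start from "A", then tabulate ID against the new base.
is_a = ref == "A"
counts = pd.crosstab(draft.loc[is_a, "ID"], alt[is_a])

print(counts)
print(counts.loc[1, "G"])  # rows with ID 1 and state A:G
```

Each row of `counts` is one ID and each column is one target base, so `counts.loc[1]` gives all of the A:x counts for ID 1 at once.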

0 answers:

No answers