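I want to count values in the STATE and ID columns of my DataFrame row by row, but I get a KeyError. My final code is below. For each ID number (1 to 12), I want to count the state changes. Here is my data; I have thousands of rows like these.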
#this code works for state column
chars = "TGC-"
nums = {}
for char in chars:
    s = df["STATE"]
    A = s.str.contains("A:" + char)
    num = A.value_counts(sort=True)
    nums[char] = num
ATnum = nums["T"]
AGnum = nums["G"]
ACnum= nums["C"]
A_num= nums["-"]
ATnum
Out[26]:
False 51919
True 1248
dtype: int64
# and this one works for id column
pt = df.sort("ID")["ID"]
pt_num=pt.value_counts()
pt_values= pt_num.order()
pt_index= pt_num.sort_index()
# these are the total row counts for each ID
pt_num
Out[27]:
10 5241
6 5144
11 4561
2 4439
3 4346
5 4284
9 4244
12 4218
7 4217
1 4210
4 4199
8 4064
dtype: int64
# I combine the ID and STATE columns and try to read them row by row
draft
Out[21]:
ID STATE
0 11 chr1:100154376:G:A
1 2 chr1:100177723:C:T
2 9 chr1:100177723:C:T
3 1 chr1:100194200:-:AA
4 8 chr1:10032249:A:G
5 2 chr1:100340787:G:A
6 1 chr1:100349757:A:G
7 3 chr1:10041186:C:A
8 10 chr1:100476986:G:C
9 4 chr1:100572459:C:T
10 5 chr1:100572459:C:T
11 2 chr1:100671861:T:-
12 4 chr1:1021390:C:A
13 5 chr1:10228220:G:C
14 3 chr1:1026913:C:T
15 4 chr1:1026913:C:T
... ... ...
53152 6 chrY:21618583:G:C
53153 5 chrY:24443836:T:G
53154 6 chrY:24443836:T:G
53155 8 chrY:24443836:T:G
53156 10 chrY:24443836:T:G
53157 12 chrY:24443836:T:G
53158 3 chrY:5605924:C:T
53159 2 chrY:6932151:G:A
53160 10 chrY:7224175:G:T
53161 2 chrY:9197998:C:T
53162 3 chrY:9197998:C:T
53163 4 chrY:9197998:C:T
53164 11 chrY:9197998:C:T
53165 12 chrY:9197998:C:T
53166 11 chrY:9304866:G:A
[53167 rows x 2 columns]
draft= df[["ID", "STATE" ]]
chars = "TGC-"
number = {}
d = draft
for i in d["ID"]:
    if i==1:
        for item in chars:
            At = d["STATE"].str.contains("A:" + item)
            num = At.value_counts(sort=True)
            number[item] = num
            ATn=number["T"]
            AGn=number["G"]
            ACn=number["C"]
            A_n=number["-"]
KeyError: 'G'
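Overall, what I want to do is this: for example, ID 1 has 4210 rows, and I want to find out how many of those rows have the state A:T, A:G, A:- and A:C.
Where am I going wrong?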
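To make the target concrete, here is a minimal sketch of the per-ID count described above, run on a small made-up frame rather than the real data. It assumes the last two colon-separated fields of STATE are the "from" and "to" values, and the column name CHANGE is hypothetical. Note that it matches the change exactly, whereas `str.contains("A:" + char)` would also match longer strings such as `A:TT`.

```python
import pandas as pd

# Toy data in the same shape as the frame above (values are made up).
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 1],
    "STATE": [
        "chr1:100:A:G",
        "chr1:200:A:T",
        "chr1:300:C:T",
        "chr1:400:A:G",
        "chr1:500:A:-",
        "chr1:600:A:G",
    ],
})

# Split from the right so the last two fields land in columns 1 and 2.
parts = df["STATE"].str.rsplit(":", n=2, expand=True)
df["CHANGE"] = parts[1] + ":" + parts[2]  # hypothetical helper column

# Keep only the A:<something> changes, then count rows per (ID, CHANGE).
wanted = ["A:T", "A:G", "A:C", "A:-"]
a_changes = df[df["CHANGE"].isin(wanted)]
counts = a_changes.groupby(["ID", "CHANGE"]).size().unstack(fill_value=0)

print(counts)
```

With this layout, `counts.loc[1, "A:G"]` would give the number of ID 1 rows whose state is A:G, with no explicit loop over the rows.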