Python Multi-Index: finding coordinates by the level-2 index of a DataFrame

Time: 2018-01-04 11:21:40

Tags: python pandas dataframe indexing multi-index

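I have an empty DataFrame with a MultiIndex on both the index and the columns. I also have a list of coordinate string pairs for the level-2 index. Since all of my level-2 labels are unique, I want to use the list of string pairs to locate the coordinates and set values there. See the example below.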

df=
       DNA      Cat2                                 ....   
       Item     A   B   C   D   E   F   G   H   I   J   
DNA   Item
Cat2  A         0   0   0   0   0   0   0   0   0   0 
      B         0   0   0   0   0   0   0   0   0   0 
      C         0   0   0   0   0   0   0   0   0   0 
      D         0   0   0   0   0   0   0   0   0   0 
      E         0   0   0   0   0   0   0   0   0   0 
      F         0   0   0   0   0   0   0   0   0   0 
....

str_cord = [('A','B'),('A','H'),('A','I'),('B','H'),('B','I'),('H','I')]
# and my output should look like this:

df_result=
       DNA      Cat2                                 ....   
       Item     A   B   C   D   E   F   G   H   I   J   
DNA   Item
Cat2  A         0   1   0   0   0   0   0   1   1   0 
      B         0   0   0   0   0   0   0   1   1   0 
      C         0   0   0   0   0   0   0   0   0   0 
      D         0   0   0   0   0   0   0   0   0   0 
      E         0   0   0   0   0   0   0   0   0   0 
      F         0   0   0   0   0   0   0   0   0   0 
      H         0   0   0   0   0   0   0   0   1   0
....

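It looks a bit complicated, but all I want is to use each pair in str_cord (for example str_cord[0]) as a coordinate into df_result. I tried .loc, but it seems I need to supply the level-1 index. I'm looking for a way to locate coordinates by the level-2 strings without having to specify level 1 of the MultiIndex. Hope that makes sense, and thanks in advance! (Oh, and the data itself is very large, so it should be as efficient as possible.)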

1 answer:

Answer 0 (score: 1)

You can use:

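idx = pd.IndexSlice  # build the slicer once, outside the loop
for i, j in str_cord:
    # any level-1 label; level-2 label i for rows, j for columns
    df.loc[idx[:, i], idx[:, j]] = 1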
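
To see what the selector does: `pd.IndexSlice[:, i]` is shorthand for the tuple `(slice(None), i)`, meaning "any level-1 label, level-2 label `i`". Below is a minimal sketch on a small frame; the frame `small` and its labels are made up for illustration and are not from the question.

```python
import pandas as pd

idx = pd.IndexSlice
# IndexSlice only builds the indexing key; it carries no data of its own.
key = idx[:, 'A']
print(key)  # (slice(None, None, None), 'A')

# Small illustrative frame; the labels are assumptions for this sketch.
mux = pd.MultiIndex.from_product([['Cat1', 'Cat2'], list('ABC')])
small = pd.DataFrame(0, index=mux, columns=mux)

# Every row whose level-2 label is 'A', every column whose level-2 label is 'B'.
small.loc[idx[:, 'A'], idx[:, 'B']] = 1
print(int(small.to_numpy().sum()))  # 4: two matching rows x two matching columns
```

When a level-2 label appears under several level-1 groups, as here, every matching row/column combination is set. With level-2 labels that are unique across the whole index, each pair touches exactly one cell.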

Sample:

import pandas as pd

L = list('ABCDEFGHIJ')
mux = pd.MultiIndex.from_product([['Cat1','Cat2'], L])

df = pd.DataFrame(0, index=mux, columns=mux)
print(df)
       Cat1                            Cat2                           
          A  B  C  D  E  F  G  H  I  J    A  B  C  D  E  F  G  H  I  J
Cat1 A    0  0  0  0  0  0  0  0  0  0    0  0  0  0  0  0  0  0  0  0
     B    0  0  0  0  0  0  0  0  0  0    0  0  0  0  0  0  0  0  0  0
     C    0  0  0  0  0  0  0  0  0  0    0  0  0  0  0  0  0  0  0  0
     D    0  0  0  0  0  0  0  0  0  0    0  0  0  0  0  0  0  0  0  0
     E    0  0  0  0  0  0  0  0  0  0    0  0  0  0  0  0  0  0  0  0
     F    0  0  0  0  0  0  0  0  0  0    0  0  0  0  0  0  0  0  0  0
     G    0  0  0  0  0  0  0  0  0  0    0  0  0  0  0  0  0  0  0  0
     H    0  0  0  0  0  0  0  0  0  0    0  0  0  0  0  0  0  0  0  0
     I    0  0  0  0  0  0  0  0  0  0    0  0  0  0  0  0  0  0  0  0
     J    0  0  0  0  0  0  0  0  0  0    0  0  0  0  0  0  0  0  0  0
Cat2 A    0  0  0  0  0  0  0  0  0  0    0  0  0  0  0  0  0  0  0  0
     B    0  0  0  0  0  0  0  0  0  0    0  0  0  0  0  0  0  0  0  0
     C    0  0  0  0  0  0  0  0  0  0    0  0  0  0  0  0  0  0  0  0
     D    0  0  0  0  0  0  0  0  0  0    0  0  0  0  0  0  0  0  0  0
     E    0  0  0  0  0  0  0  0  0  0    0  0  0  0  0  0  0  0  0  0
     F    0  0  0  0  0  0  0  0  0  0    0  0  0  0  0  0  0  0  0  0
     G    0  0  0  0  0  0  0  0  0  0    0  0  0  0  0  0  0  0  0  0
     H    0  0  0  0  0  0  0  0  0  0    0  0  0  0  0  0  0  0  0  0
     I    0  0  0  0  0  0  0  0  0  0    0  0  0  0  0  0  0  0  0  0
     J    0  0  0  0  0  0  0  0  0  0    0  0  0  0  0  0  0  0  0  0
str_cord = [('A','B'),('A','H'),('A','I'),('B','H'),('B','I'),('H','I')]

idx = pd.IndexSlice  # build the slicer once, outside the loop
for i, j in str_cord:
    df.loc[idx[:, i], idx[:, j]] = 1
print(df)
       Cat1                            Cat2                           
          A  B  C  D  E  F  G  H  I  J    A  B  C  D  E  F  G  H  I  J
Cat1 A    0  1  0  0  0  0  0  1  1  0    0  1  0  0  0  0  0  1  1  0
     B    0  0  0  0  0  0  0  1  1  0    0  0  0  0  0  0  0  1  1  0
     C    0  0  0  0  0  0  0  0  0  0    0  0  0  0  0  0  0  0  0  0
     D    0  0  0  0  0  0  0  0  0  0    0  0  0  0  0  0  0  0  0  0
     E    0  0  0  0  0  0  0  0  0  0    0  0  0  0  0  0  0  0  0  0
     F    0  0  0  0  0  0  0  0  0  0    0  0  0  0  0  0  0  0  0  0
     G    0  0  0  0  0  0  0  0  0  0    0  0  0  0  0  0  0  0  0  0
     H    0  0  0  0  0  0  0  0  1  0    0  0  0  0  0  0  0  0  1  0
     I    0  0  0  0  0  0  0  0  0  0    0  0  0  0  0  0  0  0  0  0
     J    0  0  0  0  0  0  0  0  0  0    0  0  0  0  0  0  0  0  0  0
Cat2 A    0  1  0  0  0  0  0  1  1  0    0  1  0  0  0  0  0  1  1  0
     B    0  0  0  0  0  0  0  1  1  0    0  0  0  0  0  0  0  1  1  0
     C    0  0  0  0  0  0  0  0  0  0    0  0  0  0  0  0  0  0  0  0
     D    0  0  0  0  0  0  0  0  0  0    0  0  0  0  0  0  0  0  0  0
     E    0  0  0  0  0  0  0  0  0  0    0  0  0  0  0  0  0  0  0  0
     F    0  0  0  0  0  0  0  0  0  0    0  0  0  0  0  0  0  0  0  0
     G    0  0  0  0  0  0  0  0  0  0    0  0  0  0  0  0  0  0  0  0
     H    0  0  0  0  0  0  0  0  1  0    0  0  0  0  0  0  0  0  1  0
     I    0  0  0  0  0  0  0  0  0  0    0  0  0  0  0  0  0  0  0  0
     J    0  0  0  0  0  0  0  0  0  0    0  0  0  0  0  0  0  0  0  0
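
Since the question mentions that the data is very large, one possible alternative to the Python-level loop is to translate the level-2 labels to integer positions once and assign through NumPy. This is only a sketch, not part of the original answer. It assumes the level-2 labels are unique across the whole index and columns, as the question states (`get_indexer` raises an error on duplicates, so it would not work on the sample above, where `A`..`J` repeat under `Cat1` and `Cat2`). The frame layout and the names `r`, `c`, `arr` and `result` are illustrative.

```python
import pandas as pd

# Hypothetical frame whose level-2 labels are unique across the whole index.
mux = pd.MultiIndex.from_arrays([['Cat1'] * 5 + ['Cat2'] * 5, list('ABCDEFGHIJ')])
df = pd.DataFrame(0, index=mux, columns=mux)
str_cord = [('A','B'),('A','H'),('A','I'),('B','H'),('B','I'),('H','I')]

row_labels = [i for i, _ in str_cord]
col_labels = [j for _, j in str_cord]

# Map level-2 labels to integer positions; -1 would mean "label not found".
r = df.index.get_level_values(1).get_indexer(row_labels)
c = df.columns.get_level_values(1).get_indexer(col_labels)
if (r < 0).any() or (c < 0).any():
    raise KeyError("some labels in str_cord are missing from level 2")

# Work on an explicit copy of the values, then rebuild the frame.
arr = df.to_numpy(copy=True)
arr[r, c] = 1  # pairwise assignment: one cell per (row, column) pair
result = pd.DataFrame(arr, index=df.index, columns=df.columns)
print(int(result.to_numpy().sum()))  # 6, one cell per pair in str_cord
```

The assignment `arr[r, c] = 1` pairs the positions element by element, so the cost is a single vectorized operation rather than one `.loc` call per pair. Copying the values costs extra memory, which may matter for a very large frame.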