Pandas DataFrame find and replace

Asked: 2016-07-06 21:23:10

Tags: python pandas

I have two lists:

single = ['A','B']
double = ['AA','BB']

and data stored in a DataFrame df:

     0  1    2    3
0  All  1   AA  Yes
1    A  2  All   No

Here All stands for ['A','B'] in column 0 and for ['AA','BB'] in column 2. I want to obtain the following DataFrame df2:

    0  1   2    3
0   A  1  AA  Yes
1   B  1  AA  Yes
2   A  2  AA   No
3   A  2  BB   No

The order of the row index does not matter. This is what I am doing now:

import pandas as pd

single = ['A','B']
double = ['AA','BB']
df=pd.DataFrame([['All',1,'AA','Yes'],['A',2,'All','No']])

index = []
for i in range(len(df)):
    if df.loc[i,0] == 'All':
        index.append(i)
        for j in single:
            df.loc[len(df),:] = df.loc[i,:]
            df.loc[len(df)-1,0] = j
df = df.drop(index).reset_index(drop=True)

index = []
for i in range(len(df)):
    if df.loc[i,2] == 'All':
        index.append(i)
        for j in double:
            df.loc[len(df),:] = df.loc[i,:]
            df.loc[len(df)-1,2] = j
df2 = df.drop(index).reset_index(drop=True)
print(df2)

It first appends two rows for each 'All' in column 0 and drops the original row, then does the same for each 'All' in column 2.

Is there an easier way to do this kind of 'find and replace'?

1 answer:

Answer 0 (score: 2)

import pandas as pd

single = ['A','B']
double = ['AA','BB']
df = pd.DataFrame([['All',1,'AA','Yes'],['A',2,'All','No']])

first = pd.DataFrame([x for item in single 
                      for x in [('All', item), (item, item)]], columns=[0, 'first']) 
third = pd.DataFrame([x for item in double 
                      for x in [('All', item), (item, item)]], columns=[2, 'third']) 

result = pd.merge(pd.merge(df, first, how='left'), third, how='left')
result = result.drop([0, 2], axis=1)
result = result.rename(columns={'first':0, 'third':2})
result = result.sort_index(axis=1)  # sortlevel was removed in pandas 1.0

yields

   0  1   2    3
0  A  1  AA  Yes
1  B  1  AA  Yes
2  A  2  AA   No
3  A  2  BB   No

The main idea is to prepare two helper DataFrames:

first = pd.DataFrame([x for item in single 
                      for x in [('All', item), (item, item)]], columns=[0, 'first']) 
#      0 first
# 0  All     A
# 1    A     A
# 2  All     B
# 3    B     B

third = pd.DataFrame([x for item in double 
                      for x in [('All', item), (item, item)]], columns=[2, 'third']) 
#      2 third
# 0  All    AA
# 1   AA    AA
# 2  All    BB
# 3   BB    BB
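
Since both helper tables follow the same pattern, they could also be built by one small function. The following is only a sketch: the name make_lookup and its parameters (key, new_col, wildcard) are hypothetical and do not appear in the original answer.

```python
import pandas as pd

def make_lookup(values, key, new_col, wildcard='All'):
    """Build a helper table: wildcard -> each value, and each value -> itself.

    make_lookup, key, new_col and wildcard are illustrative names only.
    """
    rows = []
    for item in values:
        rows.append((wildcard, item))  # 'All' expands to every listed value
        rows.append((item, item))      # a concrete value maps to itself
    return pd.DataFrame(rows, columns=[key, new_col])

first = make_lookup(['A', 'B'], key=0, new_col='first')
third = make_lookup(['AA', 'BB'], key=2, new_col='third')
```

The key argument must equal the label of the column in df that the table will later be merged on (0 and 2 here).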

Then the desired DataFrame is the result of merging df with first and third:

result = pd.merge(pd.merge(df, first, how='left'), third, how='left')
#      0  1    2    3 first third
# 0  All  1   AA  Yes     A    AA
# 1  All  1   AA  Yes     B    AA
# 2    A  2  All   No     A    AA
# 3    A  2  All   No     A    BB
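
The merges above leave out on=, so pandas joins on the column labels the two frames share (0 for first, 2 for third). A sketch that spells the keys out, assuming the same df, first and third as above (step1 and step2 are illustrative names), might look like this:

```python
import pandas as pd

df = pd.DataFrame([['All', 1, 'AA', 'Yes'], ['A', 2, 'All', 'No']])
first = pd.DataFrame([('All', 'A'), ('A', 'A'), ('All', 'B'), ('B', 'B')],
                     columns=[0, 'first'])
third = pd.DataFrame([('All', 'AA'), ('AA', 'AA'), ('All', 'BB'), ('BB', 'BB')],
                     columns=[2, 'third'])

# Join on column 0: the 'All' row matches two lookup rows and is duplicated.
step1 = df.merge(first, on=[0], how='left')
# Join on column 2: same idea for the second wildcard column.
step2 = step1.merge(third, on=[2], how='left')
```

Naming the keys explicitly guards against an accidental join on some other column that happens to share a label.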

Finally, drop columns 0 and 2 and replace them with the first and third columns:

result = result.drop([0, 2], axis=1)
result = result.rename(columns={'first':0, 'third':2})
result = result.sort_index(axis=1)  # sortlevel was removed in pandas 1.0
#    0  1   2    3
# 0  A  1  AA  Yes
# 1  B  1  AA  Yes
# 2  A  2  AA   No
# 3  A  2  BB   No
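
For comparison, here is a sketch of a different technique from the merge-based answer above: replace each 'All' with its list of values and let DataFrame.explode expand the rows. It assumes pandas 0.25 or later, where explode is available; the names expansions and expand_value are illustrative only.

```python
import pandas as pd

single = ['A', 'B']
double = ['AA', 'BB']
df = pd.DataFrame([['All', 1, 'AA', 'Yes'], ['A', 2, 'All', 'No']])

# Column label -> values that 'All' stands for in that column (illustrative name).
expansions = {0: single, 2: double}

def expand_value(value, values):
    """Return the full list for the wildcard, otherwise the value unchanged."""
    return values if value == 'All' else value

result = df.copy()
for col, values in expansions.items():
    result[col] = result[col].map(lambda v, values=values: expand_value(v, values))
    # explode turns each list element into its own row; scalar cells are kept as-is.
    # Resetting the index keeps it unique before the next explode.
    result = result.explode(col).reset_index(drop=True)

print(result)
```

This skips the helper tables, but unlike the merge approach it does not check concrete values against single or double.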