I have two lists:
single = ['A','B']
double = ['AA','BB']
and the following data stored in a DataFrame df:
0 1 2 3
0 All 1 AA Yes
1 A 2 All No
where All in column 0 stands for ['A','B'] and All in column 2 stands for ['AA','BB']. I want to get the following DataFrame df2:
0 1 2 3
0 A 1 AA Yes
1 B 1 AA Yes
2 A 2 AA No
3 A 2 BB No
The order of the row index doesn't matter. Currently I'm doing:
import pandas as pd

single = ['A','B']
double = ['AA','BB']
df = pd.DataFrame([['All',1,'AA','Yes'],['A',2,'All','No']])

# Expand 'All' in column 0: append one copy of the row per value in single
index = []
for i in range(len(df)):
    if df.loc[i,0] == 'All':
        index.append(i)
        for j in single:
            df.loc[len(df),:] = df.loc[i,:]
            df.loc[len(df)-1,0] = j
df = df.drop(index).reset_index(drop=True)

# Do the same for 'All' in column 2, using double
index = []
for i in range(len(df)):
    if df.loc[i,2] == 'All':
        index.append(i)
        for j in double:
            df.loc[len(df),:] = df.loc[i,:]
            df.loc[len(df)-1,2] = j
df2 = df.drop(index).reset_index(drop=True)
print(df2)
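For each 'All' in column 0, it first appends one copy of the row per value in single and then drops the original row. It then does the same for every 'All' in column 2.
Is there an easier way to do this, e.g. with some kind of "find and replace"?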
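For comparison, here is a minimal sketch of the same row expansion using `DataFrame.explode`, a different technique from both the loop above and the merge-based answer. It assumes pandas 0.25 or later (where `explode` exists); the `expansions` dict is a hypothetical name introduced here to record which list 'All' stands for in each column.

```python
import pandas as pd

single = ['A', 'B']
double = ['AA', 'BB']
df = pd.DataFrame([['All', 1, 'AA', 'Yes'], ['A', 2, 'All', 'No']])

# Hypothetical mapping: column label -> values that 'All' stands for there.
expansions = {0: single, 2: double}

df2 = df.copy()
for col, values in expansions.items():
    # Turn each cell into a list: 'All' becomes the full list, anything else a one-element list.
    df2[col] = df2[col].map(lambda v: list(values) if v == 'All' else [v])
    # explode emits one row per list element, repeating the other columns;
    # reset_index keeps the index unique for the next iteration.
    df2 = df2.explode(col).reset_index(drop=True)

print(df2)
```

Each column is expanded in turn, so a row with 'All' in both columns would end up as every combination of the two lists.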
Answer
import pandas as pd

single = ['A','B']
double = ['AA','BB']
df = pd.DataFrame([['All',1,'AA','Yes'],['A',2,'All','No']])

first = pd.DataFrame([x for item in single
                      for x in [('All', item), (item, item)]], columns=[0, 'first'])
third = pd.DataFrame([x for item in double
                      for x in [('All', item), (item, item)]], columns=[2, 'third'])
result = pd.merge(pd.merge(df, first, how='left'), third, how='left')
result = result.drop([0, 2], axis=1)
result = result.rename(columns={'first':0, 'third':2})
# Sort the columns back into 0, 1, 2, 3 order (sort_index replaces the deprecated sortlevel)
result = result.sort_index(axis=1)
print(result)
yields
0 1 2 3
0 A 1 AA Yes
1 B 1 AA Yes
2 A 2 AA No
3 A 2 BB No
The main idea is to prepare two auxiliary DataFrames:
first = pd.DataFrame([x for item in single
                      for x in [('All', item), (item, item)]], columns=[0, 'first'])
# 0 first
# 0 All A
# 1 A A
# 2 All B
# 3 B B
third = pd.DataFrame([x for item in double
                      for x in [('All', item), (item, item)]], columns=[2, 'third'])
# 2 third
# 0 All AA
# 1 AA AA
# 2 All BB
# 3 BB BB
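The nested list comprehension that builds both helper frames can be easier to read when unrolled into plain loops. This sketch shows only the pair-building step for `first`; `pairs` is a name introduced here for illustration.

```python
single = ['A', 'B']

# The nested comprehension used to build `first`, unrolled into explicit loops:
# for each value, emit an ('All', value) pair and an identity (value, value) pair.
pairs = []
for item in single:
    for x in [('All', item), (item, item)]:
        pairs.append(x)

print(pairs)
# [('All', 'A'), ('A', 'A'), ('All', 'B'), ('B', 'B')]
```

The identity pairs matter: without them, rows that already hold a concrete value such as 'A' would find no match in the lookup table and get NaN after the left merge.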
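The desired DataFrame is then the result of merging df with first and third. Since first shares column 0 with df and third shares column 2, each merge joins on that shared column, and every 'All' matches two rows of the lookup table, so its row is duplicated once per value: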
result = pd.merge(pd.merge(df, first, how='left'), third, how='left')
# 0 1 2 3 first third
# 0 All 1 AA Yes A AA
# 1 All 1 AA Yes B AA
# 2 A 2 All No A AA
# 3 A 2 All No A BB
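The duplication comes from how a left merge handles a key that matches several rows on the right. A toy example, with the hypothetical column names `key`, `val` and `expanded`, isolates that behaviour:

```python
import pandas as pd

# A toy left frame and a lookup frame sharing the column 'key'.
left = pd.DataFrame({'key': ['All', 'A'], 'val': [1, 2]})
lookup = pd.DataFrame({'key': ['All', 'A', 'All', 'B'],
                       'expanded': ['A', 'A', 'B', 'B']})

# A left merge emits one output row per (left row, matching lookup row) pair,
# so the 'All' row appears once for each lookup row keyed 'All'.
merged = pd.merge(left, lookup, how='left', on='key')
print(merged)
```

The 'A' row matches only its identity pair and passes through unchanged, while the 'All' row is emitted twice, once with 'A' and once with 'B'.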
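Finally, drop columns 0 and 2, replace them with the first and third columns, and sort the columns back into order:
result = result.drop([0, 2], axis=1)
result = result.rename(columns={'first':0, 'third':2})
result = result.sort_index(axis=1)  # sortlevel in older pandas versions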
# 0 1 2 3
# 0 A 1 AA Yes
# 1 B 1 AA Yes
# 2 A 2 AA No
# 3 A 2 BB No
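If more columns can contain 'All', the same lookup-and-merge idea can be applied in a loop. The sketch below is a hypothetical generalization, not part of the original answer: `expand_all`, its `wildcard` parameter and the temporary column name `'_expanded'` are all assumptions made for illustration.

```python
import pandas as pd

def expand_all(df, expansions, wildcard='All'):
    """Hypothetical helper generalizing the merge approach.

    expansions maps a column label to the list of values that `wildcard`
    stands for in that column. Values that are neither listed nor the
    wildcard find no match and end up as NaN after the left merge.
    """
    result = df
    for col, values in expansions.items():
        tmp = '_expanded'  # temporary column name, assumed not to clash with df's columns
        lookup = pd.DataFrame(
            [x for item in values for x in [(wildcard, item), (item, item)]],
            columns=[col, tmp],
        )
        result = pd.merge(result, lookup, how='left', on=col)
        # Replace the original column with its expanded version.
        result = result.drop(columns=col).rename(columns={tmp: col})
    # Restore the original column order.
    return result[list(df.columns)]


single = ['A', 'B']
double = ['AA', 'BB']
df = pd.DataFrame([['All', 1, 'AA', 'Yes'], ['A', 2, 'All', 'No']])
print(expand_all(df, {0: single, 2: double}))
```

Reindexing with `list(df.columns)` instead of `sort_index(axis=1)` restores the original column order even when the labels are not sortable or were not in sorted order to begin with.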