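I'm trying to reshape a DataFrame into a structure that is more useful for graphing. So far I have been reshaping the df with iterrows or itertuples, following "What is the most efficient way to loop through dataframes with pandas?".
Below is an oversimplified dataset; the real one will have thousands of rows.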
group subtopic code
fruit grapes 110A
fruit apple 110B
meat pork 220A
meat chicken 220B
meat duck 220C
vegetable lettuce 300A
vegetable tomato 310A
vegetable asparagus 320A
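Basically, I want to create a new column ("code2") that pairs each value in the "code" column with every other code that shares the same value in the "group" column.
I tried running the following code: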
import pandas as pd

# `sheet_name` is the current keyword; older pandas spelled it `sheetname`.
df = pd.read_excel(file1, sheet_name='Sheet3')

def reshape_iterrows(df):
    reshape = []
    for _, j in df.iterrows():
        for _, k in df.iterrows():
            if j['code'] == k['code']:
                # Never pair a code with itself.
                pass
            elif pd.isna(j['group']):
                # Missing group: link the code to itself.
                reshape.append({'code1': j['code'],
                                'code2': j['code'],
                                'group': 'None'})
            elif j['group'] == k['group']:
                reshape.append({'code1': j['code'],
                                'code2': k['code'],
                                'group': j['group']})
            else:
                pass
    return reshape

reshape_iterrows(df)
Or using itertuples:
def reshape_itertuples(df):
    reshape = []
    # Positions in each tuple: 0 = index, 1 = group, 2 = subtopic, 3 = code.
    for row1 in df.itertuples():
        for row2 in df.itertuples():
            if row1[3] == row2[3]:
                # Never pair a code with itself.
                pass
            elif pd.isna(row1[1]):
                # Missing group: link the code to itself.
                reshape.append({'code1': row1[3],
                                'code2': row1[3],
                                'group': 'None'})
            elif row1[1] == row2[1]:
                reshape.append({'code1': row1[3],
                                'code2': row2[3],
                                'group': row1[1]})
            else:
                pass
    return reshape
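I pass reshape to pd.DataFrame(); the expected output is shown below. I then use the code1 and code2 columns as the source and target arguments of nx.from_pandas_edgelist to build the graph.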
code1 code2 group
0 110A 110B fruit
1 110B 110A fruit
2 220A 220B meat
3 220A 220C meat
4 220B 220A meat
5 220B 220C meat
6 220C 220A meat
7 220C 220B meat
8 300A 310A vegetable
9 300A 320A vegetable
10 310A 300A vegetable
11 310A 320A vegetable
12 320A 300A vegetable
13 320A 310A vegetable
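For reference, the graph-building step described above might look like the following minimal sketch. The `edges` frame is a small illustrative subset in the shape of the expected output, and the choice to keep "group" as an edge attribute is an assumption, not something the question specifies.

```python
import networkx as nx
import pandas as pd

# A few rows in the shape of the expected output (illustrative subset only).
edges = pd.DataFrame({
    "code1": ["110A", "110B", "220A", "220A"],
    "code2": ["110B", "110A", "220B", "220C"],
    "group": ["fruit", "fruit", "meat", "meat"],
})

# Build an undirected graph; 'group' is stored on each edge (an assumption).
G = nx.from_pandas_edgelist(
    edges, source="code1", target="code2", edge_attr="group"
)

print(G.number_of_nodes(), G.number_of_edges())
```

Note that nx.from_pandas_edgelist builds an undirected nx.Graph by default, so the rows 110A/110B and 110B/110A collapse into a single edge. If direction matters, passing create_using=nx.DiGraph keeps both.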
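Like others, I'm interested in a more efficient alternative to iterating, perhaps using NumPy's boolean operations? I'm looking for guidance on how to get the same result with vectorized/array operations.
Thanks!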
Answer 0 (score: 2)
You can try:
from itertools import permutations
df.groupby('group')['code']\
.apply(lambda x: pd.DataFrame(list(permutations(x.tolist(),2))))\
.add_prefix('code').reset_index().drop('level_1',axis=1)
Output:
group code0 code1
0 fruit 110A 110B
1 fruit 110B 110A
2 meat 220A 220B
3 meat 220A 220C
4 meat 220B 220A
5 meat 220B 220C
6 meat 220C 220A
7 meat 220C 220B
8 vegetable 300A 310A
9 vegetable 300A 320A
10 vegetable 310A 300A
11 vegetable 310A 320A
12 vegetable 320A 300A
13 vegetable 320A 310A
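As a side note, the same pairs can also be produced without any Python-level loop by merging the frame with itself on "group" and dropping the self-pairs. This is only a sketch under a few assumptions: codes are unique, no group is missing, and the intermediate result (one row per pair within each group, including self-pairs) fits in memory. The sample data is copied from the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "group": ["fruit", "fruit", "meat", "meat", "meat",
              "vegetable", "vegetable", "vegetable"],
    "subtopic": ["grapes", "apple", "pork", "chicken", "duck",
                 "lettuce", "tomato", "asparagus"],
    "code": ["110A", "110B", "220A", "220B", "220C",
             "300A", "310A", "320A"],
})

# Self-merge on 'group': every row is paired with every row of its group.
# The suffixes turn the two overlapping 'code' columns into code1 and code2.
pairs = df[["group", "code"]].merge(
    df[["group", "code"]], on="group", suffixes=("1", "2")
)

# Drop pairs of a code with itself and order the columns like the question.
edges = (
    pairs[pairs["code1"] != pairs["code2"]]
    .loc[:, ["code1", "code2", "group"]]
    .reset_index(drop=True)
)

print(edges)
```

The merge grows quadratically with group size, so for very large groups it may be worth comparing it against the groupby/permutations approach above.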
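Answer 1 (score: 2)
It may not be the most efficient, but here is what I tried. I put too much effort into it to let my answer go to waste :)
The benefit of my answer is that every step is explicit. And if you need to do something in between (or realize you only want the names rather than the codes), you can just comment out one line.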
import pandas as pd
from itertools import permutations

def get_data():
    return {
        'group' : [
            'fruit', 'fruit',
            'meat', 'meat', 'meat',
            'vegetable', 'vegetable', 'vegetable'
        ],
        'subtopic' : [
            'grapes', 'apple',
            'pork', 'chicken', 'duck',
            'lettuce', 'tomato', 'asparagus'
        ],
        'code' : [
            '110A', '110B',
            '220A', '220B', '220C',
            '300A', '310A', '320A'
        ]
    }

# Used to retrieve code for specific item
def make_code_map(df):
    return dict(df[['subtopic', 'code']].to_dict('split')['data'])

# Used to retrieve group for specific item.
def make_group_map(df):
    return dict(df[['subtopic', 'group']].to_dict('split')['data'])

if __name__ == '__main__':
    df = pd.DataFrame(get_data())
    mapping = make_code_map(df)
    group_map = make_group_map(df)
    graph_edges = []
    for name, group in df.groupby('group'):
        graph_edges.extend( permutations(group['subtopic'].tolist(), 2) )
    ndf = pd.DataFrame(graph_edges, columns=['code1', 'code2'])
    # Applying the group map to get all the correct groups for each
    # item.
    ndf['group'] = ndf['code1'].apply(lambda x:group_map[x])
    # Replace each item with its corresponding code.
    ndf = ndf.replace(mapping)
    print(ndf)
    #    code1 code2      group
    # 0   110A  110B      fruit
    # 1   110B  110A      fruit
    # 2   220A  220B       meat
    # 3   220A  220C       meat
    # 4   220B  220A       meat
    # 5   220B  220C       meat
    # 6   220C  220A       meat
    # 7   220C  220B       meat
    # 8   300A  310A  vegetable
    # 9   300A  320A  vegetable
    # 10  310A  300A  vegetable
    # 11  310A  320A  vegetable
    # 12  320A  300A  vegetable
    # 13  320A  310A  vegetable