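I want to process the data below. Under each manager's rows, I want to add one more row in which the manager is also the worker, so the worker columns repeat the manager columns. How can I do this?
Note: for a given manager, all the manager details are the same in every row. This is just a sample scenario from my dataset. Thanks.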
import pandas as pd

data = [['Tom','Aurora',4500,'Shelly','Chicago',43553]
,['Tom','Aurora',4500,'Alex','NewYork',43654]
,['Tom','Aurora',4500,'Kelly','Cincinnati',44674]
,['Jason','Charlotte',4567,'Jimmy','Boston',44984]
,['Jason','Charlotte',4567,'Aaron','Austin',44583]
]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['Manager','Managercity',
'manager_id','Worker','WorkerCity','Worker_id'])
# print dataframe.
print(df)
The desired dataset is shown below:
Manager Managercity manager_id Worker WorkerCity Worker_id
Tom Aurora 4500 Shelly Chicago 43553
Tom Aurora 4500 Alex NewYork 43654
Tom Aurora 4500 Kelly Cincinnati 44674
Tom Aurora 4500 Tom Aurora 4500
Jason Charlotte 4567 Jimmy Boston 44984
Jason Charlotte 4567 Aaron Austin 44583
Jason Charlotte 4567 Jason Charlotte 4567
Thanks
Answer 0 (score: 1)
Try:
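def add(gr):
    # Copy the group's first row as a one-row DataFrame (this keeps the dtypes)
    new_row = gr.iloc[[0]].copy()
    new_row['Worker'] = new_row['Manager']
    new_row['Worker_id'] = new_row['manager_id']
    # DataFrame.append was removed in pandas 2.0, so use pd.concat
    return pd.concat([gr, new_row])

# sort=False keeps the managers in their original order
df = pd.concat([add(gr) for _, gr in df.groupby('Manager', sort=False)], ignore_index=True)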
If your data includes the manager's city, you can also set it inside the add function with new_row['WorkerCity'] = new_row['Managercity'].
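Putting the pieces together, a minimal self-contained sketch of this groupby approach on the question's sample data might look like the following. It assumes a pandas version where DataFrame.append is unavailable (it was removed in pandas 2.0), so pd.concat is used instead. The function name add_manager_row is only illustrative, and the city column is copied as well.

```python
import pandas as pd

data = [
    ['Tom', 'Aurora', 4500, 'Shelly', 'Chicago', 43553],
    ['Tom', 'Aurora', 4500, 'Alex', 'NewYork', 43654],
    ['Tom', 'Aurora', 4500, 'Kelly', 'Cincinnati', 44674],
    ['Jason', 'Charlotte', 4567, 'Jimmy', 'Boston', 44984],
    ['Jason', 'Charlotte', 4567, 'Aaron', 'Austin', 44583],
]
columns = ['Manager', 'Managercity', 'manager_id',
           'Worker', 'WorkerCity', 'Worker_id']
df = pd.DataFrame(data, columns=columns)


def add_manager_row(gr):
    """Append a row to the group in which the manager is also the worker."""
    # One-row copy of the group's first row, so column dtypes are preserved
    new_row = gr.iloc[[0]].copy()
    new_row['Worker'] = new_row['Manager']
    new_row['WorkerCity'] = new_row['Managercity']
    new_row['Worker_id'] = new_row['manager_id']
    return pd.concat([gr, new_row])


# sort=False keeps managers in order of first appearance (Tom, then Jason)
result = pd.concat(
    [add_manager_row(gr) for _, gr in df.groupby('Manager', sort=False)],
    ignore_index=True,
)
print(result)
```

Iterating over the groups rather than calling groupby(...).apply is a design choice here: each group then always contains the Manager column, regardless of how a given pandas version treats grouping columns inside apply.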
Answer 1 (score: 1)
You can use pd.concat and drop_duplicates like this:
import numpy as np
import pandas as pd

data = [['Tom','Aurora',4500,'Shelly','Chicago',43553]
,['Tom','Aurora',4500,'Alex','NewYork',43654]
,['Tom','Aurora',4500,'Kelly','Cincinnati',44674]
,['Jason','Charlotte',4567,'Jimmy','Boston',44984]
,['Jason','Charlotte',4567,'Aaron','Austin',44583]
]
# Create the pandas DataFrame
df_in = pd.DataFrame(data, columns = ['Manager','Managercity', 'manager_id','Worker','WorkerCity','Worker_id'])
# One row per manager, with the manager columns repeated as the worker columns
df_managers = pd.DataFrame(np.tile(df_in[['Manager','Managercity','manager_id']].drop_duplicates(),2),columns=df_in.columns)
# A stable sort (mergesort) keeps each manager's own row after that manager's workers
df_out = pd.concat([df_in, df_managers]).sort_values('Manager', kind='mergesort').reset_index(drop=True)
print(df_out)
Output:
Manager Managercity manager_id Worker WorkerCity Worker_id
0 Jason Charlotte 4567 Jimmy Boston 44984
1 Jason Charlotte 4567 Aaron Austin 44583
2 Jason Charlotte 4567 Jason Charlotte 4567
3 Tom Aurora 4500 Shelly Chicago 43553
4 Tom Aurora 4500 Alex NewYork 43654
5 Tom Aurora 4500 Kelly Cincinnati 44674
6 Tom Aurora 4500 Tom Aurora 4500
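Note that this output is sorted alphabetically by manager (Jason before Tom), whereas the desired dataset in the question lists Tom first. In addition, np.tile returns an object array, so the id columns may end up with object dtype after the concat. If either matters, one possible variation (a sketch, not the answer's original method) is to take each manager's last row, overwrite its worker columns, and use a stable sort on the index so that the new row lands right after that manager's last worker:

```python
import pandas as pd

data = [
    ['Tom', 'Aurora', 4500, 'Shelly', 'Chicago', 43553],
    ['Tom', 'Aurora', 4500, 'Alex', 'NewYork', 43654],
    ['Tom', 'Aurora', 4500, 'Kelly', 'Cincinnati', 44674],
    ['Jason', 'Charlotte', 4567, 'Jimmy', 'Boston', 44984],
    ['Jason', 'Charlotte', 4567, 'Aaron', 'Austin', 44583],
]
columns = ['Manager', 'Managercity', 'manager_id',
           'Worker', 'WorkerCity', 'Worker_id']
df_in = pd.DataFrame(data, columns=columns)

# Last row of each manager; it keeps the index label of that last row
manager_rows = df_in.drop_duplicates('Manager', keep='last').copy()
manager_rows['Worker'] = manager_rows['Manager']
manager_rows['WorkerCity'] = manager_rows['Managercity']
manager_rows['Worker_id'] = manager_rows['manager_id']

# mergesort is stable, so for equal index labels the original worker row
# (listed first in the concat) stays ahead of the added manager row
df_out = (
    pd.concat([df_in, manager_rows])
    .sort_index(kind='mergesort')
    .reset_index(drop=True)
)
print(df_out)
```

This relies on df_in having a unique, increasing index (the default RangeIndex here); with a custom index, resetting it first would be a safer starting point.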