import pandas as pd
li = [{"employee_id":1,"project_handled": "pas"},{"employee_id":1,"project_handled": "asap"},{"employee_id":2,"project_handled": "trimm"},{"employee_id":2,"project_handled": "fat"}]
df = pd.DataFrame(li)
df.set_index("employee_id",inplace=True)
print(df)
This gives:
project_handled
employee_id
1 pas
1 asap
2 trimm
2 fat
What I want is for the index values not to repeat when printed:
project_handled
employee_id
1 pas
asap
2 trimm
fat
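I want to serialize it and share it as an Excel file using the DataFrame.to_excel API. The requirement is that the index values should not repeat in the employee_id column.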
Answer 0 (score: 1)
You need to set a MultiIndex:
import pandas as pd
li = [{"employee_id":1,"project_handled": "pas"},{"employee_id":1,"project_handled": "asap"},{"employee_id":2,"project_handled": "trimm"},{"employee_id":2,"project_handled": "fat"}]
df = pd.DataFrame(li)
df['Something'] = 1
df.set_index(["employee_id", "project_handled"],inplace=True)
print(df)
I added Something because otherwise you get:
Empty DataFrame
Columns: []
Index: [(1, pas), (1, asap), (2, trimm), (2, fat)]
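The blank cells in the printed MultiIndex are a display feature only: pandas hides repeated outer-level labels while the `display.multi_sparse` option is on (its default), but the index itself still stores every value. A minimal sketch checking both points on the same frame (the `Something` column is kept only so the frame is not empty):

```python
import pandas as pd

li = [{"employee_id": 1, "project_handled": "pas"},
      {"employee_id": 1, "project_handled": "asap"},
      {"employee_id": 2, "project_handled": "trimm"},
      {"employee_id": 2, "project_handled": "fat"}]
df = pd.DataFrame(li)
df["Something"] = 1
df = df.set_index(["employee_id", "project_handled"])

# Default (sparsified) rendering: repeated employee_id labels are left blank.
sparse_text = df.to_string()

# With sparsification switched off, every row shows its employee_id again.
with pd.option_context("display.multi_sparse", False):
    dense_text = df.to_string()

# The underlying index still holds all four ids.
ids = df.index.get_level_values("employee_id").tolist()

print(sparse_text)
print(dense_text)
print(ids)  # [1, 1, 2, 2]
```

So the "no repeats" look depends on how the frame is rendered or exported, not on the data being changed.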
EDIT:
To build it without moving project_handled into the index, you need an empty column and a MultiIndex:
df = pd.DataFrame(li)  # start again from the original, un-indexed frame
df["another"] = ""
df.set_index(["employee_id", "another"], inplace=True)
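A sketch of the empty-column variant as a reusable step before the Excel export the question asks for. `DataFrame.to_excel` writes a MultiIndex as index columns and, with `merge_cells=True` (the default), merges consecutive repeated outer-level labels into one cell, so each `employee_id` should appear once. The helper name `build_report` is hypothetical, and writing `.xlsx` assumes an Excel engine such as openpyxl is installed, so the export call is left in a comment:

```python
import pandas as pd


def build_report(records):
    """Hypothetical helper: index by employee_id plus an all-empty second level."""
    df = pd.DataFrame(records)
    # Empty helper level so project_handled stays a regular data column.
    df["another"] = ""
    return df.set_index(["employee_id", "another"])


li = [{"employee_id": 1, "project_handled": "pas"},
      {"employee_id": 1, "project_handled": "asap"},
      {"employee_id": 2, "project_handled": "trimm"},
      {"employee_id": 2, "project_handled": "fat"}]
report = build_report(li)
print(report)

# Needs an Excel engine (e.g. openpyxl); "report.xlsx" is a placeholder name.
# report.to_excel("report.xlsx", merge_cells=True)
```

Keeping the ids in the index (rather than blanking them as strings) means the data stays intact and only the Excel layout changes.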
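Answer 1 (score: 0)
If your only goal is to write the DataFrame to CSV in the desired form, and you don't need each employee_id value to occupy exactly one cell, you can do the following: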
import pandas as pd
li = [{"employee_id":1,"project_handled": "pas"},{"employee_id":1,"project_handled": "asap"},{"employee_id":2,"project_handled": "trimm"},{"employee_id":2,"project_handled": "fat"}]
df = pd.DataFrame(li)
# Store the ids as strings so a blank '' can share the column with them
df['employee_id'] = df['employee_id'].astype(str)
# cumcount() numbers the rows within each employee_id group (0, 1, ...);
# blank the id on every row except the first one of its group
df.loc[df.groupby('employee_id').cumcount() > 0, 'employee_id'] = ''
df = df.set_index('employee_id')
print(df)
Output:
project_handled
employee_id
1 pas
asap
2 trimm
fat
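Calling `to_csv()` with no path returns the CSV text as a string instead of writing a file, which is a quick way to check what the blanked ids produce. A minimal sketch of the same blank-the-repeats idea, here using `Series.duplicated()`, which flags every occurrence of an id after its first one:

```python
import pandas as pd

li = [{"employee_id": 1, "project_handled": "pas"},
      {"employee_id": 1, "project_handled": "asap"},
      {"employee_id": 2, "project_handled": "trimm"},
      {"employee_id": 2, "project_handled": "fat"}]
df = pd.DataFrame(li)
df["employee_id"] = df["employee_id"].astype(str)

# duplicated() is True for each repeat of an id (the first occurrence is kept).
df.loc[df["employee_id"].duplicated(), "employee_id"] = ""
df = df.set_index("employee_id")

# No path argument: to_csv returns the CSV content as a string.
csv_text = df.to_csv()
print(csv_text)
```

Unlike the MultiIndex approach, this changes the data itself: the repeated ids are gone from the frame, not just hidden in the output.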
Writing it with df.to_csv('test.csv') gives the following: