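I have a very large Pandas DataFrame that looks like this, which I use as a relational table (a few thousand rows, with a varying number of activities per trail_id):
I would like to reshape it into this form:
I have tried these two approaches, but neither seems to work: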
pd.melt(df)
df.stack().reset_index()
Any help would be greatly appreciated!
Answer 0 (score: 3)
You need to pass id_vars to .melt() to get the output you want.
>>> df.melt(id_vars='trail_id', value_name='activity_id').drop(columns='variable')
trail_id activity_id
0 1 1
1 2 1
2 3 1
3 4 3
4 5 2
5 1 2
6 2 2
7 3 2
8 4 4
9 5 5
10 1 3
11 2 4
12 3 6
13 4 7
14 5 9
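The question mentions that the number of activities varies per trail_id, so some activity columns are presumably empty for some rows. A minimal sketch of how the same melt call could handle that, assuming the missing slots are stored as NaN (the sample frame below is hypothetical, not the asker's data):

```python
import numpy as np
import pandas as pd

# Hypothetical sample: trail 2 has only two activities, so its third
# slot is NaN. The column layout is an assumption for illustration.
df = pd.DataFrame({
    "trail_id": [1, 2, 3],
    "activity_1": [1, 1, 1],
    "activity_2": [2, 2, 2],
    "activity_3": [3, np.nan, 6],
})

long_df = (
    df.melt(id_vars="trail_id", value_name="activity_id")
    .drop(columns="variable")                  # the old column names are not needed
    .dropna(subset=["activity_id"])            # discard the empty activity slots
    .astype({"activity_id": int})              # NaN forced floats; restore integers
    .sort_values(["trail_id", "activity_id"])  # group rows by trail
    .reset_index(drop=True)
)
print(long_df)
```

Sorting is optional; it only groups the rows of each trail together instead of leaving them in column order.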
Answer 1 (score: 2)
You can run the following code:
import pandas as pd

# Initializing
dataframe1 = pd.DataFrame({'trail_id': [1, 2, 3, 4, 5],
                           'activity_1': [1, 1, 1, 3, 2],
                           'activity_2': [2, 2, 2, 4, 5],
                           'activity_3': [3, 4, 6, 7, 9]})
dictionary = dataframe1.to_dict()

# Create the final dictionary to put the values in
main_dict = {"trail_id": [], "activity_id": []}
for key, value in dictionary.items():
    if key == "trail_id":
        continue
    else:
        main_dict["trail_id"] += list(dictionary["trail_id"].values())
        main_dict["activity_id"] += list(value.values())

# Dropping the index is not necessary but it helps to have a cleaner output
last_dataframe = pd.DataFrame(data=main_dict).sort_values(by=["trail_id"]).reset_index(drop=True)
print(last_dataframe)
Output:
trail_id activity_id
0 1 1
1 1 2
2 1 3
3 2 1
4 2 2
5 2 4
6 3 1
7 3 2
8 3 6
9 4 3
10 4 4
11 4 7
12 5 2
13 5 5
14 5 9
Answer 2 (score: 1)
df = (
    pd.concat(
        [
            df["trail_id"],
            df.loc[:, "activity_1":"activity_3"].apply(list, axis=1),
        ],
        axis=1,
    )
    .explode(0)
    .rename(columns={0: "activity_id"})
)
print(df)
Prints:
trail_id activity_id
0 1 1
0 1 2
0 1 3
1 2 1
1 2 2
1 2 4
2 3 1
2 3 2
2 3 6
3 4 3
3 4 4
3 4 7
4 5 2
4 5 5
4 5 9
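For comparison, the df.stack().reset_index() attempt from the question can also be made to work if trail_id is first moved into the index, so that only the activity columns get stacked. A rough sketch under that assumption, reusing the sample data from Answer 1 (the variable name stacked is illustrative):

```python
import pandas as pd

# Same sample data as in Answer 1.
df = pd.DataFrame({
    "trail_id": [1, 2, 3, 4, 5],
    "activity_1": [1, 1, 1, 3, 2],
    "activity_2": [2, 2, 2, 4, 5],
    "activity_3": [3, 4, 6, 7, 9],
})

stacked = (
    df.set_index("trail_id")          # keep trail_id out of the stacked values
    .stack()                          # one row per (trail_id, column name) pair
    .dropna()                         # drop empty slots, if any are present
    .rename("activity_id")            # name the resulting Series
    .reset_index(level=1, drop=True)  # discard the old column-name level
    .reset_index()                    # turn trail_id back into a column
)
print(stacked)
```

Unlike the explode version, this leaves a fresh 0..n-1 index, and the rows come out already grouped by trail_id.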