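I'm new to Python and need some help.
I have 2 dataframes, each containing a list of users and their recommended friends, taken from two different tables.
I want to achieve the following:
I have tried the code below, but it doesn't give the expected result.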
import pandas as pd
import numpy as np
# load data from CSV
df1 = pd.read_csv('CommonFriend.csv')
df2 = pd.read_csv('InfluenceFriend.csv')
print(df1)
print(df2)
# convert values to a list and sort by recommended friend ID
df1.values.tolist()
df1.sort_values(by=['User','RecommendedFriends'])
df2.values.tolist()
df2.sort_values(by=['User','RecommendedFriends'])
# keep only the recommended friends that appear in both df1 and df2
df3 = df1.merge(df2, how='inner', on='User')
# return a DataFrame with the user and the matched recommended friend IDs
print(df3)
Issue:
Answer 0 (score: 0)
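I'm not quite sure what exactly your problem is, but in general you have to assign the result back to the dataframe: methods like `sort_values` return a new DataFrame instead of modifying the original.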
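A small sketch of that point, using a tiny made-up DataFrame (the data is hypothetical; only the column names come from the question). It shows that calling `sort_values` without assigning the result leaves the original untouched, and that `.values.tolist()` produces a plain Python list that no longer has DataFrame methods:

```python
import pandas as pd

# Hypothetical two-row frame using the column names from the question.
df = pd.DataFrame({'User': [2, 1], 'RecommendedFriends': [8, 5]})

# sort_values returns a new, sorted DataFrame; df itself is not modified.
df.sort_values(by=['User', 'RecommendedFriends'])
print(df['User'].tolist())  # [2, 1]

# Assign the result back to keep the sorted order.
df = df.sort_values(by=['User', 'RecommendedFriends'])
print(df['User'].tolist())  # [1, 2]

# .values.tolist() also returns a new object: a plain list of row lists,
# which has no sort_values or merge method.
rows = df.values.tolist()
print(rows)  # [[1, 5], [2, 8]]
```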
import pandas as pd

df1 = pd.read_csv('CommonFriend.csv')
df2 = pd.read_csv('InfluenceFriend.csv')
print(df1)
print(df2)
# Sort and assign the result back; sort_values does not sort in place by default.
# Don't call .values.tolist() first: a plain list has no sort_values method.
df1 = df1.sort_values(by=['User', 'RecommendedFriends'])
df2 = df2.sort_values(by=['User', 'RecommendedFriends'])
df3 = df1.merge(df2, how='inner', on='User')
print(df3)
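Note that merging on `'User'` alone does not keep only the shared friends. If the CSV files are in long format, i.e. one row per (User, RecommendedFriends) pair (an assumption, since the file contents aren't shown), a merge on `'User'` pairs every friend of a user in `df1` with every friend of that user in `df2`. Merging on both columns keeps only the pairs present in both tables. A minimal sketch with hypothetical sample data standing in for the two CSV files:

```python
import pandas as pd

# Hypothetical sample data in place of CommonFriend.csv / InfluenceFriend.csv,
# assuming one row per (User, RecommendedFriends) pair.
df1 = pd.DataFrame({'User': [1, 1, 1, 2, 2],
                    'RecommendedFriends': [5, 7, 10, 3, 8]})
df2 = pd.DataFrame({'User': [1, 1, 2, 3],
                    'RecommendedFriends': [7, 5, 8, 7]})

# Merging on User only: every df1 row for a user is paired with every df2 row
# for that user (3 x 2 for user 1, plus 2 x 1 for user 2).
pairs = df1.merge(df2, on='User')
print(len(pairs))  # 8

# Merging on both columns keeps only the (User, friend) pairs found in both frames.
matched = df1.merge(df2, how='inner', on=['User', 'RecommendedFriends'])
matched = matched.sort_values(by=['User', 'RecommendedFriends']).reset_index(drop=True)

# Collect the matched friend IDs into one list per user.
per_user = matched.groupby('User')['RecommendedFriends'].agg(list)
print(per_user.to_dict())  # {1: [5, 7], 2: [8]}
```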
Answer 1 (score: 0)
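This should solve your problem. You may need to adjust some variable names, but you get the idea: merge the two dataframes on the user, so you get one dataframe that holds both lists for each user. Then take the intersection of the two lists and store it in a new column.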
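import pandas as pd
# Pass the lists directly; wrapping ragged nested lists in np.array raises a ValueError in recent NumPy.
df1 = pd.DataFrame({'User': [1, 2],
                    'Recommended friends': [[5, 7, 10, 11], [3, 8, 5, 12]]})
df2 = pd.DataFrame({'User': [1, 2, 3],
                    'Recommended friends': [[5, 7, 9], [4, 7, 10], [15, 7, 9]]})
# Inner merge on User: one row per user present in both frames; list columns get suffixes _x and _y.
df3 = pd.merge(df1, df2, on='User')
# Intersect the two lists row by row (sorted, so the order is deterministic).
df3['intersection'] = [sorted(set(a) & set(b))
                       for a, b in zip(df3['Recommended friends_x'], df3['Recommended friends_y'])]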
Output (df3):

   User Recommended friends_x Recommended friends_y intersection
0     1        [5, 7, 10, 11]             [5, 7, 9]       [5, 7]
1     2         [3, 8, 5, 12]            [4, 7, 10]           []
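A possible alternative to the row-by-row set intersection, sketched with the same sample data: `explode` the list columns into one row per friend, inner-merge on both `User` and the friend column, then collect the matches back into lists. Unlike the list comprehension, users with no common friends (user 2 here) simply drop out of the result, so this suits the case where only users with matches are wanted.

```python
import pandas as pd

df1 = pd.DataFrame({'User': [1, 2],
                    'Recommended friends': [[5, 7, 10, 11], [3, 8, 5, 12]]})
df2 = pd.DataFrame({'User': [1, 2, 3],
                    'Recommended friends': [[5, 7, 9], [4, 7, 10], [15, 7, 9]]})

# explode turns each list element into its own row, repeating the User value.
long1 = df1.explode('Recommended friends')
long2 = df2.explode('Recommended friends')

# Keep only (User, friend) pairs that appear in both frames.
common = long1.merge(long2, on=['User', 'Recommended friends'])

# Collect the shared friends back into one list per user.
intersection = common.groupby('User')['Recommended friends'].agg(list)
print(intersection.to_dict())  # {1: [5, 7]}
```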