I am iterating over grouped items and, for each item-user combination, extracting the user's features with the following code:
for counter in grouped.iterrows():
    user_id = counter[1]['user_id']
    item_id = counter[1]['item_id']
    career_level = get_value(users, user_id, 'career_level')
    industry_id = get_value(users, user_id, 'industry_id')
    country = get_value(users, user_id, 'country')
    career_level = 'CL_' + str(career_level)
    industry_id = 'IND_' + str(industry_id)
    print(item_id, user_id, country, career_level, industry_id)
The output I get is:
5 797978 JO_4092133 ch CL_0 IND_1
12 1524899 JO_524518 JO_2169794 JO_2905196 de CL_2 IND_0
12 2703661 JO_1210814 JO_2573697 de CL_3 IND_0
14 1054241 JO_2804344 JO_1072229 de CL_3 IND_14
14 1297953 JO_3482421 de CL_6 IND_0
14 1548532 JO_425546 de CL_2 IND_0
14 1609264 JO_4438218 JO_1151866 de CL_3 IND_9
The output I want looks like this:
5 797978 JO_4092133 ch CL_0 IND_1
12 1524899 JO_524518 JO_2169794 JO_2905196 de CL_2 IND_0, 2703661 JO_1210814 JO_2573697 de CL_3 IND_0
14 1054241 JO_2804344 JO_1072229 de CL_3 IND_14, 1297953 JO_3482421 de CL_6 IND_0, 1548532 JO_425546 de CL_2 IND_0, 1609264 JO_4438218 JO_1151866 de CL_3 IND_9
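That is, if user1 interacted with item1 and user2 also interacted with item1, the features of user1 and user2 should appear on a single line.
Can anyone suggest how I can achieve this?
My second question: how can I write this data to a file?
I am a beginner in Python. Thank you for your help.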
Answer 0 (score: 1)
import numpy as np
import pandas as pd
from collections import defaultdict
# creating a dataframe
idx = ['one','two','two','two','three','three']
df = pd.DataFrame(np.random.randint(1,10,24).reshape((6,4)), index = idx, columns = list('ABCD'))
df = df.reset_index()
# converting the data frame to a dictionary based on the format desired
data_dict = defaultdict(list)
for _, row in df.iterrows():
    # key: first column (the former index); value: remaining columns joined by spaces
    data_dict[str(row.iloc[0])].append(' '.join(str(value) for value in row.iloc[1:]))
# writing the dictionary to file
df2 = pd.DataFrame.from_dict(data_dict, orient = 'index')
df2.to_csv('temp.csv', header = False)
Is this what you are looking for?
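The same idea can probably be expressed without an explicit loop by grouping on the item and joining the per-user strings. This is only a minimal sketch: the column names `item_id` and `features`, the sample rows (copied from the output shown in the question), and the file name `grouped_features.txt` are assumptions for illustration, not part of the original code.

```python
import pandas as pd

# Hypothetical sample: one row per (item, user) pair, with the user's
# features already formatted as a single string, as in the question's output.
rows = pd.DataFrame({
    "item_id": [5, 12, 12],
    "features": [
        "797978 JO_4092133 ch CL_0 IND_1",
        "1524899 JO_524518 JO_2169794 JO_2905196 de CL_2 IND_0",
        "2703661 JO_1210814 JO_2573697 de CL_3 IND_0",
    ],
})

# Join the feature strings of all users that share an item into one string.
# sort=False keeps the items in their order of first appearance.
merged = rows.groupby("item_id", sort=False)["features"].agg(", ".join)

# Build one "item_id features, features, ..." line per item.
lines = [f"{item_id} {features}" for item_id, features in merged.items()]

# Write the lines to a plain text file (hypothetical file name).
with open("grouped_features.txt", "w") as handle:
    handle.write("\n".join(lines) + "\n")
```

Writing the lines by hand rather than through `to_csv` avoids the extra empty columns that appear when items have different numbers of users.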