I have two separate files in the following format:
File 1:
Polygon_Id::User_Id::Movie_ID::Rating
02143 69 1183 5
02143 75 3006 4
02143 89 1196 3
02143 32 590 2
02143 11 593 5
File 2:
User_Id::Gender::Age::Occupation::ZipCode
1 F 1 10 48067
2 M 56 16 70072
3 M 25 15 55117....
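The header lines use `::` as a separator, but the sample rows are shown with spaces. The question's code splits with `line.split()`, which would also keep a header row as if it were data. Below is a minimal tokenizer sketch. It assumes either separator may occur and that a header row can be recognised because its first field is not numeric. The helper name `parse_rows` is hypothetical.

```python
import re


def parse_rows(lines):
    """Split lines into lists of string fields (hypothetical helper).

    Assumes fields are separated by '::' or by whitespace, and that a
    header row starts with a non-numeric field such as 'Polygon_Id'.
    """
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = re.split(r'::|\s+', line)
        if not fields[0].isdigit():
            continue  # skip a header row like 'User_Id::Gender::...'
        rows.append(fields)
    return rows


# Header and first two rows of file 1, as shown in the question.
sample = """Polygon_Id::User_Id::Movie_ID::Rating
02143 69 1183 5
02143 75 3006 4"""
print(parse_rows(sample.splitlines()))
```

Each field stays a string, so leading zeros such as `02143` are preserved. Convert IDs to `int` only where they need to be compared numerically.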
What I need to do: find the User_Ids in file 1 that rated each Movie_ID, and then look up those users' ages in file 2.
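This task is a join on User_Id. Here is a minimal sketch, assuming both files have already been split into rows of string fields with no header rows, and that IDs compare equal as integers. The function name `ages_by_movie` is hypothetical. The function builds a User_Id to Age dictionary from file 2 once. It then walks file 1 and appends each rater's age under that rating's Movie_ID.

```python
from collections import defaultdict


def ages_by_movie(ratings, users):
    """Map each Movie_ID to the ages of the users who rated it.

    ratings: rows [Polygon_Id, User_Id, Movie_ID, Rating] from file 1
    users:   rows [User_Id, Gender, Age, Occupation, ZipCode] from file 2
    IDs are compared as integers (an assumption), so '069' matches '69'.
    """
    # User_Id -> Age lookup, built once.
    age_of = {int(row[0]): int(row[2]) for row in users}
    result = defaultdict(list)
    for _polygon, user_id, movie_id, _rating in ratings:
        user_id = int(user_id)
        # Join file 1 to file 2 on User_Id; unknown users are skipped.
        if user_id in age_of:
            result[int(movie_id)].append(age_of[user_id])
    return dict(result)


# The three user rows come from the question; these rating rows are made up
# so that their User_Ids exist among those three users.
users = [['1', 'F', '1', '10', '48067'],
         ['2', 'M', '56', '16', '70072'],
         ['3', 'M', '25', '15', '55117']]
ratings = [['02143', '1', '1183', '5'],
           ['02143', '2', '1183', '4'],
           ['02143', '3', '3006', '3']]
print(ages_by_movie(ratings, users))
```

A dictionary lookup makes each match a single hash lookup. The alternative of scanning or re-grouping file 2 for every movie is much slower.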
Here is my code so far:
import numpy as np
from itertools import groupby, combinations

# File 1 rows: Polygon_Id, User_Id, Movie_ID, Rating
with open('c:/Python27/file1.txt','r+') as f:
    data=f.read()
    data=[line.split() for line in data.splitlines()]

key1=lambda x:x[2]  # group file 1 by Movie_ID (column 2)
for p,q in groupby(sorted(data,key=key1),key1):
    rating_list=list(map(lambda x:x[3],q))  # ratings for movie p
    rating_list=np.array(rating_list)
    # File 2 rows: User_Id, Gender, Age, Occupation, ZipCode (re-read for every movie)
    with open('c:/Python27/file2.txt','r+') as f1:
        user_data=f1.read()
        user_data=[line.split() for line in user_data.splitlines()]
        user_data=np.array(user_data)
        # group file 2 by User_Id (column 0)
        for i,j in groupby(sorted(user_data,key=lambda x:x[0]),lambda x:x[0]):
            if p==i:
                user_age=list(map(lambda x:x[2],j))  # Age column
                user_age=[float(n) for n in user_age]
                user_age=np.array(user_age)
                print i,p,user_age
Output:
1009 1009 ['50']
1013 1013 ['56']...
Ideally, the output should contain many more age values than the ones I am getting.
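In the loop above, `p` is a Movie_ID (column 2 of file 1) while `i` is a User_Id (column 0 of file 2). As a result, `p==i` is true only when a movie number happens to equal a user number, which is consistent with output lines such as `1009 1009`.

The sketch below keeps the question's `groupby` over Movie_ID but compares User_Ids with User_Ids instead. It also builds the age lookup once rather than re-reading file 2 for every movie. It assumes rows are lists of strings, with IDs formatted identically in both files. The name `ratings_and_ages` is hypothetical.

```python
from itertools import groupby


def ratings_and_ages(data, user_data):
    """For each Movie_ID, return (ratings, ages) as two aligned lists.

    data:      rows from file 1 [Polygon_Id, User_Id, Movie_ID, Rating]
    user_data: rows from file 2 [User_Id, Gender, Age, Occupation, ZipCode]
    """
    # User_Id -> Age lookup, built once instead of once per movie.
    age_of = dict((row[0], float(row[2])) for row in user_data)
    key1 = lambda x: x[2]  # Movie_ID column of file 1, as in the question
    result = {}
    for movie_id, group in groupby(sorted(data, key=key1), key1):
        # Match file 1's User_Id (column 1) against file 2's User_Ids.
        rows = [x for x in group if x[1] in age_of]
        ratings = [float(x[3]) for x in rows]
        ages = [age_of[x[1]] for x in rows]
        result[movie_id] = (ratings, ages)
    return result


user_data = [['1', 'F', '1', '10', '48067'],
             ['2', 'M', '56', '16', '70072'],
             ['3', 'M', '25', '15', '55117']]
data = [['02143', '69', '1183', '5'],  # from the question; User_Id 69 is not among these users
        ['02143', '2', '1183', '4'],   # made up
        ['02143', '3', '3006', '3']]   # made up
print(ratings_and_ages(data, user_data))
```

Ratings whose User_Id does not appear in file 2 are dropped, so each `ratings` entry still lines up with the matching `ages` entry. To get NumPy arrays as in the original loop, wrap either list in `np.array(...)`.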