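I'm not very familiar with Python, but I have some understanding of the basics. I believe I need dictionaries, but what I'm currently doing doesn't work and is probably very inefficient time-wise.
I'm trying to create a cross matrix that links reviews between users, given: a list of reviewers, their individual reviews, and the metadata related to those reviews.
NOTE: This is written in Python 2.7.10. I cannot use Python 3 because of the outdated system this will run on, yada yada.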
print '\nCompiling Review Maps... ';
LbidMap = {};
TbidMap = {};
for user in reviewer_idx :
    for review in data['Reviewer Reviews'][user] :
        reviewInfo = data['Review Information'][review];
        stars = float(reviewInfo['stars']);
        bid = reviewInfo['business_id'];
        # Initialize lists where necessary
        # !!!! I know this is probably not effective, but am unsure of
        # a better method. Open to suggestions !!!!!
        if bid not in LbidMap:
            LbidMap[bid] = {};
            TbidMap[bid] = {};
        if stars not in LbidMap[bid] :
            LbidMap[bid][stars] = {};
        if user not in TbidMap[bid] :
            TbidMap[bid][user] = {};
        # Track information on ratings to each business
        LbidMap[bid][stars][user] = review;
        TbidMap[bid][user][review] = stars;
(where 'bid' is short for "Business ID", and pos_list is an input given by the user at runtime)
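For the initialization checks flagged in the comments above, one possible alternative is `collections.defaultdict`, which is available in Python 2.7. The sketch below builds the same two maps without any `if ... not in` checks. The small `data` and `reviewer_idx` values are hypothetical stand-ins so that it runs on its own; the real structures are assumed to have the same shape.

```python
from collections import defaultdict

# Hypothetical sample input, only here so the sketch is runnable.
data = {
    'Reviewer Reviews': {'u1': ['r1', 'r2'], 'u2': ['r3']},
    'Review Information': {
        'r1': {'stars': 4, 'business_id': 'bL'},
        'r2': {'stars': 5, 'business_id': 'bT'},
        'r3': {'stars': 4, 'business_id': 'bL'},
    },
}
reviewer_idx = ['u1', 'u2']

# Missing keys are created on first access, so no explicit checks are needed.
LbidMap = defaultdict(lambda: defaultdict(dict))  # bid -> stars -> user -> review
TbidMap = defaultdict(lambda: defaultdict(dict))  # bid -> user -> review -> stars

for user in reviewer_idx:
    for review in data['Reviewer Reviews'][user]:
        reviewInfo = data['Review Information'][review]
        stars = float(reviewInfo['stars'])
        bid = reviewInfo['business_id']
        # Track information on ratings to each business
        LbidMap[bid][stars][user] = review
        TbidMap[bid][user][review] = stars
```

One caveat with this approach: indexing a `defaultdict` with a missing key using `[]` silently creates that key, so later membership tests should use `in` or `.get()` rather than plain indexing.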
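I then go on to try to build a mapping of users who gave business T a "positive" review and who also rated business L as X (e.g., 5 people rated business L 4/5 stars; how many of those people also gave business T a "positive" review?)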
# Determine and map all users who rated business L as rL
# and gave business T a positive rating
print '\nCross matching ratings across businesses';
cross_TrL = [];
for Tbid in TbidMap :
    for Lbid in LbidMap :
        # Ensure T and L aren't the same business
        if Tbid != Lbid :
            for stars in LbidMap[Lbid] :
                starSum = len(LbidMap[Lbid][stars]);
                posTbid = 0;
                for user in LbidMap[Lbid][stars] :
                    if user in TbidMap[Tbid] :
                        rid = LbidMap[Lbid][stars][user];
                        print 'Tbid:%s Lbid:%s user:%s rid:%s'%(Tbid, Lbid, user, rid);
                        reviewRate = TbidMap[Tbid][user][rid];
                        # If true, then we have pos review for T from L
                        if reviewRate in pos_list :
                            posTbid += 1;
                numerator = posTbid + 1;
                denominator = starSum + 1;
                probability = float(numerator) / denominator;
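I currently receive the following error (a printout of the current variables is also provided):
Tbid:OlpyplEJ_c_hFxyand_Wxw Lbid:W0eocyGliMbg8NScqERaiA user:Neal_1EVupQKZKv3NsC2DA rid:TAIDnnpBMR16BwZsap9uwA
Traceback (most recent call last):
  File "run_edge_testAdvProb.py", line 90, in <module>
    reviewRate = TbidMap[Tbid][user][rid];
KeyError: u'TAIDnnpBMR16BwZsap9uwA'
So, I know the KeyError is about what should be the rid (review ID) at that specific moment within TbidMap, but it seems to me that the key somehow was not included by the initialization in the first code block.
What am I doing wrong? Additionally, suggestions on how to improve clock cycles on the second code block are welcome.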
EDIT: I realized that I was trying to use the rid from Lbid to look up the rid for Tbid, but a rid is unique to each review, so you will never have Tbid.rid == Lbid.rid.
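A minimal reproduction of that mistake, using made-up IDs and the same nesting as the two maps, might look like the following. One hypothetical user `u1` has reviewed both businesses, and the review ID taken from business L is not a key under business T.

```python
# Hypothetical IDs: one user who reviewed both business L and business T.
LbidMap = {'bL': {4.0: {'u1': 'rL'}}}    # bid -> stars -> user -> review
TbidMap = {'bL': {'u1': {'rL': 4.0}},    # bid -> user -> review -> stars
           'bT': {'u1': {'rT': 5.0}}}

rid = LbidMap['bL'][4.0]['u1']           # 'rL', the review written for business L

try:
    TbidMap['bT']['u1'][rid]             # business T only has the key 'rT'
    raised = False
except KeyError:
    raised = True                        # same failure as in the traceback

# Iterating the user's own reviews of T avoids the cross-business lookup.
t_stars = list(TbidMap['bT']['u1'].values())
```

Here `raised` ends up `True`, while `t_stars` holds the star ratings that `u1` actually gave business T.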
I updated the second code block as follows:
cross_TrL = [];
for Tbid in TbidMap :
    for Lbid in LbidMap :
        # Ensure T and L aren't the same business
        if Tbid != Lbid :
            # Get number of reviews at EACH STAR rate for L
            for stars in LbidMap[Lbid] :
                starSum = len(LbidMap[Lbid][stars]);
                posTbid = 0;
                # For each user who gave L this rating, check if they rated the Tbid.
                # LbidMap[Lbid][stars] is keyed by user (its values are review IDs),
                # as built in the first code block.
                for user in LbidMap[Lbid][stars] :
                    if user in TbidMap[Tbid] :
                        # user rev'd Tbid, get their Trid
                        # and see if they gave Tbid a pos rev
                        for Trid in TbidMap[Tbid][user] :
                            # Currently this does not account for multiple reviews
                            # given by the same person. Just want to get this
                            # working and then I'll minimize this
                            Tstar = TbidMap[Tbid][user][Trid];
                            print 'Tbid:%s Lbid:%s user:%s Trid:%s'%(Tbid, Lbid, user, Trid);
                            if Tstar in pos_list :
                                posTbid += 1;
                numerator = posTbid + 1;
                denominator = starSum + 1;
                probability = float(numerator) / denominator;
                evaluation = {'Tbid':Tbid, 'Lbid':Lbid, 'star':stars, 'prob':probability}
                cross_TrL.append(evaluation);
It's still slow, but I no longer receive the error.
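On the speed question, one direction that might help is to stop looping over every (Tbid, Lbid) pair and instead walk through each user's own reviews, so that only business pairs that actually share a reviewer are touched. The sketch below is a rough idea under a few assumptions: the function name `cross_match` and the sample data are hypothetical, the maps have the nesting built in the first code block, and every positive T review is counted, as in the loop above. Any combination missing from the result has posTbid = 0, so its probability is simply 1 / (starSum + 1).

```python
from collections import defaultdict

def cross_match(LbidMap, TbidMap, pos_list):
    """Return {(Tbid, Lbid, stars): probability} for combinations in which
    a shared user gave business T at least one positive review."""
    # Invert TbidMap once: user -> {Tbid: number of positive reviews}.
    pos_by_user = defaultdict(dict)
    for Tbid, users in TbidMap.items():
        for user, reviews in users.items():
            n_pos = sum(1 for s in reviews.values() if s in pos_list)
            if n_pos:
                pos_by_user[user][Tbid] = n_pos

    # Accumulate positives only where a user links L and T.
    pos_counts = defaultdict(int)
    for Lbid, by_stars in LbidMap.items():
        for stars, users in by_stars.items():
            for user in users:
                for Tbid, n_pos in pos_by_user.get(user, {}).items():
                    if Tbid != Lbid:  # T and L must differ
                        pos_counts[(Tbid, Lbid, stars)] += n_pos

    # Same add-one formula as in the nested loops.
    return dict(
        (key, float(n + 1) / (len(LbidMap[key[1]][key[2]]) + 1))
        for key, n in pos_counts.items()
    )

# Hypothetical sample data with the same nesting as the real maps.
LbidMap = {'bL': {4.0: {'u1': 'r1', 'u2': 'r3'}}, 'bT': {5.0: {'u1': 'r2'}}}
TbidMap = {'bL': {'u1': {'r1': 4.0}, 'u2': {'r3': 4.0}},
           'bT': {'u1': {'r2': 5.0}}}
example = cross_match(LbidMap, TbidMap, pos_list=[4.0, 5.0])
```

With this layout the work grows with the number of user/business links that exist rather than with the square of the number of businesses. Dropping the print call from the innermost loop is also likely to help, since printing on every match is comparatively expensive.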