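I have a txt file with more than 200 tweets, and I'm trying to compute the total score of all the tweets that fall within a particular region, given their long/lat. A typical tweet looks like: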
[30.346168930000001, -97.73518] 0 2011-08-29 04:54:22 Best vacation of my life #byfar
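I've done something like this before, but only for plain lines of sentences. As a side note: I had one file containing words and another containing sentences, and I had to check whether any of the words appeared in each sentence and add up the number of matching words and their sentiment values, which are the values associated with the words. The words file looks like: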
Happy, 1
Sad, 5
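from collections import Counter

# Map each word to its sentiment value; each line looks like "word, value"
with open('words.txt') as f:
    sentiments = {word: int(value)
                  for word, value in
                  (line.split(",") for line in f)}

with open('sentences.txt') as f:
    for line in f:
        # Count only the words that have a sentiment value
        values = Counter(word for word in line.split() if word in sentiments)
        if not values:
            continue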
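But with the whole long/lat business, I don't know how to add up all the scores in a particular region, mainly because I'm confused by the longitude and latitude.
So first I tried to "approximate" the regions corresponding to the time zones (not real data): Eastern (P1, P2, P3, P4), Pacific (P7, P8, P9, P10), Mountain (P5, P6, P7, P8), Central (P3, P4, P5, P6).
So, with this information: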
p1 = (49.189787, -67.444574)
p2 = (24.660845, -67.444574)
p3 = (49.189787, -87.518395)
p4 = (24.660845, -87.518395)
p5 = (49.189787, -101.998892)
p6 = (24.660845, -101.998892)
p7 = (49.189787, -115.236428)
p8 = (24.660845, -115.236428)
p9 = (49.189787, -125.242264)
p10 = (24.660845, -125.242264)
I defined the regions as
class Region:
    def __init__(self, lat_tuple, long_tuple):
        self.lat_tuple = lat_tuple    # (min latitude, max latitude)
        self.long_tuple = long_tuple  # (min longitude, max longitude)

    def contains(self, lat, long):
        return self.lat_tuple[0] <= lat and lat < self.lat_tuple[1] and\
               self.long_tuple[0] <= long and long < self.long_tuple[1]

eastern = Region((24.660845, 49.189787), (-87.518395, -67.444574))
central = Region((24.660845, 49.189787), (-101.998892, -87.518395))
mountain = Region((24.660845, 49.189787), (-115.236428, -101.998892))
pacific = Region((24.660845, 49.189787), (-125.242264, -115.236428))
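I think I've done most of the work, but I just don't know how to tell whether a tweet falls inside a region. I need help adding up the scores of all the sentences in a particular region, or even just an outline.
Answer 0 (score: 0)
I haven't fully checked your coordinates, but you seem to be on the right track. Using what you already have, all you need to do is parse the tweets file: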
scores = {'eastern': 0, 'central': 0, 'pacific': 0, 'mountain': 0}
with open('tweets.txt') as f:
    for line in f:
        line = line.split(" ")
        lat = float(line[0][1:-1])  # Strip the leading "[" and the trailing ","
        long = float(line[1][:-1])  # Strip the trailing "]"
        if eastern.contains(lat, long):
            scores['eastern'] += score(line)  # Assuming you have a score function
        elif central.contains(lat, long):
            scores['central'] += score(line)
        elif mountain.contains(lat, long):
            scores['mountain'] += score(line)
        elif pacific.contains(lat, long):
            scores['pacific'] += score(line)
        else:
            raise ValueError("Could not locate coordinates " + line[0] + line[1])
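Slicing the split tokens works for the sample line, but it depends on the exact spacing inside the brackets. As a rough sketch of a more tolerant alternative, the coordinates and the text could be pulled out with a regular expression. The helper name parse_tweet and the pattern TWEET_RE are made up for illustration, and the layout "[lat, long] score date time text" is assumed from the single sample tweet in the question:

```python
import re

# Hypothetical pattern for "[lat, long] score date time text" (layout assumed from the sample tweet)
TWEET_RE = re.compile(
    r"\[\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\]\s+(\S+)\s+(\S+)\s+(\S+)\s*(.*)"
)

def parse_tweet(line):
    """Return (lat, long, text) for one tweet line, or None if the line does not match."""
    match = TWEET_RE.match(line.strip())
    if match is None:
        return None
    lat, long, _score, _date, _time, text = match.groups()
    return float(lat), float(long), text

sample = "[30.346168930000001, -97.73518] 0 2011-08-29 04:54:22 Best vacation of my life #byfar"
lat, long, text = parse_tweet(sample)
```

Returning None for lines that do not match lets the caller skip malformed tweets instead of crashing on them.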
You can make this more elegant by wrapping the if statements in a function:
def region(lat, long):
    # Define your regions here, inside the function, or leave them as globals
    if eastern.contains(lat, long): return 'eastern'
    if central.contains(lat, long): return 'central'
    if mountain.contains(lat, long): return 'mountain'
    if pacific.contains(lat, long): return 'pacific'
    raise ValueError(" ".join(("could not locate coordinates", str(lat), str(long))))
Then the if statements in the loop go away:
scores[region(lat,long)] += score(line)
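One possible variation, sketched here under the assumption that the Region class and the boundaries from the question are used unchanged, is to keep the regions in a dictionary so the lookup becomes a loop and adding a region means adding one entry. The name REGIONS is invented for this example:

```python
class Region:
    def __init__(self, lat_tuple, long_tuple):
        self.lat_tuple = lat_tuple    # (min latitude, max latitude)
        self.long_tuple = long_tuple  # (min longitude, max longitude)

    def contains(self, lat, long):
        return (self.lat_tuple[0] <= lat < self.lat_tuple[1]
                and self.long_tuple[0] <= long < self.long_tuple[1])

# REGIONS is a hypothetical name; the boundaries are the ones from the question
REGIONS = {
    'eastern': Region((24.660845, 49.189787), (-87.518395, -67.444574)),
    'central': Region((24.660845, 49.189787), (-101.998892, -87.518395)),
    'mountain': Region((24.660845, 49.189787), (-115.236428, -101.998892)),
    'pacific': Region((24.660845, 49.189787), (-125.242264, -115.236428)),
}

def region(lat, long):
    """Return the name of the first region containing the point, or raise ValueError."""
    for name, area in REGIONS.items():
        if area.contains(lat, long):
            return name
    raise ValueError("could not locate coordinates {} {}".format(lat, long))

sample_region = region(30.346168930000001, -97.73518)
```

With these boundaries the sample tweet from the question lands in 'central', because its longitude lies between -101.998892 and -87.518395.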
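Edit: you need to define score as a function that accepts a tweet, or rather the split line from the code above (which is a list of words, including the coordinates):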
def score(tweet):
    total = 0
    for word in tweet:
        if word in sentiments:
            total += 1
    return total / (len(tweet) - 2)  # Subtract the two coordinate tokens from the length
This assumes that a global sentiments has been defined beforehand.
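The function above counts how many words have a sentiment entry. If the goal is instead to use the sentiment values themselves, as the question describes, a variant could average the values of the matching words. This is only a sketch of that alternative: the name sentiment_score, the lowercase matching, and the small in-memory sentiments dictionary are assumptions for illustration, standing in for whatever is loaded from the words file.

```python
# Illustrative stand-in for the dictionary loaded from the words file (assumed values)
sentiments = {"happy": 1, "sad": 5}

def sentiment_score(words):
    """Average the sentiment values of the words that have one; 0.0 if none match."""
    # Lowercasing before the lookup is an assumption, so "Happy" matches "happy"
    values = [sentiments[w.lower()] for w in words if w.lower() in sentiments]
    return sum(values) / len(values) if values else 0.0

example = sentiment_score("so Happy and sad today".split())
```

Returning 0.0 when nothing matches avoids a division by zero for tweets that contain no known words.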