Adding up sentence scores by region in Python, very lost

Date: 2016-11-16 12:45:54

Tags: python python-3.x

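I have a txt file containing more than 200 tweets, and I'm trying to calculate the total score of all tweets within a specific region, given their long/lat. A typical tweet looks like: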

[30.346168930000001, -97.73518] 0 2011-08-29 04:54:22 Best vacation of my life #byfar
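I have done this before, but only for plain lines of sentences. As a side note: I have one file containing words and another containing sentences, and I have to check whether any of the words appear in a sentence and add up the number of matching words and their sentiment values, which are the values associated with the words. The words file looks like: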

Happy,1
Sad,5

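from collections import Counter

# Map each word to its sentiment value (the code splits each line on ",").
with open('words.txt') as f:
    sentiments = {word: int(value)
                  for word, value in
                  (line.split(",") for line in f)}

with open('sentences.txt') as f:
    for line in f:
        # Count how often each known word occurs in this sentence.
        values = Counter(word for word in line.split() if word in sentiments)
        if not values:
            continue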

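But with the whole long/lat business, I don't know how to add up all the scores in a specific region, mainly because I'm confused by the longitude and latitude.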

So first I tried to "approximate" the regions corresponding to the time zones (not real data): Eastern (P1, P2, P3, P4), Central (P3, P4, P5, P6), Mountain (P5, P6, P7, P8), Pacific (P7, P8, P9, P10).

So with this information:

p1 = (49.189787, -67.444574) 
p2 = (24.660845, -67.444574)
p3 = (49.189787, -87.518395) 
p4 = (24.660845, -87.518395) 
p5 = (49.189787, -101.998892)
p6 = (24.660845, -101.998892) 
p7 = (49.189787, -115.236428) 
p8 = (24.660845, -115.236428) 
p9 = (49.189787, -125.242264)
p10 = (24.660845, -125.242264)

I defined the regions as:

class Region:
    def __init__(self, lat_tuple, long_tuple):
        self.lat_tuple = lat_tuple
        self.long_tuple = long_tuple

    def contains(self, lat, long):
        return self.lat_tuple[0] <= lat and lat < self.lat_tuple[1] and\
               self.long_tuple[0] <= long and long < self.long_tuple[1]

eastern = Region((24.660845, 49.189787), (-87.518395, -67.444574))
central = Region((24.660845, 49.189787), (-101.998892, -87.518395))
mountain = Region((24.660845, 49.189787), (-115.236428, -101.998892))
pacific = Region((24.660845, 49.189787), (-125.242264, -115.236428))
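
As a quick sanity check of these bounding boxes, here is a minimal sketch that looks up the coordinates of the sample tweet shown earlier. The `REGIONS` dict and the `find_region` helper are hypothetical names introduced only for this illustration; the `Region` class is repeated so the snippet runs on its own.

```python
class Region:
    def __init__(self, lat_tuple, long_tuple):
        self.lat_tuple = lat_tuple      # (min_lat, max_lat)
        self.long_tuple = long_tuple    # (min_long, max_long)

    def contains(self, lat, long):
        # Lower bound inclusive, upper bound exclusive, as in the original class.
        return (self.lat_tuple[0] <= lat < self.lat_tuple[1]
                and self.long_tuple[0] <= long < self.long_tuple[1])


# Hypothetical lookup table built from the four regions above.
REGIONS = {
    'eastern': Region((24.660845, 49.189787), (-87.518395, -67.444574)),
    'central': Region((24.660845, 49.189787), (-101.998892, -87.518395)),
    'mountain': Region((24.660845, 49.189787), (-115.236428, -101.998892)),
    'pacific': Region((24.660845, 49.189787), (-125.242264, -115.236428)),
}


def find_region(lat, long):
    """Return the name of the first region containing the point, or None."""
    for name, region in REGIONS.items():
        if region.contains(lat, long):
            return name
    return None


# Coordinates of the sample tweet from the question.
print(find_region(30.346168930000001, -97.73518))
```

Latitude comes first in both the tweet and the `contains` call, which is the part that is easy to mix up.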

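I think I've done most of the work, but I just don't know how to check which region each tweet falls in. I need help adding up the scores of all the sentences in a specific region, or even just an outline.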

1 answer:

Answer 0: (score: 0)

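I haven't fully checked your coordinates, but you seem to be on the right track. Using what you've done, all that's left is to parse the tweets file: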

scores = {'eastern': 0, 'central': 0, 'pacific': 0, 'mountain': 0}
with open('tweets.txt') as f:
    for line in f:
        line = line.split()          # list of tokens, coordinates first
        lat = float(line[0][1:-1])   # strip the leading "[" and the trailing ","
        long = float(line[1][:-1])   # strip the trailing "]"
        if eastern.contains(lat, long):
            scores['eastern'] += score(line)  # assuming you have a score function
        elif central.contains(lat, long):
            scores['central'] += score(line)
        elif mountain.contains(lat, long):
            scores['mountain'] += score(line)
        elif pacific.contains(lat, long):
            scores['pacific'] += score(line)
        else:
            raise ValueError("Could not locate coordinates " + line[0] + line[1])
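
If slicing the brackets off individual tokens feels fragile, one possible alternative is to split the line at the closing bracket first. This is only a sketch: `parse_tweet` is a hypothetical helper, and it assumes every line follows the layout of the sample tweet (coordinates in brackets, then three fields, then the text).

```python
def parse_tweet(line):
    """Split '[lat, long] flag date time text...' into (lat, long, text)."""
    # Everything before the first "]" holds the coordinates.
    coords, _, rest = line.partition("]")
    lat_str, long_str = coords.lstrip("[").split(",")
    # Skip the three fields after the coordinates (a number, a date, a time);
    # whatever remains is treated as the tweet text.
    fields = rest.split(maxsplit=3)
    text = fields[3] if len(fields) > 3 else ""
    return float(lat_str), float(long_str), text.strip()


sample = "[30.346168930000001, -97.73518] 0 2011-08-29 04:54:22 Best vacation of my life #byfar"
lat, long, text = parse_tweet(sample)
print(lat, long, text)
```

Keeping the text separate from the metadata also means the score does not have to account for the coordinate tokens.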

You can make this more elegant by wrapping the if statements in a function:

def region(lat,long):
    #DEFINE HERE YOUR REGIONS, IN THE Function, or leave them as globals
    if eastern.contains(lat,long):  return 'eastern'
    if central.contains(lat,long):  return 'central'         
    if mountain.contains(lat,long): return 'mountain'
    if pacific.contains(lat,long):  return 'pacific'
    raise ValueError(" ".join(("could not locate coordinates",str(lat),str(long))))

Then the if statements in the loop go away:

scores[region(lat,long)] += score(line)
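
Note that a single tweet outside all four boxes raises `ValueError` and stops the whole run. If you would rather skip such tweets, a sketch along these lines may help. `total_by_region` and the `demo_*` names are hypothetical stand-ins for illustration, not the real `region` and `score` functions.

```python
from collections import defaultdict


def total_by_region(tweets, region, score):
    """Sum score(words) per region name; tweets that cannot be placed are skipped."""
    scores = defaultdict(float)
    skipped = 0
    for lat, long, words in tweets:
        try:
            name = region(lat, long)
        except ValueError:
            skipped += 1      # outside every region: count it and move on
            continue
        scores[name] += score(words)
    return dict(scores), skipped


# Stand-ins for illustration only.
def demo_region(lat, long):
    if -87.518395 <= long < -67.444574:
        return 'eastern'
    raise ValueError("could not locate coordinates")


def demo_score(words):
    return len(words)


demo_tweets = [
    (40.7, -74.0, ['good', 'day']),
    (41.0, -75.0, ['fine']),
    (21.3, -157.8, ['aloha']),   # outside the demo region, so it is skipped
]
totals, skipped = total_by_region(demo_tweets, demo_region, demo_score)
print(totals, skipped)
```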

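Edit: you need to define score as a function that accepts a tweet or, as in the code above, the split line (which is a list of words, including the coordinates):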

def score(tweet):
    total = 0
    for word in tweet:
        if word in sentiments:
            total += 1   # counts matching words, not their sentiment values
    return total / (len(tweet) - 2)  # subtract the two coordinate tokens from the length

This assumes that the global sentiments has been defined beforehand.
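
If you want the score to use the sentiment values rather than a count of matching words, a variant could look like the sketch below. `sentiment_score` is a hypothetical name, the example values are made up, and it assumes the keys of the dictionary are lowercase and that it receives only the tweet text (no coordinates or date).

```python
def sentiment_score(text, sentiments):
    """Add up the sentiment value of every known word in text."""
    total = 0
    for word in text.lower().split():
        # Strip common punctuation and the hashtag sign before the lookup.
        total += sentiments.get(word.strip(".,!?#"), 0)
    return total


# Made-up sentiment values, for illustration only.
example_sentiments = {"best": 3, "life": 1}
print(sentiment_score("Best vacation of my life #byfar", example_sentiments))
```

Passing the dictionary in as an argument avoids relying on a global, which makes the function easier to try out on its own.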