Question

I have a txt file with 200+ tweets, and I'm trying to calculate the total score of all the tweets in a specific region, given their long/lat. A typical tweet looks like:

[30.346168930000001, -97.73518] 0 2011-08-29 04:54:22 Best vacation of my life #byfar
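Since the question is unsure how to pull the coordinates out of a line, here is a minimal sketch that parses one tweet with a regular expression. It assumes every line follows the layout of the sample above (`[lat, long]`, a `0` field, a date, a time, then the text); `TWEET_RE` and `parse_tweet` are hypothetical names.

```python
import re

# Hypothetical pattern for lines such as:
# [30.346168930000001, -97.73518] 0 2011-08-29 04:54:22 Best vacation of my life #byfar
TWEET_RE = re.compile(
    r"\[(?P<lat>-?\d+(?:\.\d+)?),\s*(?P<lon>-?\d+(?:\.\d+)?)\]"  # [lat, long]
    r"\s+\S+"                                                  # the "0" field
    r"\s+(?P<date>\d{4}-\d{2}-\d{2})\s+(?P<time>\d{2}:\d{2}:\d{2})"
    r"\s+(?P<text>.*)"                                         # the tweet text
)


def parse_tweet(line):
    """Return (lat, lon, text) for one tweet line, or None if it does not match."""
    m = TWEET_RE.match(line.strip())
    if m is None:
        return None
    return float(m.group("lat")), float(m.group("lon")), m.group("text")


lat, lon, text = parse_tweet(
    "[30.346168930000001, -97.73518] 0 2011-08-29 04:54:22 Best vacation of my life #byfar")
print(lat, lon, text)
```

Returning `None` for lines that do not match lets the caller skip malformed lines instead of crashing on them.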

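I've done this before, but only for lines of sentences. As background: I have one file containing words and another containing sentences, and I had to check whether any of the words appear in a sentence and add up the number of those words and their sentiment values (the value associated with each word). The words file looks like: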

Happy, 1
Sad, 5

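from collections import Counter

# words.txt: one "word, value" pair per line
with open('words.txt') as f:
    sentiments = {word.strip(): int(value)
                  for word, value in
                  (line.split(",") for line in f if line.strip())}

with open('sentences.txt') as f:
    for line in f:
        # keep only the words that have a sentiment value
        values = Counter(word for word in line.split() if word in sentiments)
        if not values:
            continue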
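The snippet above stops once it has the `Counter`. A minimal sketch of turning those counts into a sentence score, assuming the score is each matched word's value multiplied by how often it appears; `load_sentiments` and `sentence_score` are hypothetical names, and lowercasing words (so "Happy" matches "happy") is also an assumption.

```python
from collections import Counter


def load_sentiments(lines):
    """Build {word: value} from lines such as "happy, 1" (assumed format)."""
    sentiments = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        word, value = line.rsplit(",", 1)
        sentiments[word.strip().lower()] = int(value)
    return sentiments


def sentence_score(sentence, sentiments):
    """Sum the sentiment value of every known word, counting repeats."""
    counts = Counter(w.lower() for w in sentence.split() if w.lower() in sentiments)
    return sum(sentiments[w] * n for w, n in counts.items())


sentiments = load_sentiments(["happy, 1", "sad, 5"])
print(sentence_score("Happy happy but sad", sentiments))  # 1*2 + 5*1 = 7
```

Passing an iterable of lines (rather than a filename) means the same function works on an open file or on a list in a quick test.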

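But with this whole long/lat business, I don't know how to add up all the scores within a specific region, mainly because I'm confused about longitude and latitude.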

So first I tried to "approximate" the region corresponding to each time zone (not real data): Eastern (P1, P2, P3, P4), Pacific (P7, P8, P9, P10), Mountain (P5, P6, P7, P8) and Central (P3, P4, P5, P6).

So, with this information:

p1 = (49.189787, -67.444574)
p2 = (24.660845, -67.444574)
p3 = (49.189787, -87.518395)
p4 = (24.660845, -87.518395)
p5 = (49.189787, -101.998892)
p6 = (24.660845, -101.998892)
p7 = (49.189787, -115.236428)
p8 = (24.660845, -115.236428)
p9 = (49.189787, -125.242264)
p10 = (24.660845, -125.242264)

I defined the regions as:

class Region:
    def __init__(self, lat_tuple, long_tuple):
        self.lat_tuple = lat_tuple    # (south, north)
        self.long_tuple = long_tuple  # (west, east); west is the more negative value

    def contains(self, lat, long):
        # lower bounds inclusive, upper bounds exclusive
        return (self.lat_tuple[0] <= lat < self.lat_tuple[1] and
                self.long_tuple[0] <= long < self.long_tuple[1])

eastern = Region((24.660845, 49.189787), (-87.518395, -67.444574))
central = Region((24.660845, 49.189787), (-101.998892, -87.518395))
mountain = Region((24.660845, 49.189787), (-115.236428, -101.998892))
pacific = Region((24.660845, 49.189787), (-125.242264, -115.236428))
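On the latitude/longitude confusion: latitude is the north-south coordinate (the first number in each point) and longitude the east-west one; west of Greenwich longitude is negative, so "further west" means "more negative". A minimal sketch, using plain tuples instead of the `Region` class, that rebuilds each box from its corner points with `min`/`max` (so the order of the numbers cannot be mixed up) and checks the sample tweet. The helpers `box` and `contains` are hypothetical.

```python
# Corner points from the question, as (latitude, longitude)
p1, p2 = (49.189787, -67.444574), (24.660845, -67.444574)
p3, p4 = (49.189787, -87.518395), (24.660845, -87.518395)
p5, p6 = (49.189787, -101.998892), (24.660845, -101.998892)
p7, p8 = (49.189787, -115.236428), (24.660845, -115.236428)
p9, p10 = (49.189787, -125.242264), (24.660845, -125.242264)


def box(*corners):
    """Return ((south, north), (west, east)) spanning the given corner points."""
    lats = [lat for lat, _ in corners]
    longs = [long for _, long in corners]
    return (min(lats), max(lats)), (min(longs), max(longs))


def contains(bounds, lat, long):
    """Half-open test: lower bounds inclusive, upper bounds exclusive."""
    (south, north), (west, east) = bounds
    return south <= lat < north and west <= long < east


eastern = box(p1, p2, p3, p4)
central = box(p3, p4, p5, p6)
mountain = box(p5, p6, p7, p8)
pacific = box(p7, p8, p9, p10)

print(eastern)  # ((24.660845, 49.189787), (-87.518395, -67.444574))
print(contains(central, 30.346168930000001, -97.73518))  # True: the sample tweet
print(contains(central, 35.0, -87.518395), contains(eastern, 35.0, -87.518395))  # False True
```

Because the test is half-open, a point lying exactly on a shared border such as -87.518395 is claimed by only one region, so no tweet is counted twice.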

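I think I've done most of the work, but I just don't know how to check whether each tweet falls inside one of these regions. I need help adding up the scores of all the sentences in a specific region, or even just an outline.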

Answer 1

I didn't fully check your coordinates, but you seem to be on the right track. Using what you've done, all that's left is to parse the tweets file:

scores = {'eastern': 0, 'central': 0, 'pacific': 0, 'mountain': 0}
with open('tweets.txt') as tweets:
    for line in tweets:
        line = line.split()  # split on any whitespace; also drops the trailing newline
        if not line:
            continue  # skip blank lines
        lat = float(line[0][1:-1])  # strip the [ and the ,
        long = float(line[1][:-1])  # strip the ]
        if eastern.contains(lat, long):
            scores['eastern'] += score(line)  # assuming you have a score function
        elif central.contains(lat, long):
            scores['central'] += score(line)
        elif mountain.contains(lat, long):
            scores['mountain'] += score(line)
        elif pacific.contains(lat, long):
            scores['pacific'] += score(line)
        else:
            raise ValueError("Could not locate coordinates " + line[0] + " " + line[1])

You can make this more elegant by wrapping the if statements in a function:

def region(lat, long):
    # Define your regions here, inside the function, or leave them as globals
    if eastern.contains(lat, long):  return 'eastern'
    if central.contains(lat, long):  return 'central'
    if mountain.contains(lat, long): return 'mountain'
    if pacific.contains(lat, long):  return 'pacific'
    raise ValueError(" ".join(("could not locate coordinates", str(lat), str(long))))

Then the if statements in the loop disappear:

scores[region(lat,long)] += score(line)
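One caveat with this version: `region` raises `ValueError` for any tweet outside all four boxes (Alaska, Hawaii, Canada and so on), which stops the whole run. A minimal sketch of a data-driven variant that loops over a list of regions and returns `None` instead, so the caller can skip such tweets; `REGIONS` is a hypothetical name and the boxes are copied from the question.

```python
class Region:
    """Lat/long box; lower bounds inclusive, upper bounds exclusive."""

    def __init__(self, lat_tuple, long_tuple):
        self.lat_tuple = lat_tuple
        self.long_tuple = long_tuple

    def contains(self, lat, long):
        return (self.lat_tuple[0] <= lat < self.lat_tuple[1]
                and self.long_tuple[0] <= long < self.long_tuple[1])


# Same boxes as in the question; a list keeps the lookup order explicit
REGIONS = [
    ("eastern", Region((24.660845, 49.189787), (-87.518395, -67.444574))),
    ("central", Region((24.660845, 49.189787), (-101.998892, -87.518395))),
    ("mountain", Region((24.660845, 49.189787), (-115.236428, -101.998892))),
    ("pacific", Region((24.660845, 49.189787), (-125.242264, -115.236428))),
]


def region(lat, long):
    """Return the first matching region name, or None if no box contains the point."""
    for name, box in REGIONS:
        if box.contains(lat, long):
            return name
    return None


print(region(30.346168930000001, -97.73518))  # central
print(region(61.2181, -149.9003))             # None: Anchorage is outside every box
```

With this variant the loop body becomes `name = region(lat, long)` followed by `if name is not None: scores[name] += score(line)`, and adding a region only means adding one entry to the list.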

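Edit: you need to define score as a function that takes a tweet, i.e. the split line from the code above (a list of words that still includes the coordinates, the 0 field, the date and the time):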

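def score(tweet):
    # tweet is the split line: [lat, long, flag, date, time, word, word, ...]
    total = 0
    for word in tweet[5:]:  # skip the coordinates, the 0 field, the date and the time
        if word in sentiments:
            total += sentiments[word]  # add the word's sentiment value
    return total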

This assumes the global sentiments dictionary has been defined beforehand.
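Putting the pieces together, a self-contained sketch of the whole pipeline. It reads from any iterable of lines (an open file or `io.StringIO`), skips blank lines and tweets outside every box, and sums scores per region with `collections.defaultdict`. The sentiment values and the second tweet are made up for illustration, and lowercasing words before lookup is an assumption.

```python
import io
from collections import defaultdict

# Hypothetical sentiment values, for illustration only
sentiments = {"best": 3, "vacation": 2, "sad": -2}

# Boxes from the question as (south, north, west, east)
BOXES = {
    "eastern": (24.660845, 49.189787, -87.518395, -67.444574),
    "central": (24.660845, 49.189787, -101.998892, -87.518395),
    "mountain": (24.660845, 49.189787, -115.236428, -101.998892),
    "pacific": (24.660845, 49.189787, -125.242264, -115.236428),
}


def region(lat, long):
    for name, (south, north, west, east) in BOXES.items():
        if south <= lat < north and west <= long < east:
            return name
    return None  # outside every box


def score(fields):
    # fields: [lat, long, flag, date, time, word, word, ...]
    return sum(sentiments.get(word.lower(), 0) for word in fields[5:])


def total_by_region(lines):
    scores = defaultdict(int)
    for line in lines:
        fields = line.split()  # split() also drops the trailing newline
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        lat = float(fields[0].strip("[,"))
        long = float(fields[1].strip("]"))
        name = region(lat, long)
        if name is not None:
            scores[name] += score(fields)
    return dict(scores)


# The question's sample tweet plus a hypothetical second one
sample = io.StringIO(
    "[30.346168930000001, -97.73518] 0 2011-08-29 04:54:22 Best vacation of my life #byfar\n"
    "[40.7128, -74.0060] 0 2011-08-29 05:10:00 sad day\n"
)
print(total_by_region(sample))  # {'central': 5, 'eastern': -2}
```

To run it on the real data, pass the open file instead: `with open('tweets.txt') as f: print(total_by_region(f))`.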
