Question

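I'm currently working on a project for my Python class: sentiment analysis of tweets. We've only just covered reading/writing files and using Python things like split() and strip(), so I'm still a programming noob.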

The project involves two files, keywords.txt and tweets.txt. Samples of the files:

Sample of tweets.txt:

[41.298669629999999, -81.915329330000006] 6 2011-08-28 19:02:36 Work needs to fly by ... I'm so excited to see Spy Kids 4 then love my life ... ARREIC

[33.702900329999999, -117.95095704000001] 6 2011-08-28 19:03:13 Today is going to be the greatest day of my life. Hired to take pictures at my best friend's parents' 50th anniversary. 60 old people. Woo.

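The numbers in brackets are the coordinates, the numbers after that (the 6 and the date/time) can be ignored, and then comes the message/tweet.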
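A minimal sketch of reading that layout, assuming the first five fields are separated by single spaces exactly as in the sample (the helper name parse_tweet is hypothetical). Passing maxsplit=5 to split() cuts the line at most five times, so the whole message survives as the sixth piece:

```python
def parse_tweet(line):
    """Split one tweets.txt line into (lat, lon, message).

    Assumes the layout "[lat, lon] number date time message" with single
    spaces between the first five fields (hypothetical helper).
    """
    # maxsplit=5 keeps the rest of the line (the message) in one piece
    parts = line.split(" ", 5)
    lat = float(parts[0].strip("[,"))   # "[41.29...," -> 41.29...
    lon = float(parts[1].rstrip("]"))   # "-81.91...]" -> -81.91...
    message = parts[5].strip()          # drop the trailing newline
    return lat, lon, message

sample = "[41.298669629999999, -81.915329330000006] 6 2011-08-28 19:02:36 Work needs to fly by ... I'm so excited\n"
lat, lon, message = parse_tweet(sample)
print(lat, lon, message)
```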

Sample of keywords.txt:

alone,1
amazed,10
excited,10
love,10

where the number is the "sentiment value" of that keyword.
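One way to hold that file in memory is a dict from keyword to its integer value. A minimal sketch, assuming one "word,value" pair per line; the sample lines are passed as a list here, but iterating over an open file yields lines the same way. The name load_keywords is hypothetical:

```python
def load_keywords(lines):
    """Build a {keyword: sentiment value} dict from "word,value" lines."""
    keywords = {}
    for line in lines:
        line = line.strip()
        if not line:            # skip blank lines
            continue
        word, value = line.split(",", 1)
        keywords[word.lower()] = int(value)
    return keywords

sample_lines = ["alone,1\n", "amazed,10\n", "excited,10\n", "love,10\n"]
keywords = load_keywords(sample_lines)
print(keywords)  # {'alone': 1, 'amazed': 10, 'excited': 10, 'love': 10}
```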

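What I'm supposed to do is read both files in Python, split each message/tweet into words, and then check whether any of the keywords appear in each tweet; if a keyword is in the tweet, its sentiment value is added. Finally, print the total sentiment value for each tweet, ignoring tweets that don't contain any keywords.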
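The per-tweet check can be sketched as a small function, assuming each tweet has already been cleaned into a list of lowercase words and the keywords are in a dict (both names are hypothetical). It returns None when no keyword occurs so the caller can skip that tweet. Design note: this loops over the tweet's words, so a keyword that appears twice is counted twice, whereas testing each keyword with `in messageWords` (as in the code below) counts each keyword at most once; which behavior the assignment wants is an assumption.

```python
def tweet_sentiment(words, keywords):
    """Sum the values of the keywords that occur in words.

    Returns None when no keyword occurs, so the caller can skip that tweet.
    """
    total = 0
    found = False
    for word in words:
        if word in keywords:
            total += keywords[word]
            found = True
    return total if found else None

keywords = {"alone": 1, "amazed": 10, "excited": 10, "love": 10}
tweets = [
    ["work", "needs", "to", "fly", "by", "im", "so", "excited", "love", "my", "life"],
    ["today", "is", "going", "to", "be", "the", "greatest", "day"],
]
for words in tweets:
    total = tweet_sentiment(words, keywords)
    if total is not None:      # ignore tweets without any keyword
        print(total)           # prints 20 for the first tweet only
```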

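For example, the first tweet in the sample contains two keywords (excited and love), so its total sentiment value is 20. However, my code prints the sentiment values separately as 10, 10 instead of printing the total. I also don't know how to make the keyword-checking part run over every tweet.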
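The 10, 10 versus 20 difference comes down to where the running total is reset. A reduced sketch using the sample keywords and a hand-made word list for the first tweet (the variable names are illustrative only):

```python
keywords = [("alone", 1), ("amazed", 10), ("excited", 10), ("love", 10)]
message_words = ["im", "so", "excited", "to", "see", "love", "my", "life"]

# Resetting inside the keyword loop: each match is reported on its own
per_keyword = []
for word, value in keywords:
    sentiment = 0            # reset for every keyword, so nothing accumulates
    if word in message_words:
        sentiment += value
        per_keyword.append(sentiment)
print(per_keyword)  # [10, 10]

# Resetting once per tweet, before the keyword loop: the values add up
sentiment = 0
for word, value in keywords:
    if word in message_words:
        sentiment += value
print(sentiment)  # 20
```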

My code so far:

tweets = open("tweets.txt","r")
keywords = open("keywords.txt","r")

def tweetDataExtract (infile):
    line = infile.readline()
    if line == "":
        return []
    else:
        parts = line.split(" ",5)
        return parts

def keywordsDataExtract (infile):
    line = infile.readline()
    if line == "":
        return[]
    else:
        parts = line.split(",",1)
        return parts

tweetData = tweetDataExtract(tweets)
while len (tweetData) == 6:

    lat = float(tweetData[0].strip("[,"))
    long = float(tweetData[1].rstrip("]"))
    message = tweetData[5].split(" ")
    messageWords=[]
    #gets rid of all the punctuation in the strip() brackets
    for element in message:
        element = element.strip("!@.,?[]{}#-_-:)('=/%;&*+|<>`~\n")
        messageWords.append(element.lower())
    tweetData = tweetDataExtract(tweets)
    print(lat, long, messageWords)

    keywordsData = keywordsDataExtract(keywords)
    while len (keywordsData) == 2:

        words = keywordsData[0]
        happiness = int(keywordsData[1])
        keywordsData = keywordsDataExtract(keywords)

        count = 0
        sentiment = 0
        if words in messageWords:
            sentiment+=happiness
            count+=1
            print (lat, long, count, sentiment)




tweets.close()
keywords.close()

How can I fix my code?

PS: I wasn't sure which parts of the code were essential to post, so I've posted the whole thing so far.

Answer 1

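The problem is that you initialize the variables count and sentiment inside the inner while loop, so they are reset to 0 for every keyword, and because the print is in that loop too, each match is printed on its own instead of as one total per tweet.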
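A second problem explains why the keyword check does not run for every tweet: keywordsDataExtract(keywords) reads from a single open file object, and a file position only moves forward. Once the first tweet has consumed every keyword line, readline() returns "" for all later tweets, so their inner loop never runs. The sketch uses io.StringIO as a stand-in for keywords.txt (an assumption made only so it runs on its own) to show the effect, and the fix of reading the keywords into a list once, which is what keywordsData does in the corrected code.

```python
import io

# io.StringIO stands in for open("keywords.txt") to keep the example self-contained
keywords_file = io.StringIO("alone,1\nexcited,10\nlove,10\n")

first_pass = keywords_file.readlines()   # reads every line; position is now at the end
second_pass = keywords_file.readlines()  # nothing left to read
print(len(first_pass), len(second_pass))  # 3 0

# Reading the keywords once into a list, before looping over the tweets,
# lets every tweet be checked against the full list
keyword_pairs = [line.strip().split(",") for line in first_pass]
for tweet_words in (["so", "excited"], ["love", "my", "life"]):
    matches = [word for word, value in keyword_pairs if word in tweet_words]
    print(matches)  # ['excited'], then ['love']
```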

Corrected code:

tweets = open("tweets.txt","r")
keywords = open("keywords.txt","r")

def tweetDataExtract (infile):
    line = infile.readline()
    if line == "\n":
        # skip blank lines between tweets
        return tweetDataExtract(infile)

    else:
        # at end of file line is "", which splits into [""] and ends the while loop
        parts = line.split(" ",5)
        return parts

# read all keywords once so they can be checked against every tweet;
# blank lines are skipped and the trailing newline is stripped
keywordsData = [line.strip().split(',') for line in keywords if line.strip()]

tweetData = tweetDataExtract(tweets)
while len(tweetData) == 6:
    lat = float(tweetData[0].strip("[,"))
    long = float(tweetData[1].rstrip("]"))
    message = tweetData[5].split(" ")
    messageWords=[]
    #gets rid of all the punctuation in the strip() brackets
    for element in message:
        element = element.strip("!@.,?[]{}#-_-:)('=/%;&*+|<>`~\n")
        messageWords.append(element.lower())
    tweetData = tweetDataExtract(tweets)
    # reset the totals once per tweet, not once per keyword
    count = 0
    sentiment = 0
    for i in range(0,len (keywordsData)):
        words = keywordsData[i][0]
        happiness = int(keywordsData[i][1].strip())
        if words in messageWords:
            sentiment+=happiness
            count+=1
    # print one total per tweet, skipping tweets without any keyword
    if count > 0:
        print (lat, long, count, sentiment)

tweets.close()
keywords.close()

Or see this newer version (shorter and more Pythonic):

import string

# read the non-empty tweet lines
with open("tweets.txt",'r') as f:
    tweets = [line.strip() for line in f if line.strip() != '']

# map each keyword to its integer sentiment value
dic = {}
with open("keywords.txt",'r') as f:
    for line in f:
        if line.strip() != '':
            word, value = line.strip().split(',', 1)
            dic[word] = int(value)

for t in tweets:
    t = t.split(" ",5)
    lat = float(t[0].strip("[,"))
    lon = float(t[1].rstrip("]"))
    # remove punctuation and lowercase, then collect the values of matching keywords
    words = t[5].translate(str.maketrans("","", string.punctuation)).lower().split()
    values = [dic[word] for word in words if word in dic]
    # tweets without any keyword are skipped
    if values:
        print(lat,lon,sum(values))

Output:

41.29866963 -81.91532933 20
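The two versions clean words differently: the element.strip(...) call used in the question and the corrected code only removes the listed characters from the ends of each word, while translate() with a table from str.maketrans("", "", string.punctuation) deletes ASCII punctuation anywhere in the string. A small comparison (STRIP_CHARS just copies the character set from that strip() call; the sample words are made up):

```python
import string

# Characters stripped in the question's code (removed only at the ends of a word)
STRIP_CHARS = "!@.,?[]{}#-_-:)('=/%;&*+|<>`~\n"
# Translation table that deletes every ASCII punctuation character
remove_punct = str.maketrans("", "", string.punctuation)

print("(excited!!)".strip(STRIP_CHARS))             # excited
print("well-known".strip(STRIP_CHARS))              # well-known (inner '-' kept)
print("well-known".translate(remove_punct))         # wellknown (inner '-' removed)
print("I'm so excited...".translate(remove_punct))  # Im so excited
```

One consequence: apostrophes are deleted too, so a keyword written with an apostrophe in keywords.txt would never match in the translate() version unless it is stored without it. Non-ASCII punctuation such as curly quotes is not in string.punctuation and is left in place.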

Comparing keywords in one file with another file containing different tweets
