Question

I am trying to read from a file and match certain combinations of strings. Please find my program below:

def negative_verbs_features(filename):

    # Open and read the file content
    file = open (filename, "r")
    text = file.read()
    file.close()

    # Create a list of negative verbs from the MPQA lexicon
    file_negative_mpqa = open("../data/PolarLexicons/negative_mpqa.txt", "r")
    negative_verbs = []
    for line in file_negative_mpqa:
        #print line,
        pos, word = line.split(",")
        #print line.split(",")      
        if pos == "verb":
            negative_verbs.append(word)
    return negative_verbs

if __name__ == "__main__":
    print negative_verbs_features("../data/test.txt")

The negative_mpqa.txt file consists of word, part-of-speech tag pairs separated by a comma (,). Here is a snippet of the file:

abandoned,adj
abandonment,noun
abandon,verb
abasement,anypos
abase,verb
abash,verb
abate,verb
abdicate,verb
aberration,adj
aberration,noun

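I want to build a list of all the words in the file whose part of speech is verb. However, when I run my program, the returned list (negative_verbs) is always empty. The body of the if statement is never executed. I tried printing the word, pos pairs by uncommenting the line print line.split(","). Here is a snippet of the output: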

['wrongful', 'adj\r\n']
['wrongly', 'anypos\r\n']
['wrought', 'adj\r\n']
['wrought', 'noun\r\n']
['yawn', 'noun\r\n']
['yawn', 'verb\r\n']
['yelp', 'verb\r\n']
['zealot', 'noun\r\n']
['zealous', 'adj\r\n']
['zealously', 'anypos\r\n']

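As far as I can tell, my file may contain some special characters, such as a newline and a carriage return at the end of each line. I just want to ignore them and build my list. Please let me know how to proceed.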

PS: I am new to Python.

Answer 1

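You said the file has lines like abandoned,adj, so those are word, pos pairs. But you wrote pos, word = line.split(","), which means pos == 'abandoned' and word == 'adj\r\n' (the line ending stays attached to the last field). I think it is clear now why the list is empty :-)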
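To make the diagnosis concrete, here is a small sketch that replays the unpacking on one line. The sample string is an assumption modelled on the printed output in the question (a Windows-style "\r\n" line ending); the parenthesised print calls are written so that they behave the same on Python 2 and Python 3.

```python
# One line as it would come out of the file, line ending included (assumed sample).
line = "abandon,verb\r\n"

# Order used in the question: the names are swapped relative to the file layout.
pos, word = line.split(",")
print(repr(pos))       # 'abandon'
print(repr(word))      # 'verb\r\n'
print(pos == "verb")   # False, so nothing is ever appended

# Swapping the names alone is not enough: the tag still carries the line ending.
word_swapped, pos_swapped = line.split(",")
print(repr(pos_swapped))       # 'verb\r\n'
print(pos_swapped == "verb")   # False
```

So there are two separate problems: the swapped names and the trailing "\r\n" on the last field.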

Answer 2

Replace the line pos, word = line.split(",") with

word, pos = line.rstrip().split(",")

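rstrip() removes whitespace characters (spaces, newlines, carriage returns...) from the right-hand end of the string. Note that lstrip() and even strip() also exist. You had also swapped word and pos!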
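As a quick illustration of the three methods mentioned above, the following sketch applies each of them to the same string. The sample value, with two leading spaces added on purpose, is made up for the demonstration.

```python
# Made-up sample: leading spaces plus a Windows-style line ending.
raw = "  yawn,verb\r\n"

right_stripped = raw.rstrip()  # trailing whitespace removed
left_stripped = raw.lstrip()   # leading whitespace removed
both_stripped = raw.strip()    # whitespace removed from both ends

print(repr(right_stripped))  # '  yawn,verb'
print(repr(left_stripped))   # 'yawn,verb\r\n'
print(repr(both_stripped))   # 'yawn,verb'
```

Called without arguments, all three treat spaces, tabs, "\n" and "\r" as whitespace, so the "\r\n" seen in the question's output is removed by rstrip() or strip().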

Alternatively, you could call rstrip() on the word variable at the point where you append it to the list.
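Putting both answers together, a minimal sketch of the corrected loop could look like the following. The function name load_negative_verbs and its lexicon_path parameter are hypothetical, introduced only for this example; the sketch assumes every non-empty line contains exactly one comma, as in the snippet shown in the question.

```python
def load_negative_verbs(lexicon_path):
    """Return the words tagged as 'verb' in a word,pos lexicon file.

    Hypothetical helper: assumes one 'word,pos' pair per non-empty line.
    """
    negative_verbs = []
    # The with statement closes the file even if an error occurs.
    with open(lexicon_path, "r") as lexicon:
        for line in lexicon:
            line = line.strip()  # drop '\r\n' and any surrounding spaces
            if not line:
                continue         # skip blank lines
            word, pos = line.split(",")  # word first, then the tag
            if pos == "verb":
                negative_verbs.append(word)
    return negative_verbs
```

Calling load_negative_verbs("../data/PolarLexicons/negative_mpqa.txt") would then return the verbs from the lexicon used in the question, provided the file exists at that path.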

String matching problem in Python
