Question

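I have many IDs, each with its own sentence. I need to compare this data against a list of words. The output I want is the ID and the matching word for every sentence that contains a word from the list.

I tried doing this in Excel by applying text-to-columns, transposing the list, and then using conditional formatting. That is not practical, because each sentence has many words and there are many sentences.

Is there a way to do this programmatically with Python?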

Input (sentence table on the left, word list on the right):

 | ID | data                  |     | List    |
 |----|-----------------------|     |---------|
 | 1  | hello can you hear me |     | hello   |
 | 2  | roses are red         |     | love    |
 | 3  | water is life         |     | water   |
 | 4  | pie                   |     | roses   |
 | 5  | I love chicken pie    |     | pie     |
                                    | chicken |
                                    | hear    |
                                    | red     |

Answer 1

Assuming you have a CSV table of IDs and sentences, sentences.csv, and a text file containing the word list, words.txt, you can do the following:

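import csv
import sys

# Load the word list into a set for fast membership tests.
words = set(l.strip() for l in open('words.txt'))
table = []
with open('sentences.csv') as f:
    # Each row is expected to hold exactly two cells: id, sentence.
    for sid,sentence in csv.reader(f):
        table += [[word, sid] for word in sentence.split() if word in words]
# Write the (word, id) pairs to standard output as CSV.
csv.writer(sys.stdout).writerows(table)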

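This is a compact way to express it, and it does very little error checking. For example, if any row in the CSV file does not have exactly 2 cells, the loop will crash. Even more briefly, the table parsing can be expressed as: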

table = [[word,sid] for sid,sentence in csv.reader(open('sentences.csv'))
                    for word in sentence.split() if word in words]

Both versions give the expected output:

hello,1
hear,1
roses,2
red,2
water,3
pie,4
love,5
chicken,5
pie,5
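
The caveat about rows that do not have exactly 2 cells could be handled by skipping such rows instead of unpacking them blindly. The following is only a minimal sketch under a few assumptions: the helper name match_words is hypothetical, the sample data from the question is held in memory with io.StringIO rather than read from sentences.csv and words.txt, and one blank line plus one malformed row were added purely to exercise the check.

```python
import csv
import io

def match_words(sentence_lines, words):
    """Hypothetical helper: return [word, id] pairs for each listed word.

    Rows without exactly two cells are skipped instead of raising.
    """
    pairs = []
    for row in csv.reader(sentence_lines):
        if len(row) != 2:
            continue  # blank or malformed row, ignore it
        sid, sentence = row
        pairs.extend([word, sid] for word in sentence.split() if word in words)
    return pairs

# Sample data from the question, plus two deliberately bad rows (assumption).
sentences = io.StringIO(
    "1,hello can you hear me\n"
    "2,roses are red\n"
    "\n"
    "3,water is life\n"
    "broken row with no id\n"
    "4,pie\n"
    "5,I love chicken pie\n"
)
words = {"hello", "love", "water", "roses", "pie", "chicken", "hear", "red"}

pairs = match_words(sentences, words)
for word, sid in pairs:
    print(f"{word},{sid}")
```

Matching here is still exact and case-sensitive, as in the original code, so a word followed by punctuation or written with different capitalization would not be found.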
