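I'm a Python beginner and I'm having trouble working out what is wrong with my code.
What I'm trying to do is convert the text into a list of tuples and then count the number of DT tags in that list.
Let's say the first three lines of the txt file look like this: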
The/DT Fulton/NNP County/NNP Grand/NNP Jury/NNP said/VBD Friday/NNP an/DT investigation/NN of/IN Atlanta/NNP 's/POS recent/JJ primary/JJ election/NN produced/VBD ``/`` no/DT evidence/NN ''/'' that/IN any/DT irregularities/NNS took/VBD place/NN ./.
The/DT jury/NN further/RB said/VBD in/IN term-end/JJ presentments/NNS that/IN the/DT City/NNP Executive/NNP Committee/NNP ,/, which/WDT had/VBD over-all/JJ charge/NN of/IN the/DT election/NN ,/, ``/`` deserves/VBZ the/DT praise/NN and/CC thanks/NNS of/IN the/DT City/NNP of/IN Atlanta/NNP ''/'' for/IN the/DT manner/NN in/IN which/WDT the/DT election/NN was/VBD conducted/VBN ./.
The/DT September-October/NNP term/NN jury/NN had/VBD been/VBN charged/VBN by/IN Fulton/NNP Superior/NNP Court/NNP Judge/NNP Durwood/NNP Pye/NNP to/TO investigate/VB reports/NNS of/IN possible/JJ ``/`` irregularities/NNS ''/'' in/IN the/DT hard-fought/JJ primary/NN which/WDT was/VBD won/VBN by/IN Mayor-nominate/NNP Ivan/NNP Allen/NNP Jr./NNP ./.
This is saved as "practice.txt" in the working directory.
My code looks like this:
with open("practice.txt") as myfile:
    for line in myfile:
        cnt += 1
        word = line.split()
        total_word_per_line += len(word)
        total_type_of_words += len(set(word))
        a = [tuple(i.split('/')) for i in word]
    for x in a:
        DT_sum = 0
        if x[1] == 'DT':
            DT_sum += 1
        total_DT_sum += DT_sum
print total_DT_sum
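But the output shows 2 for total_DT_sum, which means it only counts the DTs in the third list. Any suggestions for counting all of the DTs?
The desired output is 5 (the total number of DTs in the three sentences above).
Thanks in advance!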
Answer 0 (score: 0)
Your mistake:
for x in a:
    DT_sum = 0
DT_sum is reset to 0 on every iteration of the loop.
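As a rough sketch of how the original loop could be repaired: the counter is initialised once before any loop, and the counting happens inside the per-line loop so that every line's tokens are examined. The `count_dt` name and the shortened sample lines are assumptions made for illustration; they stand in for reading practice.txt.

```python
# Shortened sample lines standing in for the contents of practice.txt.
lines = [
    "The/DT jury/NN said/VBD an/DT investigation/NN ./.",
    "The/DT election/NN was/VBD conducted/VBN ./.",
]

def count_dt(lines):
    """Count the tokens tagged DT across all lines (hypothetical helper)."""
    total_dt_sum = 0  # initialised once, never reset inside a loop
    for line in lines:
        pairs = [tuple(token.split('/')) for token in line.split()]
        for pair in pairs:
            # The last element of each tuple is the tag.
            if pair[-1] == 'DT':
                total_dt_sum += 1
    return total_dt_sum

print(count_dt(lines))  # 3
```

With a real file, passing the open file object instead of the list should work the same way, since iterating over a file yields its lines.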
If you want to start from scratch, the simplest approach is to sum the count() of each line:
with open("practice.txt") as myfile:
    # count the "/DT" substrings in each line and add them up
    nb_dt = sum(line.count("/DT") for line in myfile)
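The result is 13, not 5 as you stated (you can verify this by hand).
This solution does not take word boundaries into account, which means it would also match a tag such as /DTXXX if one were present.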
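To make the substring pitfall concrete, here is a small sketch comparing count() with an exact check on each token. The tag `/DTX` is made up purely to trigger the false positive; it is not assumed to exist in any real tag set.

```python
# "/DTX" is a made-up tag, used only to trigger the false positive.
line = "the/DT which/WDT that/DTX"

# Substring search: matches "/DT" and also the start of "/DTX".
substring_hits = line.count("/DT")

# Exact check: the token must end with "/DT".
exact_hits = sum(1 for token in line.split() if token.endswith("/DT"))

print(substring_hits)  # 2
print(exact_hits)      # 1
```

Note that "which/WDT" is not matched by either approach, because the slash is followed by "W" rather than "D".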
So a slightly more involved version would be:
with open("practice.txt") as myfile:
    # split each line into words, then compare the part after "/" with "DT"
    nb_dt = sum(1 if word.partition("/")[2] == "DT" else 0 for line in myfile for word in line.split())
For every word of every line, this splits on / and counts 1 each time the right-hand part is exactly DT.
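One detail worth knowing about this approach: str.partition splits at the first slash, while str.rpartition splits at the last one. The sample data has no slash inside a word, so both behave the same there; the token "1/2/CD" below is a hypothetical example of where they would differ, and `tag_of` is a helper name introduced only for this sketch.

```python
def tag_of(token):
    """Return the text after the last '/' in a word/TAG token."""
    return token.rpartition("/")[2]

print("the/DT".partition("/"))    # ('the', '/', 'DT')

tricky = "1/2/CD"  # hypothetical token with a slash inside the word
print(tricky.partition("/"))      # ('1', '/', '2/CD')
print(tricky.rpartition("/"))     # ('1/2', '/', 'CD')
print(tag_of(tricky))             # CD
```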
Answer 1 (score: 0)
If you need to store your data in a list of tuples before counting the number of 'DT' tags, you can use filter(), like this:
my_list = []
with open('practice.txt', 'r') as f:
    for line in f:
        my_list.extend([tuple(i.split('/')) for i in line.split()])

# list() makes the result countable on Python 3, where filter() is lazy
res = list(filter(lambda i: i[1] == 'DT', my_list))
print(len(res))  # Output: 13
extend() adds the tuples built from each line to my_list.
filter() keeps only the items whose second element is 'DT'.
Output:
>>> res = list(filter(lambda i: i[1] == 'DT', my_list))
>>> res
[('The', 'DT'), ('an', 'DT'), ('no', 'DT'), ('any', 'DT'), ('The', 'DT'), ('the', 'DT'), ('the', 'DT'), ('the', 'DT'), ('the', 'DT'), ('the', 'DT'), ('the', 'DT'), ('The', 'DT'), ('the', 'DT')]
>>>
>>> len(res)
13
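A note on Python versions: on Python 2, filter() returns a list, so len() works on it directly. On Python 3 it returns a lazy iterator, so the result has to be wrapped in list() before calling len(). The sketch below shows that, plus collections.Counter as an alternative way to count every tag in one pass. The single sample line and the variable names are assumptions standing in for practice.txt.

```python
from collections import Counter

# One shortened sample line standing in for the contents of practice.txt.
text = "The/DT jury/NN said/VBD the/DT election/NN ./."

# Build the list of (word, tag) tuples, splitting on the last slash only.
my_list = [tuple(token.rsplit('/', 1)) for token in text.split()]

# list() is needed on Python 3 because filter() returns an iterator there.
res = list(filter(lambda pair: pair[1] == 'DT', my_list))
print(len(res))  # 2

# Alternative: count every tag at once, then look up 'DT'.
tag_counts = Counter(tag for _, tag in my_list)
print(tag_counts['DT'])  # 2
```

The Counter variant can be handy if counts for other tags, such as NN or VBD, are needed later as well.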