I want to count the words in a text file that contains data like this:
ROK :
ROK/(NN)
New :
New/(SV)
releases, :
releases/(NN) + ,/(SY)
week :
week/(EP)
last :
last/(JO)
compared :
compare/(VV) + -ed/(EM)
year :
year/(DT)
releases :
releases/(NN)
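Expressions such as /(NN), /(SV), and /(EP) are treated as categories. I want to extract the word in front of each category and count how many times each word occurs in the whole text.
I want to write the result to a new text file, like this: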
(NN)
releases 2
ROK 1
(SY)
New 1
, 1
(EP)
week 1
(JO)
last 1
......
Please help me!
Here is my garbage code ;_; it doesn't work.
import os, sys
import re
wordset = {}
for line in open('E:\\mach.txt', 'r'):
    if '/(' in line:
        word = re.findall(r'(\w)/\(', line)
        print word
        if word not in wordset: wordset[word]=1
        else: wordset[word]+=1
f = open('result.txt', 'w')
for word in wordset:
    print>> f, word, wordset[word]
f.close()
Answer 0 (score: 1)
from __future__ import print_function
import re

# Captures the word before the slash and the category in parentheses.
REGEXP = re.compile(r'(\w+)/(\(.*?\))')

def main():
    # Maps category -> {word: count}
    words = {}
    with open('E:\\mach.txt', 'r') as fp:
        for line in fp:
            for item, category in REGEXP.findall(line):
                words.setdefault(category, {}).setdefault(item, 0)
                words[category][item] += 1
    with open('result.txt', 'w') as fp:
        # "counts" avoids shadowing the outer "words" dictionary
        for category, counts in sorted(words.items()):
            print(category, file=fp)
            for word, count in counts.items():
                print(word, count, sep=' ', file=fp)
            print(file=fp)
    return 0

if __name__ == '__main__':
    raise SystemExit(main())
You're welcome (= If you also want to count the odd tokens such as "-ed" or ",", adjust the regexp to match any character except whitespace:
REGEXP = re.compile(r'([^\s]+)/(\(.*?\))')
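To see what the two patterns actually pick up, here is a small sketch run against a few lines copied from the sample data in the question. The names WORD_ONLY, ANY_TOKEN, count_by_category, and sample are made up for this illustration, and using collections.Counter with most_common() is my own assumption about how to get the "most frequent first" ordering shown in the desired output; it is not part of the code above.

```python
import re
from collections import Counter, defaultdict

# The two patterns from the answer above (names are illustrative only).
WORD_ONLY = re.compile(r'(\w+)/(\(.*?\))')
ANY_TOKEN = re.compile(r'([^\s]+)/(\(.*?\))')

def count_by_category(lines, pattern):
    """Return {category: Counter({token: count})} for every match in lines."""
    counts = defaultdict(Counter)
    for line in lines:
        for token, category in pattern.findall(line):
            counts[category][token] += 1
    return counts

# A few lines taken from the sample data in the question.
sample = [
    'releases/(NN) + ,/(SY)',
    'compare/(VV) + -ed/(EM)',
    'releases/(NN)',
]

for name, pattern in [('\\w+', WORD_ONLY), ('[^\\s]+', ANY_TOKEN)]:
    print('pattern:', name)
    for category, counter in sorted(count_by_category(sample, pattern).items()):
        print(category)
        # most_common() lists the most frequent tokens first
        for token, count in counter.most_common():
            print(token, count)
```

With the \w+ pattern the "," token is skipped entirely and "-ed" is recorded as "ed" (the dash is not a word character). With [^\s]+ both "," and "-ed" are kept as they appear in the file.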
Answer 1 (score: 0)
You are trying to use a list (yes, word is a list) as a dictionary key. This is what you should do:
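from __future__ import print_function
import re

wordset = {}
for line in open('testdata.txt', 'r'):
    if '/(' in line:
        # findall returns a list; \w+ captures the whole word, not one character
        words = re.findall(r'(\w+)/\(', line)
        print(words)
        # count each word in the list separately
        for word in words:
            if word not in wordset:
                wordset[word] = 1
            else:
                wordset[word] += 1
with open('result.txt', 'w') as f:
    for word in wordset:
        print(word, wordset[word], file=f)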
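You're lucky I want to learn Python, otherwise I wouldn't have tried your code. Next time, post the error you get! I bet it was
TypeError: unhashable type: 'list'
If you want good answers, it's important to help us!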
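For anyone who wants to reproduce that error in isolation, here is a minimal sketch. The sample line is copied from the question's data; error_message is just a name I picked for the demo, and the exact wording of the message may differ between Python versions.

```python
import re

line = 'releases/(NN) + ,/(SY)'
words = re.findall(r'(\w+)/\(', line)   # findall always returns a list
wordset = {}
error_message = None

try:
    # Looking up a list in a dict requires hashing it, which lists do not support.
    if words not in wordset:
        wordset[words] = 1
except TypeError as exc:
    error_message = str(exc)
    print('TypeError:', error_message)

# Iterating over the list and using each string as the key works fine.
for word in words:
    wordset[word] = wordset.get(word, 0) + 1
print(wordset)
```

Strings are hashable, so once you loop over the list returned by findall, each individual word can be used as a key.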