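I'm a bit rusty in Python and am just looking for help implementing an example function that counts words (this is only a sample target for an scons script; it doesn't do anything "real"):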
def countWords(target, source, env):
    if (len(target) == 1 and len(source) == 1):
        fin = open(str(source[0]), 'r')
        # do something with "fin.read()"
        fin.close()
        fout = open(str(target[0]), 'w')
        # fout.write(something)
        fout.close()
    return None
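Can you help me fill in the details? The usual way to count words is to read each line, split it into words, and increment a counter in a dictionary for each word on the line; then, for output, sort the words by decreasing count.
Edit: I'm using Python 2.6 (Python 2.6.5, to be exact).
Answer 0 (score: 7)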
from collections import defaultdict

def countWords(target, source, env):
    words = defaultdict(int)
    if (len(target) == 1 and len(source) == 1):
        with open(str(source[0]), 'r') as fin:
            for line in fin:
                for word in line.split():
                    words[word] += 1
        with open(str(target[0]), 'w') as fout:
            for word in sorted(words, key=words.__getitem__, reverse=True):
                fout.write('%s\n' % word)
    return None
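Note that the function above writes only the words, ordered by decreasing count, and not the counts themselves. If the counts should appear in the output too, a minimal sketch might look like the following. The helper names count_words and format_counts, the "word count" line format, and the alphabetical tie-break are assumptions made for illustration, not part of the answer.

```python
from collections import defaultdict

def count_words(lines):
    """Return a dict mapping each whitespace-separated word to its count."""
    counts = defaultdict(int)
    for line in lines:
        for word in line.split():
            counts[word] += 1
    return counts

def format_counts(counts):
    """Return 'word count' lines, highest count first, ties alphabetical."""
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ['%s %d' % (word, n) for word, n in ordered]

def countWords(target, source, env):
    # SCons passes lists of nodes; str() turns a node into its path.
    if len(target) == 1 and len(source) == 1:
        with open(str(source[0]), 'r') as fin:
            counts = count_words(fin)
        with open(str(target[0]), 'w') as fout:
            fout.write('\n'.join(format_counts(counts)) + '\n')
    return None
```

Keeping the counting and formatting in separate functions makes them easy to try out without SCons, for example by passing a list of strings to count_words.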
Answer 1 (score: 1)
Without knowing why env is there, the best I can do is the following:
def countWords(target, source, env):
    wordCount = {}
    if len(target) == 1 and len(source) == 1:
        # str() converts the SCons node to a file path
        with open(str(source[0]), 'r') as fin:
            for line in fin:
                for word in line.split():
                    if word in wordCount:
                        wordCount[word] += 1
                    else:
                        wordCount[word] = 1
        # Invert the mapping: count -> list of words with that count
        rev = {}
        for v in wordCount.values():
            rev[v] = []
        for w in wordCount.keys():
            rev[wordCount[w]].append(w)
        with open(str(target[0]), 'w') as f:
            # Highest counts first
            for v in sorted(rev.keys(), reverse=True):
                f.write("%d: %s\n" % (v, " ".join(rev[v])))
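On the question of env: as far as I know, SCons calls a Python function used as an action with three arguments, target and source (both lists of nodes) and env (the construction environment), and treats a return value of None or 0 as success. That is why the parameter is present even though this example never uses it. The sketch below assumes that calling convention; the file names words.txt and counts.txt are hypothetical, the SConstruct lines are left as comments because they only run inside SCons, and the body simply writes the total number of words to keep things short.

```python
# In an SConstruct this might be registered roughly like this
# (hypothetical file names; runs only under SCons):
#     env = Environment()
#     env.Command('counts.txt', 'words.txt', countWords)

def countWords(target, source, env):
    # target and source arrive as lists; env is accepted but unused here.
    if len(target) == 1 and len(source) == 1:
        total = 0
        with open(str(source[0]), 'r') as fin:
            for line in fin:
                total += len(line.split())
        with open(str(target[0]), 'w') as fout:
            fout.write('%d\n' % total)
    return None  # None (or 0) signals success to SCons
```

Outside SCons the function can be exercised directly by passing one-element lists of path strings and any placeholder for env.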
Answer 2 (score: 0)
There is a useful example here. It works much the way you describe, and it can also count sentences.
Answer 3 (score: 0)
Not efficient, but concise!
with open(fname) as f:
    res = {}
    for word in f.read().split():
        res[word] = res.get(word, 0)+1
with open(dest, 'w') as f:
    f.write("\n".join(sorted(res, key=lambda w: -res[w])))
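For readers on Python 2.7 or later (so not the 2.6.5 mentioned in the question), collections.Counter does the same bookkeeping in the standard library. A minimal sketch, where words_by_frequency is a hypothetical helper name chosen for illustration:

```python
from collections import Counter  # available from Python 2.7 onward

def words_by_frequency(text):
    """Return the distinct words in text, most frequent first."""
    counts = Counter(text.split())
    # most_common() yields (word, count) pairs in decreasing count order
    return [word for word, _ in counts.most_common()]
```

The result could then be joined with "\n" and written to the destination file just as in the snippet above.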
Answer 4 (score: 0)
Here is my version:
import string
import itertools as it

# Characters stripped from both ends of each word
drop = string.punctuation + string.digits

def countWords(target, source, env=''):
    # Here target and source are plain file paths, not SCons node lists
    with open(source) as fin:
        inputstring = fin.read()
    words = sorted(word.strip(drop)
                   for word in inputstring.lower().replace('--', ' ').split())
    # groupby needs sorted input so that equal words are adjacent
    wordlist = sorted([(word, len(list(occurrences)))
                       for word, occurrences in it.groupby(words)],
                      key=lambda x: x[1],
                      reverse=True)
    with open(target, 'w') as results:
        results.write('\n'.join('%16s : %s' % pair for pair in wordlist))