Question

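I have this Python program that reads through a wordlist file and uses the endswith() method to check whether each word ends with one of the suffixes given in another file. The suffixes to check are stored in the list suffixList[], and the matches are tallied in suffixCount[].

Here is my code: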

fd = open(filename, 'r')
print 'Suffixes: '
x = len(suffixList)
for line in fd:
   for wordp in range(0,x):
        if word.endswith(suffixList[wordp]):
           suffixCount[wordp] = suffixCount[wordp]+1
for output in range(0,x):
     print  "%-6s %10i"%(prefixList[output], prefixCount[output])

fd.close()

The output looks like this:

Suffixes: 
able            0
ible            0
ation           0

The program never gets past this check:

if word.endswith(suffixList[wordp]):

Answer 1

You need to strip the newline:

word = ln.rstrip().lower()

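The words come from a file, so each line ends with a newline character. Calling endswith on the raw line therefore always fails, because none of the suffixes end with a newline.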
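A minimal sketch of the failure and the fix, using an in-memory list of lines in place of the word file. The sample words and the `count_suffixes` helper are made up for illustration and are not part of the original program:

```python
def count_suffixes(lines, suffixes):
    """Count how many lines end with each suffix, ignoring trailing whitespace."""
    counts = [0] * len(suffixes)
    for line in lines:
        # Strip the trailing newline first, otherwise endswith() never matches
        word = line.rstrip().lower()
        for i, suffix in enumerate(suffixes):
            if word.endswith(suffix):
                counts[i] += 1
    return counts


# Lines as they come out of a file: each one still carries its "\n"
sample_lines = ["readable\n", "Visible\n", "station\n", "table\n", "cat\n"]
suffixes = ["able", "ible", "ation"]

raw_match = "readable\n".endswith("able")                # the line really ends with "\n"
stripped_match = "readable\n".rstrip().endswith("able")  # newline removed before the test
result = count_suffixes(sample_lines, suffixes)
print(raw_match, stripped_match, result)
```

Only the `rstrip()` call differs from the loop in the question; the counting logic is otherwise the same.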

I would also change the function so that it returns the values you need:

def store_roots(start, end):
    with open("rootsPrefixesSuffixes.txt") as fs:
        lst = [line.split()[0] for line in map(str.strip, fs)
                       if '#' not in line and line]
        return lst, dict.fromkeys(lst[start:end], 0)

lst, sfx_dict = store_roots(22, 30) # List, SuffixList

Then slice from the end of each word and see whether the substring is in the dict:

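with open('longWordList.txt') as fd:
    print('Suffixes: ')
    # Longest and shortest suffix lengths (integers, not the suffix strings)
    mx, mn = len(max(sfx_dict, key=len)), len(min(sfx_dict, key=len))
    for ln in map(str.rstrip, fd):
        # One slice and one lookup per candidate length, longest first;
        # never slice more characters than the word actually has
        for i in range(min(mx, len(ln)), mn - 1, -1):
            suf = ln[-i:]
            if suf in sfx_dict:
                sfx_dict[suf] += 1
    # Iterate over key/value pairs, not over the keys alone
    for k, v in sfx_dict.items():
        print("Suffix = {} Count =  {}".format(k, v))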

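Slicing the end of the string incrementally should be faster than checking every suffix, especially if you have many suffixes of the same length. It does at most one dict lookup per candidate length between mn and mx, so if you have 20 four-character suffixes you only need to check the dict once for them. Only one n-length substring can match at a time, so a single slice and lookup handles all the n-length suffixes at once.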
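The idea can be sketched as a standalone function. This is a variant under stated assumptions rather than the exact code above: it only tries lengths that actually occur among the suffixes, it takes in-memory lists instead of files, and the name `count_by_slicing` and the sample data are hypothetical.

```python
def count_by_slicing(words, suffixes):
    """Count suffix hits with one dict lookup per distinct suffix length."""
    counts = dict.fromkeys(suffixes, 0)
    # Distinct non-zero suffix lengths, longest first
    lengths = sorted({len(s) for s in suffixes if s}, reverse=True)
    for word in words:
        for n in lengths:
            # Skip lengths longer than the word, then do one slice and one lookup
            if n <= len(word) and word[-n:] in counts:
                counts[word[-n:]] += 1
    return counts


words = ["readable", "visible", "station", "able", "nation"]
result = count_by_slicing(words, ["able", "ible", "ation", "tion"])
print(result)
```

With four suffixes of two distinct lengths, each word costs at most two lookups instead of four endswith calls. A word can still be counted for suffixes of different lengths (for example, "station" ends with both "ation" and "tion"), which matches what the endswith loop would report.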

Answer 2

You can use a Counter to count the occurrences of each suffix:

from collections import Counter

with open("rootsPrefixesSuffixes.txt") as fp:
    # Test the stripped line: a raw blank line is "\n", which is still truthy
    List = [line.strip() for line in fp if line.strip() and '#' not in line]
suffixes = List[22:30]  # ?

with open('longWordList.txt') as fp:
    c = Counter(s for word in fp for s in suffixes if word.rstrip().lower().endswith(s))
print(c)

Note: if a line can contain several words and everything after the first should be ignored, add .split()[0]; otherwise it is unnecessary.
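A small sketch of that variant, assuming only the first whitespace-separated token on each line is the word of interest. The helper name `count_first_words` and the sample lines are hypothetical:

```python
from collections import Counter


def count_first_words(lines, suffixes):
    """Count suffix matches on the first whitespace-separated token of each line."""
    counts = Counter()
    for line in lines:
        tokens = line.split()  # split() with no arguments also discards the newline
        if not tokens:
            # Blank line: split()[0] would raise IndexError here
            continue
        word = tokens[0].lower()
        counts.update(s for s in suffixes if word.endswith(s))
    return counts


lines = ["Readable 1234\n", "visible adj.\n", "\n", "station\n"]
result = count_first_words(lines, ["able", "ible", "ation"])
print(result)
```

The guard for empty lines matters because indexing the result of `split()` on a blank line fails.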

Counting the suffixes that occur in a word file
