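I have a word list containing numbers, English words, and Bengali words in one column, and their frequencies in a second column. The columns have no headers. I need the words whose frequency is between 5 and 300. This is the code I am using. It does not work.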
    wordlist = open('C:\\Python27\\bengali_wordlist_full.txt', 'r').read().decode('string-escape').decode("utf-8")
    for word in wordlist:
        if word[1] >= 3
            print(word[0])
        elif word[1] <= 300
            print(word[0])
This gives me a syntax error.

      File "<stdin>", line 2
        if word[1] >= 3
                      ^
    SyntaxError: invalid syntax

Can anyone help?
Answer 0 (score: 2)
You should add a colon (:) after the if and elif conditions to fix this SyntaxError:
    wordlist = open('C:\\Python27\\bengali_wordlist_full.txt', 'r').read().decode('string-escape').decode("utf-8")
    for word in wordlist:
        if word[1] >= 3:
            print word[0]
        elif word[1] <= 300:
            print word[0]
Read this: https://docs.python.org/2/tutorial/controlflow.html
Here is another useful tip: when Python reports a SyntaxError on some line, always check the previous line and the next line as well.
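Adding the colons removes the SyntaxError, but two further problems would remain in the loop. Iterating over a string yields single characters rather than (word, frequency) rows, and a condition of the form "at least 3, otherwise at most 300" is true for every number. The sketch below illustrates both points. The sample text is hypothetical and assumes tab-separated columns, which the question does not confirm.

```python
# Hypothetical sample standing in for the file contents (assumed tab-separated).
sample = u"apple\t12\nbook\t4\n"

# Looping over a string walks through it one character at a time.
first_items = list(sample)[:3]

# "n >= 3, otherwise n <= 300" accepts every number, so nothing is filtered out.
always_true = all((n >= 3) or (n <= 300) for n in range(-1000, 1000))

# Splitting into lines and then into columns gives real (word, freq) pairs,
# and a chained comparison expresses "between 5 and 300".
rows = [line.split(u"\t") for line in sample.splitlines()]
selected = [word for word, freq in rows if 5 <= int(freq) <= 300]
print(selected)
```

The frequency has to be converted with int() or float() before comparing, because the values read from a file are strings.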
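Answer 1 (score: 1)
Your code has a few problems; I will add a full explanation within the hour. In the meantime, look at what it should look like and consult the docs:
First, it is safer to open the file with a with open() statement (see https://docs.python.org/2/tutorial/inputoutput.html#methods-of-file-objects)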
    filepath = 'C:/Python27/bengali_wordlist_full.txt'
    with open(filepath) as f:
        content = f.read().decode('string-escape').decode("utf-8")
        # do you really need all of this decoding?
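If the file is plain UTF-8 text (an assumption; the question does not say how it was saved), the decoding can probably be left to io.open, which accepts an encoding argument in both Python 2 and Python 3. The 'string-escape' codec exists only in Python 2 and would only be needed if the file contained literal backslash escapes. A minimal sketch, where read_text and the temporary demo file are hypothetical names used only for illustration:

```python
import io
import os
import tempfile


def read_text(path, encoding="utf-8"):
    """Return the whole file as decoded text (assumes the given encoding)."""
    with io.open(path, "r", encoding=encoding) as f:
        return f.read()


# Demo on a throwaway file so the sketch runs without the real word list.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with io.open(demo_path, "w", encoding="utf-8") as f:
    # Three Bengali letters, a tab, and a frequency, then an English row.
    f.write(u"\u0986\u09ae\u09bf\t42\nbook\t7\n")

content = read_text(demo_path)
os.remove(demo_path)
```

With the real file, the call would presumably be read_text('C:/Python27/bengali_wordlist_full.txt').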
Now content holds the text from the file: it is one long string in which each line ends with a '\n' character. We can split it into a list of lines:
    lines = content.splitlines()
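As a small illustration of what splitlines() returns, here is a sketch on a hypothetical two-line string. Unlike split('\n'), it does not leave an empty string after the final newline, and it also handles Windows-style '\r\n' line endings.

```python
# Hypothetical two-row text with a trailing newline.
text = u"apple\t12\nbook\t4\n"

with_splitlines = text.splitlines()   # no empty trailing item
with_split = text.split(u"\n")        # keeps an empty string at the end

# Windows line endings are handled as well.
windows_lines = u"apple\t12\r\nbook\t4\r\n".splitlines()
```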
Then parse each line:
    for line in lines:
        try:
            # split line into items, assign first to 'word', second to 'freq'
            word, freq = line.split('\t')  # assuming you have tab as separator
            freq = float(freq)  # we need to convert second item to numeric value from string
            if 5 <= freq <= 300:  # you can 'chain' comparisons like this
                print word
        except ValueError:
            # this happens if split() gives more or fewer than two items or float() fails
            print "Could not parse this line:", line
            continue
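The same parsing loop can be wrapped in a function that returns its results instead of printing them, which makes it easier to reuse and to check. This is only a sketch: filter_by_frequency, its parameters, and the sample rows are hypothetical, and the tab separator is still an assumption about the file.

```python
def filter_by_frequency(lines, low=5, high=300, sep=u"\t"):
    """Return (matches, skipped): words with low <= freq <= high, and bad lines."""
    matches = []
    skipped = []
    for line in lines:
        try:
            # Exactly two columns are expected; anything else raises ValueError.
            word, freq = line.split(sep)
            freq = float(freq)
        except ValueError:
            skipped.append(line)
            continue
        if low <= freq <= high:
            matches.append(word)
    return matches, skipped


# Hypothetical rows: a word, a too-rare word, a number, a malformed line,
# and a too-frequent word.
sample_lines = [u"apple\t12", u"book\t4", u"2014\t300", u"broken line", u"cat\t301"]
matches, skipped = filter_by_frequency(sample_lines)
print(matches)
print(skipped)
```

Both bounds are inclusive here, so a frequency of exactly 5 or exactly 300 is kept.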