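Could you help me with a Python programming problem? I'm trying to write a program that reads a text file and prints "word 1 True" if the word has already appeared earlier in the file, or "word 1 False" if this is the word's first occurrence.
This is what I came up with: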
fh = open(fname)
lst = list()
for line in fh:
    words = line.split()
    for word in words:
        if word in words:
            print("word 1 True", word)
        else:
            print("word 1 False", word)
However, it only ever prints "word 1 True".
Please advise.
Thanks!
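Answer 0 (score: 3)
A simple (and fast) way to do this is with a Python dictionary. Think of it as an array whose index keys are strings rather than numbers.
That gives a snippet such as: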
found_words = {}  # empty dictionary; its keys are the words seen so far
with open("words1.txt", "rt") as fh:
    words1 = fh.read().split()  # split on any whitespace; TODO - handle punctuation
for word in words1:
    if word in found_words:
        print(word + " already in file")
    else:
        found_words[word] = True  # could be set to anything; only the key matters
Now, as you process the words, checking whether a word is already in the dictionary tells you whether it has been seen before.
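Since only the keys matter here, a set works just as well as a dictionary. Below is a minimal sketch of the same idea that reports True/False for every word, which is closer to the output the question asks for. The function name `flag_repeats` and the sample sentence are made up for illustration, and the text is passed in as a string rather than read from a file.

```python
def flag_repeats(text):
    """Return (word, seen_before) pairs in order of appearance."""
    seen = set()  # words encountered so far
    flags = []
    for word in text.split():
        # Test membership before adding, so a first occurrence is False.
        flags.append((word, word in seen))
        seen.add(word)
    return flags


for word, seen_before in flag_repeats("the cat saw the other cat"):
    print(word, seen_before)
```

The key point is that the membership test runs against the words seen so far, not against the list that currently contains the word.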
Answer 1 (score: 2)
You may also want to keep track of where each word appeared earlier, for example:
with open(fname) as fh:
    vocab = {}  # word -> list of (line index, word index) positions
    for i, line in enumerate(fh):
        words = line.split()
        for j, word in enumerate(words):
            if word in vocab:
                locations = vocab[word]
                print(word, "occurs at", locations)
                locations.append((i, j))
            else:
                vocab[word] = [(i, j)]
                # print("First occurrence of", word)
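If you only need the positions and not the messages, the same bookkeeping can be sketched with `collections.defaultdict`, which removes the if/else branch. The function name `word_positions` and the sample lines are assumptions for illustration; any iterable of lines, including an open file, should work in place of the list.

```python
from collections import defaultdict


def word_positions(lines):
    """Map each word to the (line index, word index) pairs where it occurs."""
    positions = defaultdict(list)
    for i, line in enumerate(lines):
        for j, word in enumerate(line.split()):
            positions[word].append((i, j))
    return dict(positions)


sample = ["the cat sat", "on the mat"]  # stands in for the lines of a file
positions = word_positions(sample)
print(positions["the"])  # [(0, 0), (1, 1)]
```

A word has been seen before exactly when its list already holds an entry at the moment you reach it.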
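Answer 2 (score: 1)
This snippet does not read from a file, but it is easy to test and study. The main difference is that you would have to load the file and read it line by line, as in your example.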
example_file = """
This is a text file example
Let's see how many time example is typed.
"""
result = {}
words = example_file.split()
for word in words:
    # if the word is not in the result dictionary yet, get() returns 0, so the count starts at 1
    result[word] = result.get(word, 0) + 1
for word, occurrence in result.items():
    print("word:%s; occurrence:%s" % (word, occurrence))
Update:
As @khachik suggested, a better solution is to use Counter.
>>> # Find the ten most common words in Hamlet
>>> import re
>>> from collections import Counter
>>> words = re.findall(r'\w+', open('hamlet.txt').read().lower())
>>> Counter(words).most_common(10)
[('the', 1143), ('and', 966), ('to', 762), ('of', 669), ('i', 631),
 ('you', 554), ('a', 546), ('my', 514), ('hamlet', 471), ('in', 451)]
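To try the Counter approach without a copy of hamlet.txt, here is a small sketch that runs on an in-memory string. The sample sentence is made up for illustration; the regular expression is the same one used above.

```python
import re
from collections import Counter

text = "The tiger and the lamb. The end."  # stand-in for the file contents
words = re.findall(r"\w+", text.lower())   # lowercase, punctuation dropped
counts = Counter(words)

print(counts.most_common(1))  # [('the', 3)]
print(counts["lamb"] > 1)     # False: "lamb" appears only once
```

Note that this gives total counts per word, so it answers "does this word repeat anywhere in the file" rather than "has it appeared before this point".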
Answer 3 (score: 1)
Following your approach, you could do something like this:
with open('tyger.txt', 'r') as f:
    lines = (f.read()).split()
for word in lines:
    # True if the word occurs more than once anywhere in the file
    if lines.count(word) > 1:
        print(f"{word}: True")
    else:
        print(f"{word}: False")
Output
(xenial)vash@localhost:~/python/stack_overflow$ python3.7 read_true.py
When: False
the: True
stars: False
threw: False
down: False
their: True
spears: False
...
You can also count each word:
with open('tyger.txt', 'r') as f:
    count = {}
    lines = f.read()
    lines = lines.split()
    for i in lines:
        count[i] = lines.count(i)
    print(count)
Output
{'When': 1, 'the': 2, 'stars': 1, 'threw': 1, 'down': 1, 'their': 2, 'spears': 1, 'And': 1, "water'd": 1, 'heaven': 1, 'with': 1, 'tears:': 1, 'Did': 2, 'he': 2, 'smile': 1, 'his': 1, 'work': 1, 'to': 1, 'see?': 1, 'who': 1, 'made': 1, 'Lamb': 1, 'make': 1, 'thee?': 1}
You can then use the dictionary like this:
for k in count:
    if count[k] > 1:
        print(f"{k}: True")
    else:
        print(f"{k}: False")
Output
When: False
the: True
stars: False
threw: False
down: False
their: True
spears: False
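One thing to keep in mind with the snippets above is that `list.count` rescans the whole list on every call, so counting every word this way gets slow on large files. As a rough sketch of an alternative, `collections.Counter` builds the same counts in a single pass. The sample text below is taken from the words shown in the output above; the variable names are made up for illustration.

```python
from collections import Counter

# Stand-in for f.read().split() on the real file.
words = "Did he smile his work to see? Did he who made the Lamb make thee?".split()

counts = Counter(words)  # one pass over the list
repeated = {word: n > 1 for word, n in counts.items()}

print(repeated["Did"])   # True
print(repeated["Lamb"])  # False
```

For any list of words, the resulting counts should match what the `lines.count(i)` loop produces.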