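Suppose I have a file of restaurant names, and I need to search that file for a specific string such as "Italian". If I search the file for the string and print the number of restaurants whose names contain it, what would the code look like? This is what I have so far: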
f = open("/home/ubuntu/ipynb/NYU_Notes/2-Introduction_to_Python/data/restaurant-names.txt", "r")
content = f.read()
f.close()
lines = content.split("\n")
with open("/home/ubuntu/ipynb/NYU_Notes/2-Introduction_to_Python/data/restaurant-names.txt") as f:
    print("There are", len(f.readlines()), "restaurants in the dataset")
with open("/home/ubuntu/ipynb/NYU_Notes/2-Introduction_to_Python/data/restaurant-names.txt") as f:
    searchlines = f.readlines()
for i, line in enumerate(searchlines):
    if "GREEK" in line:
        # print the matching line plus the two lines after it
        for l in searchlines[i:i+3]:
            print(l, end="")
        print()
Answer 0 (score: 2)
You can use a Counter dict to count all the words, then look up the specific words you care about:
from collections import Counter
from string import punctuation
f_name = "/home/ubuntu/ipynb/NYU_Notes/2-Introduction_to_Python/data/restaurant-names.txt"
with open(f_name) as f:
    # sum(1 for _ in f) -> counts lines
    print("There are", sum(1 for _ in f), "restaurants in the dataset")
    # reset file pointer back to the start
    f.seek(0)
    # get count of how many times each word appears, at most once per line
    cn = Counter(word.strip(punctuation).lower() for line in f for word in set(line.split()))
print(cn["italian"])  # no KeyError if missing, will be 0
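We use set(line.split()) so that if a word appears twice in one restaurant name, it is only counted once. This finds exact matches only; if you also want partial matches, such as finding foo inside foobar, building a data structure that can look up many words efficiently gets more complicated.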
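A minimal sketch of the Counter approach run on in-memory data, to show the effect of set(line.split()). It assumes a few made-up restaurant names in an io.StringIO in place of the real file, and word_line_counts is a hypothetical helper name:

```python
from collections import Counter
from io import StringIO
from string import punctuation

# Hypothetical sample data standing in for restaurant-names.txt
sample = StringIO("Italian Italian Kitchen\nLuigi's Italian Deli\nGreek Taverna\n")

def word_line_counts(lines):
    """Count, for each lowercase word, the number of lines it appears in."""
    return Counter(
        word.strip(punctuation).lower()
        for line in lines
        for word in set(line.split())  # set() drops repeats within one line
    )

cn = word_line_counts(sample)
print(cn["italian"])  # 2: the first line is counted once, not twice
print(cn["french"])   # 0: missing keys default to 0
```

Because cn is built once, looking up further words such as cn["greek"] costs only a dictionary lookup each.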
If you really only want to count a single word, all you need to do is sum the number of lines in which the substring appears:
f_name = "/home/ubuntu/ipynb/NYU_Notes/2-Introduction_to_Python/data/restaurant-names.txt"
with open(f_name) as f:
    print("There are", sum(1 for _ in f), "restaurants in the dataset")
    f.seek(0)
    sub = "italian"
    # True counts as 1, so this is the number of lines (restaurants) containing sub
    count = sum(sub in line.lower() for line in f)
print(count)
If you want exact matches, you need to either use the split logic again or use a regular expression with word boundaries.
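A sketch of the word-boundary option, assuming Python's re module; count_lines_with_word is a hypothetical helper and the sample names are made up. Unlike the substring test, it does not count Italianissimo as a match for italian:

```python
import re
from io import StringIO

def count_lines_with_word(lines, word):
    """Count lines containing `word` as a whole word, ignoring case."""
    # \b marks a word boundary; re.escape guards against regex metacharacters in `word`
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return sum(1 for line in lines if pattern.search(line))

# Hypothetical sample data standing in for restaurant-names.txt
sample = StringIO("ITALIAN VILLA\nItalianissimo Pizza\nLittle Italian Cafe\n")
print(count_lines_with_word(sample, "italian"))  # 2
```

With the real file you would pass the open file object f in place of sample.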
Answer 1 (score: -1)
Read the file into a string, then use the string's count method. Code:
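# Read the whole file into one string, s1
with open("/home/ubuntu/ipynb/NYU_Notes/2-Introduction_to_Python/data/restaurant-names.txt") as f:
    s1 = f.read()
print(s1.lower().count("italian"))  # counts occurrences, not restaurants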
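One caveat with this approach: str.count counts every occurrence in the whole text, so a name containing the word twice is counted twice. A small comparison on made-up sample text standing in for the file contents:

```python
# Hypothetical sample text standing in for the contents of restaurant-names.txt
s1 = "Italian Italian Kitchen\nLuigi's Italian Deli\nGreek Taverna\n"

occurrences = s1.lower().count("italian")  # every occurrence in the whole text
restaurants = sum("italian" in line for line in s1.lower().splitlines())  # one per matching line

print(occurrences)  # 3
print(restaurants)  # 2
```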