Question

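Suppose I have a file of restaurant names and need to search it for a specific string such as "Italian". What would the code look like if I search the file for that string and print the number of restaurants that contain it?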

# Read the whole file and split it into lines
f = open("/home/ubuntu/ipynb/NYU_Notes/2-Introduction_to_Python/data/restaurant-names.txt", "r")
content = f.read()
f.close()
lines = content.split("\n")
# Report the total number of lines (one restaurant per line)
with open("/home/ubuntu/ipynb/NYU_Notes/2-Introduction_to_Python/data/restaurant-names.txt") as f:
    print("There are", len(f.readlines()), "restaurants in the dataset")
# Print each line containing "GREEK" together with the two lines after it
with open("/home/ubuntu/ipynb/NYU_Notes/2-Introduction_to_Python/data/restaurant-names.txt") as f:
    searchlines = f.readlines()
for i, line in enumerate(searchlines):
    if "GREEK" in line:
        for l in searchlines[i:i+3]:
            print(l, end="")
        print()

Answer 1

You can use a Counter dict to count all the words and then look up specific words:

from collections import Counter
from string import punctuation

f_name = "/home/ubuntu/ipynb/NYU_Notes/2-Introduction_to_Python/data/restaurant-names.txt"


with open(f_name) as f:
    #  sum(1 for _ in f) -> counts lines
    print ("There are", sum(1 for _ in f), "restaurants in the dataset")
    # reset file pointer back to the start
    f.seek(0)
    # get count of how many times each word appears, at most once per line
    cn = Counter(word.strip(punctuation).lower() for line in f for word in set(line.split()))
    print(cn["italian"]) # no keyError if missing, will be 0

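We use set(line.split()) so that if a word appears twice in one restaurant's name, it is only counted once. This looks for exact matches; if you also want to match partials, such as foo in foobar, then building a data structure that can efficiently look up multiple words becomes more complex.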
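To illustrate the exact-match behaviour described above, here is a minimal sketch that works on an in-memory list rather than the file. The sample names and the helper name count_lines_per_word are made up for the example. It also normalizes each word before deduplicating, a small variation on the code above, so that "Italian" and "italian," on the same line count once.

```python
from collections import Counter
from string import punctuation


def count_lines_per_word(lines):
    """Return a Counter mapping each normalized word to the number of lines containing it."""
    counts = Counter()
    for line in lines:
        # Normalize first, then deduplicate, so each line adds at most 1 per word
        words = {word.strip(punctuation).lower() for word in line.split()}
        words.discard("")  # tokens made only of punctuation become empty strings
        counts.update(words)
    return counts


# Hypothetical sample data standing in for restaurant-names.txt
sample = [
    "Luigi's Italian Kitchen",
    "Italian Italian Bistro",
    "Italiano Pizza",
    "Greek Corner",
]
cn = count_lines_per_word(sample)
print(cn["italian"])  # 2: "Italiano" is not an exact match
print(cn["thai"])     # 0: a missing key does not raise KeyError
```

An open file iterates over its lines, so the same helper should also accept the file object f from the with block.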

If you really only want to count a single word, all you need to do is use sum to count the lines in which the substring appears:

f_name = "/home/ubuntu/ipynb/NYU_Notes/2-Introduction_to_Python/data/restaurant-names.txt"

with open(f_name) as f:
    print ("There are", sum(1 for _ in f), "restaurants in the dataset")
    f.seek(0)
    sub = "italian"
    count = sum(sub in line.lower() for line in f)

If you want exact matches, you will need the split logic again, or a regular expression with word boundaries.
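The word-boundary option could look roughly like the sketch below. The helper name count_lines_with_word and the sample list are assumptions made for illustration, and the pattern assumes the search term starts and ends with a letter or digit, since \b only matches next to word characters.

```python
import re


def count_lines_with_word(lines, word):
    """Count the lines that contain `word` as a whole word, ignoring case."""
    # \b marks a word boundary; re.escape protects any special characters in `word`
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return sum(1 for line in lines if pattern.search(line))


# Hypothetical sample data standing in for restaurant-names.txt
sample = [
    "Luigi's Italian Kitchen",
    "ITALIAN, Fresh Pasta",
    "Italiano Pizza",
    "Greek Corner",
]
print(count_lines_with_word(sample, "italian"))  # 2: "Italiano" is skipped
```

Since an open file yields its lines when iterated, the function could also be called as count_lines_with_word(f, "italian") inside the with block.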

Answer 2

Read the file in as a string and then use the string's count method. Code:

# Let the file's contents be read into the string s1
print(s1.count("italian"))
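One caveat with this approach: str.count is case-sensitive and counts every non-overlapping occurrence in the whole text, so the result can differ from the number of restaurants. The sketch below shows the difference on a made-up string that stands in for the file contents.

```python
# Hypothetical file contents; in practice s1 would come from f.read()
s1 = "Italian Italian Bistro\nLuigi's italian kitchen\nGreek Corner\n"

# str.count is case-sensitive and counts occurrences, not lines
print(s1.count("italian"))          # 1
print(s1.lower().count("italian"))  # 3

# Counting matching lines gives the number of restaurants that mention the word
restaurants = sum("italian" in line.lower() for line in s1.splitlines())
print(restaurants)                  # 2
```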

Searching a file in Python
