Question

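I'm trying to build a counter that takes what to count from one file and counts its occurrences in another file. It opens file1 and looks for a city and its population, separated by a dash; file2 lists city names and crimes, also separated by a dash. It works fine when I hard-code the city name, but when I use a loop to look up the city names, it finds how many times the first city appears in the crime report and nothing after that. Please help.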

for line in file1:
    dash = line.find("-")
    variableCity = line[:dash]
    cityPop = line[dash + 1:]
    crimeCounter = 0
    for crime in file2:
        x = crime[:dash]
        if x == variableCity:
            crimeCounter += 1
    print("{} which has a population of {} has {} reported crimes".format(variableCity, cityPop, crimeCounter))

That's my code.

file1:

Bothell-89232
Kent-97232
Tacoma-89333
Renton-98632
Redmond-64789
Seattle-76978

file2:

Kent-Theft
Tacoma-Break In
Seattle-Break In
Tacoma-Auto Break In
Federal Way-Auto Break In
Kent-Break In
Tacoma-Auto Break In
Federal Way-Auto Break In
Kent-Mugging
Kent-Break In
Federal Way-Break In
Renton-Break In
Renton-Auto Theft
Tacoma-Mugging
Seattle-Theft
Auburn-Auto Theft
Renton-Theft
Tacoma-Auto Theft
Kent-Mugging
Seattle-Auto Break In
Tacoma-Theft
Kent-Auto Theft
Seattle-Break In
Auburn-Mugging
Tacoma-Mugging
Auburn-Auto Theft
Auburn-Auto Theft
Seattle-Auto Theft
Federal Way-Mugging
Kent-Mugging
Renton-Auto Theft
Tacoma-Mugging
Auburn-Theft
Seattle-Auto Break In
Auburn-Mugging
Seattle-Theft
Auburn-Theft
Auburn-Auto Break In
Federal Way-Auto Break In
Seattle-Break In
Kent-Theft
Seattle-Auto Break In
Federal Way-Auto Break In
Kent-Auto Break In
Seattle-Auto Break In
Renton-Auto Break In
Kent-Auto Break In
Renton-Break In
Federal Way-Mugging
Seattle-Mugging
Renton-Mugging
Renton-Auto Break In
Tacoma-Mugging
Tacoma-Auto Theft
Seattle-Auto Break In
Kent-Auto Theft
Kent-Auto Theft
Federal Way-Mugging
Tacoma-Auto Theft
Federal Way-Theft
Tacoma-Auto Theft
Renton-Auto Theft
Seattle-Theft
Seattle-Auto Break In
Tacoma-Mugging
Tacoma-Auto Theft
Seattle-Break In
Federal Way-Theft
Seattle-Auto Break In
Auburn-Auto Break In
Auburn-Auto Break In
Tacoma-Break In
Seattle-Mugging
Renton-Theft
Auburn-Theft
Renton-Theft
Seattle-Auto Theft
Auburn-Mugging
Seattle-Break In
Kent-Mugging
Kent-Break In
Federal Way-Break In
Federal Way-Auto Theft
Auburn-Theft
Tacoma-Theft
Kent-Auto Break In
Auburn-Auto Theft
Seattle-Mugging
Kent-Theft
Kent-Mugging
Kent-Auto Break In
Seattle-Theft
Tacoma-Auto Theft
Renton-Theft
Renton-Break In
Auburn-Break In
Renton-Mugging
Renton-Mugging
Tacoma-Break In

Note that in each file, every city is on its own line.

Answer 1

It looks like you forgot to find the position of the dash in this part:

for crime in file2:
    x = crime[:dash]

Shouldn't it be:

for crime in file2:
    dash = crime.find("-")
    x = crime[:dash]

Either way, a more correct solution would look like this:

for line in file1:
    parsed = line.strip().split("-")
    variableCity = parsed[0]
    cityPop = parsed[1]  # strip() already removed the trailing newline

    crimeCounter = 0
    # Re-open file2 for every city so each pass starts at the beginning
    with open("file2.txt") as file2:
        for crime in file2:
            c = crime.split("-")
            if c[0] == variableCity:
                crimeCounter += 1

    print("{} which has a population of {} has {} reported crimes".format(variableCity, cityPop, crimeCounter))

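However, a more efficient solution does this in two passes: in the first pass we read the city information into a mapping, and in the second pass we increment the crime counts: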

citiesPop = {}
citiesCrime = {}

for line in file1:
    parsed = line.strip().split("-")
    city = parsed[0]
    cityPop = parsed[1]  # strip() already removed the trailing newline
    citiesPop[city] = cityPop
    citiesCrime[city] = 0

for crime in file2:
    city = crime.split("-")[0]
    if city in citiesCrime:
        citiesCrime[city] += 1

for city in citiesPop.keys():
    print("{} which has a population of {} has {} reported crimes".format(city, citiesPop[city], citiesCrime[city]))
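The same two-pass idea can be wrapped in a function and tried without real files. This is a sketch under assumptions: `count_crimes` is a hypothetical helper name, and `io.StringIO` stands in for the opened file1/file2 handles. Each input is read exactly once, so the work grows with the sum of the two file sizes instead of their product, as with the nested loop.

```python
import io


def count_crimes(pop_file, crime_file):
    """Hypothetical helper: return {city: (population, crime_count)}.

    Both arguments may be any iterable of 'Name-Value' lines,
    e.g. open file objects or io.StringIO instances.
    """
    populations = {}
    crimes = {}
    # First pass: record every city and its population, start each count at 0.
    for line in pop_file:
        if not line.strip():
            continue  # skip blank lines, e.g. a trailing empty line
        city, pop = line.strip().split("-", 1)
        populations[city] = pop
        crimes[city] = 0
    # Second pass: walk the crime report once, counting only known cities.
    for line in crime_file:
        city = line.split("-", 1)[0]
        if city in crimes:
            crimes[city] += 1
    return {city: (populations[city], crimes[city]) for city in populations}


file1 = io.StringIO("Kent-97232\nTacoma-89333\nBothell-89232\n")
file2 = io.StringIO("Kent-Theft\nTacoma-Break In\nFederal Way-Theft\nKent-Mugging\n")

result = count_crimes(file1, file2)
for city, (pop, n) in result.items():
    print("{} which has a population of {} has {} reported crimes".format(city, pop, n))
```

Cities from file1 that never appear in file2 (Bothell here) still get a count of 0, and cities that only appear in file2 (Federal Way) are ignored.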

Answer 2

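The code seems correct; I suspect something is wrong with your file1 and file2. You can inspect those two variables, or show the code that shows how you get file1 and file2.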

Answer 3

If you are using file handles, i.e. somewhere earlier in your code you have lines like

file1=open('file1')
file2=open('file2')

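you need to go back to the beginning of file2 before searching it the second and every subsequent time.

For example, add the line

file2.seek(0)

before the line

for crime in file2:

or after the line that prints the counter.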

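Otherwise the file pointer stays at the end of the file and you won't get any "crimes" out of it. It's a bit like having to rewind a tape before you can play it again.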
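A small demonstration of the "tape" behaviour, assuming `io.StringIO` as a stand-in for the opened file2 (a real file object from `open()` behaves the same way for this purpose): the first loop reads every line, a second loop gets nothing, and `seek(0)` rewinds.

```python
import io

# Stand-in for file2 = open('file2') (assumption: in-memory text instead of a real file).
file2 = io.StringIO("Kent-Theft\nTacoma-Break In\nKent-Mugging\n")

first_pass = sum(1 for _ in file2)   # reads through to the end of the "file"
second_pass = sum(1 for _ in file2)  # pointer is already at the end: no lines left

file2.seek(0)                        # rewind to the beginning
after_seek = sum(1 for _ in file2)   # all lines are available again

print(first_pass, second_pass, after_seek)
```

This is exactly the symptom in the question: the inner loop only sees data for the first city, because every later pass starts at the end of file2.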

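I believe reading the file contents into a variable once would be more efficient, and then this problem wouldn't happen, but I suppose for small files it may not make much difference. If they were huge files, memory usage might stop you from reading them all in, but I doubt they are that big.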
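A minimal sketch of the "read it into a variable once" idea, keeping the nested loop from the question. It assumes `io.StringIO` stand-ins for the two files; with real files you would use `open(...)` instead. A list, unlike a file object, can be iterated any number of times, so no `seek(0)` is needed.

```python
import io

# Stand-ins for the opened files (assumption: real code would use open('file1') and open('file2')).
file1 = io.StringIO("Kent-97232\nTacoma-89333\n")
file2 = io.StringIO("Kent-Theft\nTacoma-Break In\nKent-Mugging\n")

# Read the whole crime report once into a list of lines.
crime_lines = file2.readlines()

counts = {}
for line in file1:
    variableCity, cityPop = line.strip().split("-")
    crimeCounter = 0
    for crime in crime_lines:  # a list restarts from the first item on every loop
        if crime.split("-")[0] == variableCity:
            crimeCounter += 1
    counts[variableCity] = crimeCounter
    print("{} which has a population of {} has {} reported crimes".format(
        variableCity, cityPop, crimeCounter))
```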

Answer 4

Let me offer some suggestions on how to clean up your code so we can see where the bug is, along with some general Python debugging tips.

For the first file, consider using variableCity, cityPop = line.split('-') to simplify the parsing logic. "Simpler logic -> fewer bugs" is my rule of thumb. Like this:

for line in file1:
    variableCity, cityPop = line.split('-')

Or you can even put it straight into a dictionary of its own:

city_pops = dict(line.strip().split('-') for line in file1)

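Now you don't even need nested for loops! This has several advantages. Most importantly, you can now inspect the data structure in the interactive interpreter to check that it's correct.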

>>> city_pops
{'Tacoma': '89333', 'Redmond': '64789', 'Kent': '97232', 'Seattle': '76978', 'Renton': '98632', 'Bothell': '89232'}

If the data structure is too large, try inspecting a few entries. You can also check the number of entries with len(city_pops).
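One thing worth checking when you inspect the dictionary: lines read from a file keep their trailing newline, so splitting a raw line leaves `'\n'` on the population. A short comparison, assuming an `io.StringIO` stand-in for file1:

```python
import io

file1 = io.StringIO("Bothell-89232\nKent-97232\n")  # stand-in for the real file1

# Splitting the raw lines keeps the newline on the second field.
raw = dict(line.split("-") for line in file1)
print(raw)

file1.seek(0)  # rewind so the same data can be read again
# strip() removes the newline (and any surrounding whitespace) before splitting.
clean = dict(line.strip().split("-") for line in file1)
print(clean)
print(len(clean))
```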

Great, divide and conquer! Now that you've handled the first file and know it was parsed correctly, we can move on to the second file.

Let's use the split-on-dash technique again. Also, since you're counting, I'd suggest the underused built-in collections class Counter.

If you just want to count all the entries, you can do this:

from collections import Counter
crime_rate = Counter(line.split('-')[0] for line in file2)

Again, you can check the contents in the interpreter to make sure you're on the right track:

>>> crime_rate
Counter({'Seattle': 21, 'Tacoma': 18, 'Kent': 18, 'Auburn': 15, 'Renton': 15, 'Federal Way': 12})

Now you just need to filter out the cities you aren't interested in. Make sure each city name is a key in the city_pops dictionary from before:

crime_rate = Counter(line.split('-')[0] for line in file2
                     if line.split('-')[0] in city_pops.keys())

Final result:

>>> crime_rate
Counter({'Seattle': 21, 'Tacoma': 18, 'Kent': 18, 'Renton': 15})

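The moral of the story: don't nest loops if you don't need to. It makes debugging harder and can increase the computational complexity of your program. Also, make liberal use of the str.split() method and the Counter class. Finally, comprehensions and generator expressions are almost always preferable to for loops.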

Basically your program boils down to 2 lines:

city_pops = dict(line.strip().split('-') for line in file1)
crime_rate = Counter(line.split('-')[0] for line in file2
                     if line.split('-')[0] in city_pops.keys())
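To get back to the output the question asked for, the two structures can be combined into the original print statement. A sketch, assuming `io.StringIO` stand-ins for the files: it splits each crime line only once, and relies on the fact that a `Counter` returns 0 for a missing key, so cities with no reported crimes (like Bothell or Redmond in the full data) still get a line.

```python
import io
from collections import Counter

# Stand-ins for the two opened files (assumption: replace with open('file1'), open('file2')).
file1 = io.StringIO("Bothell-89232\nKent-97232\nTacoma-89333\n")
file2 = io.StringIO("Kent-Theft\nTacoma-Break In\nFederal Way-Theft\nKent-Mugging\n")

city_pops = dict(line.strip().split("-") for line in file1)
# Split each crime line once, then keep only cities that appear in file1.
crime_rate = Counter(city for city in (line.split("-")[0] for line in file2)
                     if city in city_pops)

report = []
for city, pop in city_pops.items():
    # crime_rate[city] is 0 for cities with no crimes; reading it does not add the key.
    report.append("{} which has a population of {} has {} reported crimes".format(
        city, pop, crime_rate[city]))
print("\n".join(report))
```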

Answer 5

I think I figured it out. I added file2.seek(0), so the new code is:

for line in file1:
    dash = line.find("-")
    variableCity = line[:dash]
    cityPop = line[dash + 1:]
    crimeCounter = 0
    file2.seek(0)
    for crime in file2:
        x = crime[:dash]
        if x == variableCity:
            crimeCounter += 1
    print("{} which has a population of {} has {} reported crimes".format(variableCity, cityPop, crimeCounter))

I did it this way because it only adds one line to my original code. Thanks for all the answers.
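One caveat about this version: `x = crime[:dash]` still uses the dash position from the file1 line, so it compares only the first `len(variableCity)` characters of each crime line with the city name. That happens to give the right counts for this data, but a crime line for a longer name starting with the same letters (for example a hypothetical "Kentwood-Theft") would also be counted for Kent. The population also keeps its trailing newline. A minimal sketch that finds the dash in each crime line instead, assuming `io.StringIO` stand-ins for the files:

```python
import io

file1 = io.StringIO("Kent-97232\n")
file2 = io.StringIO("Kent-Theft\nKentwood-Theft\nKent-Mugging\n")  # "Kentwood" is hypothetical

for line in file1:
    dash = line.find("-")
    variableCity = line[:dash]
    cityPop = line[dash + 1:].strip()  # drop the trailing newline
    crimeCounter = 0
    prefixCounter = 0                  # what crime[:dash] would count, for comparison
    file2.seek(0)
    for crime in file2:
        if crime[:dash] == variableCity:
            prefixCounter += 1
        # Use the dash position of the crime line itself and compare whole names.
        if crime[:crime.find("-")] == variableCity:
            crimeCounter += 1
    print("{} which has a population of {} has {} reported crimes".format(
        variableCity, cityPop, crimeCounter))
```

Here the prefix comparison counts 3 lines for Kent, while the whole-name comparison counts the correct 2.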

Nested for loop only iterates once
