Question

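My task is to parse a text file and return a dictionary containing the number of times each surname appears in the file. The text file looks like this: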

city: Aberdeen
state: Washington
Johnson,    Danny
Williams, Steve
Miller,    Austin
Jones, Davis
Miller,    Thomas
Johnson, Michael

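I know how to read the file and assign its contents to a list or a string, but I don't know how to find the count of each surname and put the counts into a dictionary. Could one of you point me in the right direction?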

Answer 1

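import re

with open('test.txt') as f:
    text = f.read()

# Capture the text before the comma on each line; header lines contain no comma.
reobj = re.compile("(.+),", re.MULTILINE)
dic = {}
for match in reobj.finditer(text):
    # group(1) is the captured surname without the trailing comma
    surname = match.group(1).strip()
    if surname in dic:
        dic[surname] += 1
    else:
        dic[surname] = 1

print(dic)

The result is:

{'Johnson': 2, 'Williams': 1, 'Miller': 2, 'Jones': 1}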

Answer 2

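To find the count of each surname:

Create a dictionary; an empty one will do.
Loop through the lines in the file.
For each line, decide what you need to do with the data; the file appears to have header lines. Testing for the presence of a particular character in the string may be enough.
For each line you decide is a name, split or slice the string to extract the surname.
Then use the surname as the dictionary key, and either set an integer as that key's value or increment it if the key already exists.
After looping through the file data, you should have a dictionary keyed by surname whose values are the numbers of occurrences.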
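
The steps above could be sketched roughly as follows. This is only a minimal illustration, not a definitive implementation: the function name `count_surnames` is made up for this example, and it assumes that header lines always contain a colon and name lines always contain a comma, as in the sample file from the question.

```python
def count_surnames(lines):
    """Count surnames in lines shaped like 'Surname, Firstname'."""
    counts = {}  # start with an empty dictionary
    for line in lines:
        line = line.strip()
        # Assumption: headers ("city: ...") contain a colon, names contain a comma
        if not line or ':' in line or ',' not in line:
            continue
        # The surname is everything before the first comma
        surname = line.split(',', 1)[0].strip()
        # Set the count to 1 for a new key, otherwise increment it
        counts[surname] = counts.get(surname, 0) + 1
    return counts


sample = [
    "city: Aberdeen",
    "state: Washington",
    "Johnson,    Danny",
    "Williams, Steve",
    "Miller,    Austin",
    "Jones, Davis",
    "Miller,    Thomas",
    "Johnson, Michael",
]
print(count_surnames(sample))
```

Because a file object yields its lines when iterated, the same function should also accept an open file directly, for example `count_surnames(f)` inside a `with open(...) as f:` block.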

Answer 3

    # Header lines such as "city: Aberdeen" contain a colon; name lines contain a comma.
    lastnames = {}
    with open('data.txt') as file:
        for line in file:
            if ':' in line or ',' not in line:
                continue  # skip header lines and blank lines
            last = line.split(',')[0].strip()
            if last in lastnames:
                lastnames[last] += 1
            else:
                lastnames[last] = 1
    print(lastnames)

This gives me the following:

{'Johnson': 2, 'Williams': 1, 'Miller': 2, 'Jones': 1}

Answer 4

This would be my approach. There is no need for regular expressions. It also filters out blank lines for extra robustness.

from collections import defaultdict

def nonblank_lines(f):
    # Yield each line with trailing whitespace removed, skipping blank lines.
    for l in f:
        line = l.rstrip()
        if line:
            yield line

with open('text.txt') as text:
    lines = nonblank_lines(text)
    # Header lines such as "city: Aberdeen" contain a colon; keep only name lines.
    name_lines = (l for l in lines if ':' not in l)

    surnames = (line.split(',')[0].strip() for line in name_lines)

    counter = defaultdict(int)
    for surname in surnames:
        counter[surname] += 1

    print(counter)

If you are using Python 2.7 or later, you can use collections.Counter from the standard library instead of defaultdict.
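
As a rough sketch of that variant: the helper name `count_surnames_with_counter` and the in-memory sample list are assumptions made for illustration, and the colon/comma checks assume the file layout shown in the question.

```python
from collections import Counter


def count_surnames_with_counter(lines):
    """Return a Counter mapping each surname to its number of occurrences."""
    surnames = (
        line.split(',', 1)[0].strip()  # text before the first comma
        for line in lines
        # Assumption: skip blank lines, headers (colon), and lines without a comma
        if line.strip() and ':' not in line and ',' in line
    )
    # Counter tallies the items of any iterable
    return Counter(surnames)


sample_lines = [
    "city: Aberdeen",
    "state: Washington",
    "Johnson,    Danny",
    "Williams, Steve",
    "Miller,    Austin",
    "Jones, Davis",
    "Miller,    Thomas",
    "Johnson, Michael",
]
surname_counts = count_surnames_with_counter(sample_lines)
print(dict(surname_counts))
```

Counter is a dict subclass, so the result can be used wherever a plain dictionary is expected, and a surname that never occurred looks up as 0 rather than raising a KeyError.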

How do I parse a text file and export it to a dictionary?
