Parsing a .txt file into a dictionary (Python v2.7)

Time: 2011-11-09 16:29:03

Tags: python parsing dictionary

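I am currently looking to process and parse the information from this .txt file. The file appears to be tab-delimited. I want to parse the base 16 value (e.g. 000000) as the dictionary key and the company name (e.g. Xerox Corporation) as the dictionary value. So if I look up the key 000001 in my dictionary, Xerox Corporation should be returned as the corresponding value.

I tried parsing the .txt file as a CSV and reading the entry on every nth line, but unfortunately there is no pattern, and n varies from entry to entry.

Is there a way to capture the value that comes before the term "base 16", and then the text that follows it, to create a dictionary entry?

Many thanks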

4 answers:

Answer 0 (score: 1)

result = dict()
with open('oui.txt') as f:
    for lig in f:
        # Only the "(base 16)" lines carry the 6-digit key we want.
        if 'base 16' in lig:
            num, sep, txt = lig.strip().partition('(base 16)')
            result[num.strip()] = txt.strip()
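
To try this approach without downloading the file, here is a minimal sketch run on a few inline lines. The sample lines and the `parse_lines` helper name are assumptions made for illustration; the layout only mimics the one described in the question.

```python
# Hypothetical sample lines imitating the layout of oui.txt (not real file contents).
sample = [
    "00-00-01   (hex)\t\tXEROX CORPORATION\n",
    "000001     (base 16)\t\tXEROX CORPORATION\n",
    "\t\t\t\tEXAMPLE ADDRESS LINE\n",
]

def parse_lines(lines):
    """Map the value before '(base 16)' to the text after it."""
    result = {}
    for line in lines:
        if '(base 16)' in line:
            # partition() splits on the first occurrence of the marker only.
            num, _sep, txt = line.strip().partition('(base 16)')
            result[num.strip()] = txt.strip()
    return result

oui = parse_lines(sample)
print(oui['000001'])
```

Because `partition` splits on the marker itself rather than on a fixed column, this should keep working if the amount of whitespace around `(base 16)` changes.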

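Answer 1 (score: 1)

Well, the entries are separated by two newlines (that is, a blank line). The second line of each entry is always the base 16 one. On that line, the data before the first tab is the base 16 key, and the last tab-separated field is the company name.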

import urllib

inputfile = urllib.urlopen("http://standards.ieee.org/develop/regauth/oui/oui.txt")
data = inputfile.read()

entries = data.split("\n\n")[1:-1] #ignore first and last entries, they're not real entries

d = {}
for entry in entries:
    parts = entry.split("\n")[1].split("\t")
    company_id = parts[0].split()[0]
    company_name = parts[-1]
    d[company_id] = company_name

Partial results:

40F52E: Leica Microsystems (Schweiz) AG
3831AC: WEG
00B0F0: CALY NETWORKS
9CC077: PrintCounts, LLC
000099: MTX, INC.
000098: CROSSCOMM CORPORATION
000095: SONY TEKTRONIX CORP.
000094: ASANTE TECHNOLOGIES
000097: EMC Corporation
000096: MARCONI ELECTRONICS LTD.
000091: ANRITSU CORPORATION
000090: MICROCOM
000093: PROTEON INC.
000092: COGENT DATA TECHNOLOGIES
002192: Baoding Galaxy Electronic Technology  Co.,Ltd
90004E: Hon Hai Precision Ind. Co.,Ltd.
002193: Videofon MV
00A0D4: RADIOLAN,  INC.
E0F379: Vaddio
002190: Goliath Solutions

Answer 2 (score: 1)

def oui_parse(fn='oui.txt'):
    with open(fn) as ouif:
        content = ouif.read()
    for block in content.split('\n\n'):
        lines = block.split('\n')

        if not lines or '(hex)' not in lines[0]:  # Skip the header block
            continue

        assert '(base 16)' in lines[1]
        d = {}
        d['oui'] = lines[1].split()[0]
        d['company'] = lines[1].split('\t')[-1]
        if len(lines) == 6:
            d['division'] = lines[2].strip()
        d['street'] = lines[-3].strip()
        d['city'] = lines[-2].strip()
        d['country'] = lines[-1].strip()
        yield d

oui_info = list(oui_parse())
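
The generator above yields one record per entry rather than the key-to-name dictionary the question asks for. The following is a rough sketch of how such records could be folded into a lookup table. The `SAMPLE` text and the `iter_records` name are assumptions made for illustration (a trimmed-down stand-in for the function above, keeping only two fields), not the real file contents.

```python
# Hypothetical stand-in for the file contents; address lines are placeholders.
SAMPLE = (
    "header line\n"
    "\n"
    "00-00-00   (hex)\t\tXEROX CORPORATION\n"
    "000000     (base 16)\t\tXEROX CORPORATION\n"
    "\t\t\t\tEXAMPLE ADDRESS LINE\n"
    "\n"
    "00-00-01   (hex)\t\tXEROX CORPORATION\n"
    "000001     (base 16)\t\tXEROX CORPORATION\n"
    "\t\t\t\tEXAMPLE ADDRESS LINE\n"
)

def iter_records(text):
    """Yield one dict per entry; entries are separated by blank lines."""
    for block in text.split('\n\n'):
        lines = block.split('\n')
        if len(lines) < 2 or '(base 16)' not in lines[1]:
            continue  # header or malformed block
        yield {
            'oui': lines[1].split()[0],
            'company': lines[1].split('\t')[-1].strip(),
        }

# Collapse the records into the key -> company name mapping.
lookup = dict((rec['oui'], rec['company']) for rec in iter_records(SAMPLE))
print(lookup['000001'])
```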

Answer 3 (score: 1)

>>> import urllib
... 
... f = urllib.urlopen('http://standards.ieee.org/develop/regauth/oui/oui.txt')
... d = dict([(s[:6], s[22:].strip()) for s in f if 'base 16' in s])
... print d['000001']
XEROX CORPORATION
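
The slices `s[:6]` and `s[22:]` depend on every "base 16" line using exactly the same column widths. If that cannot be relied on, a regular expression is one alternative. This is only a sketch: the `BASE16_LINE` pattern, the `parse_base16` name, and the sample lines are assumptions for illustration, and lines with an empty company name are simply skipped.

```python
import re

# Six hex digits, the literal "(base 16)", then the company name.
BASE16_LINE = re.compile(r'^\s*([0-9A-Fa-f]{6})\s+\(base 16\)\s+(.*\S)\s*$')

def parse_base16(lines):
    """Build {hex id: company name} from any iterable of text lines."""
    d = {}
    for line in lines:
        m = BASE16_LINE.match(line)
        if m:  # non-matching lines (addresses, headers) are ignored
            d[m.group(1)] = m.group(2)
    return d

# Hypothetical sample lines with deliberately different spacing.
sample = [
    "00-00-01   (hex)\t\tXEROX CORPORATION\n",
    "000001     (base 16)\t\tXEROX CORPORATION\n",
    "00B0F0 (base 16)\tCALY NETWORKS\n",
]
d = parse_base16(sample)
print(d['000001'])
```

The same function could be passed the open file or URL handle instead of the sample list, since it only needs something that iterates over lines of text.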