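I'm currently looking to process and parse information from this .txt file. The file appears to be tab-delimited. I want to parse the base 16 values (e.g. 000000) as dictionary keys and the company names (e.g. Xerox Corporation) as dictionary values. So if I look up the key 000001 in my dictionary, Xerox Corporation should be returned as the corresponding value.
I tried parsing the .txt file as a CSV and reading the entry on every nth line, but unfortunately there is no pattern and n varies from entry to entry.
Is there a way to capture the value that comes before the term "base 16", and then the text that follows it, to create a dictionary entry?
Many thanks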
Answer 0 (score: 1)
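result = dict()
# Open the file in a with-block so it is closed after reading.
with open('oui.txt') as f:
    for lig in f:
        if 'base 16' in lig:
            # Split around the marker: key on the left, company name on the right.
            num, sep, txt = lig.strip().partition('(base 16)')
            result[num.strip()] = txt.strip()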
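To see what the `partition` call does without needing the real file, here is a small self-contained sketch in Python 3. The `SAMPLE` text and the helper name `parse_base16_lines` are made up for illustration; the sample only imitates the layout described in the question and is not copied from oui.txt.

```python
# Sample text imitating the assumed layout of oui.txt (not the real file).
SAMPLE = (
    "00-00-01   (hex)\t\tXEROX CORPORATION\n"
    "000001     (base 16)\t\tXEROX CORPORATION\n"
    "\t\t\t\tM/S 105-50C\n"
    "\t\t\t\tWEBSTER NY 14580\n"
)


def parse_base16_lines(lines):
    """Map the text before '(base 16)' to the text after it."""
    result = {}
    for line in lines:
        # partition returns (before, separator, after);
        # the separator is an empty string when the marker is absent.
        num, sep, txt = line.partition("(base 16)")
        if sep:
            result[num.strip()] = txt.strip()
    return result


table = parse_base16_lines(SAMPLE.splitlines())
print(table["000001"])
```

Checking the returned separator instead of testing `'base 16' in line` first does the membership test and the split in one step; either form should work.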
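Answer 1 (score: 1)
Well, the entries are separated by two newlines. The second line of each entry is always the base 16 one. The data before the first tab is the base 16 key, and the last tab-separated field is the company name.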
import urllib  # Python 2; in Python 3, urlopen lives in urllib.request and read() returns bytes

inputfile = urllib.urlopen("http://standards.ieee.org/develop/regauth/oui/oui.txt")
data = inputfile.read()
entries = data.split("\n\n")[1:-1]  # ignore first and last entries, they're not real entries
d = {}
for entry in entries:
    # The second line of each entry is the "(base 16)" line.
    parts = entry.split("\n")[1].split("\t")
    company_id = parts[0].split()[0]
    company_name = parts[-1]
    d[company_id] = company_name
Partial results:
40F52E: Leica Microsystems (Schweiz) AG
3831AC: WEG
00B0F0: CALY NETWORKS
9CC077: PrintCounts, LLC
000099: MTX, INC.
000098: CROSSCOMM CORPORATION
000095: SONY TEKTRONIX CORP.
000094: ASANTE TECHNOLOGIES
000097: EMC Corporation
000096: MARCONI ELECTRONICS LTD.
000091: ANRITSU CORPORATION
000090: MICROCOM
000093: PROTEON INC.
000092: COGENT DATA TECHNOLOGIES
002192: Baoding Galaxy Electronic Technology Co.,Ltd
90004E: Hon Hai Precision Ind. Co.,Ltd.
002193: Videofon MV
00A0D4: RADIOLAN, INC.
E0F379: Vaddio
002190: Goliath Solutions
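For readers on Python 3, the same block-splitting idea can be sketched against an in-memory string, so it runs without a network connection. This is an illustration under assumptions, not this answer's exact code: `SAMPLE`, its header lines, and the function name `parse_blocks` are invented here, and instead of dropping the first and last blocks it skips any block whose second line lacks the "(base 16)" marker.

```python
# Made-up sample imitating the assumed oui.txt layout: a header block,
# then entries separated by blank lines.
SAMPLE = (
    "header line (ignored)\n"
    "another header line\n"
    "\n"
    "00-00-00   (hex)\t\tXEROX CORPORATION\n"
    "000000     (base 16)\t\tXEROX CORPORATION\n"
    "\t\t\t\tM/S 105-50C\n"
    "\n"
    "00-00-01   (hex)\t\tXEROX CORPORATION\n"
    "000001     (base 16)\t\tXEROX CORPORATION\n"
    "\t\t\t\tM/S 105-50C\n"
)


def parse_blocks(text):
    """Return {base-16 id: company name} from blank-line-separated blocks."""
    # Normalise Windows line endings so the blank-line split still works.
    text = text.replace("\r\n", "\n")
    table = {}
    for block in text.split("\n\n"):
        lines = block.split("\n")
        # Skip headers and anything else without a "(base 16)" second line.
        if len(lines) < 2 or "(base 16)" not in lines[1]:
            continue
        company_id = lines[1].split()[0]               # text before the first whitespace
        company_name = lines[1].split("\t")[-1].strip()  # last tab-separated field
        table[company_id] = company_name
    return table


ouis = parse_blocks(SAMPLE)
print(ouis["000001"])
```

To apply it to downloaded data, one would decode the bytes returned by `urllib.request.urlopen(...).read()` to a string first and pass that to `parse_blocks`.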
Answer 2 (score: 1)
def oui_parse(fn='oui.txt'):
    with open(fn) as ouif:
        content = ouif.read()
    # Entries are separated by blank lines.
    for block in content.split('\n\n'):
        lines = block.split('\n')
        if not lines or '(hex)' not in lines[0]:  # First block
            continue
        assert '(base 16)' in lines[1]
        d = {}
        d['oui'] = lines[1].split()[0]
        d['company'] = lines[1].split('\t')[-1]
        # A six-line entry carries an extra division line before the address.
        if len(lines) == 6:
            d['division'] = lines[2].strip()
        d['street'] = lines[-3].strip()
        d['city'] = lines[-2].strip()
        d['country'] = lines[-1].strip()
        yield d

oui_info = list(oui_parse())
Answer 3 (score: 1)
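>>> import urllib  # Python 2; on Python 3 use urllib.request.urlopen and decode each line
>>> f = urllib.urlopen('http://standards.ieee.org/develop/regauth/oui/oui.txt')
>>> # s[:6] is the base 16 key; s[22:] is assumed to start at the company name.
>>> d = dict((s[:6], s[22:].strip()) for s in f if 'base 16' in s)
>>> print d['000001']
XEROX CORPORATION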