I have a file like this:
2.nseasy.com.|['azeaonline.com']
ns1.iwaay.net.|['alchemistrywork.com', 'dha-evolution.biz', 'hidada.net', 'sonifer.biz']
ns2.hd28.co.uk.|['networksound.co.uk']
Expected result:
2.nseasy.com.|'azeaonline.com'
ns1.iwaay.net.|'alchemistrywork.com'
ns1.iwaay.net.|'dha-evolution.biz'
ns1.iwaay.net.|'hidada.net'
ns1.iwaay.net.|'sonifer.biz'
ns2.hd28.co.uk.|'networksound.co.uk'
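When I try this, I get the individual characters of the domains instead of the domain names. It seems the list stored as the value in the dict d is being treated as a string rather than as a list. Here is my code: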
d = defaultdict(list)
f = open(file,'r')
start = time()
for line in f:
    NS,domain_list = line.split('|')
    s = json.dumps(domain_list)
    d[NS] = json.loads(s)
for NS, domains in d.items():
    for domain in domains:
        print (NS, domain)
Sample of the current result:
w
o
o
d
l
a
n
d
f
a
r
m
e
r
s
m
a
r
k
e
t
.
o
r
g
'
]
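Answer 0 (score: 4)
What you are doing with json is not correct. domain_list is already a string (the text after the |), so s = json.dumps(domain_list) just encodes that string as a JSON string s, and json.loads(s) decodes it back into the same string. You then iterate over that string and print each element, hence the single characters in the output.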
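To make the cause concrete, here is a minimal sketch of that round trip on one sample line. The variable names are only for illustration; the point is that the text after the `|` is a string that merely looks like a list, so dumping and loading it hands back the same string.

```python
import json

# One line from the sample file (trailing newline omitted for clarity).
line = "2.nseasy.com.|['azeaonline.com']"

ns, domain_list = line.split('|')

# domain_list is a str, not a list.
s = json.dumps(domain_list)         # JSON-encodes the string
round_trip = json.loads(s)          # decodes back to the same string

is_same_string = round_trip == domain_list
first_items = list(round_trip)[:3]  # iterating a str yields characters

print(type(round_trip).__name__)    # str
print(is_same_string)               # True
print(first_items)                  # ['[', "'", 'a']
```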
Try something like:
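import json
from collections import defaultdict
from time import time

d = defaultdict(list)
start = time()
with open(file, 'r') as f:  # file: path to the input file
    for line in f:
        NS, domain_list = line.split('|')
        # The list uses single quotes; JSON requires double quotes
        d[NS] = json.loads(domain_list.replace("'", '"'))
for NS, domains in d.items():
    for domain in domains:
        print(NS, domain)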
Answer 1 (score: 2)
Here is another one (assuming names.txt contains your data):
with open('names.txt') as f:  # Open the file for reading
    for line in f:  # Iterate over each line
        host, parts = line.strip().split('|')  # Split the line on the |
        parts = parts.replace('[', '').replace(']', '')  # Remove the [] chars
        parts_a = map(str.strip, parts.split(','))  # Split on the comma, and remove any spaces
        for part in parts_a:  # Iterate through each split part
            print('{0}|{1}'.format(host, part))  # Print the host and part separated by a |
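Note: you could also replace lines 4 and 5 with parts_a = json.loads(parts), assuming the text after the | is valid JSON (the sample data uses single quotes, so the quotes would need converting first).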
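As a hedged illustration of that caveat: JSON requires double-quoted strings, so the `json` module rejects the sample data as written. The sketch below uses a shortened sample value and assumes no domain contains a quote character.

```python
import json

# Text after the | in one sample line (shortened for illustration).
parts = "['alchemistrywork.com', 'hidada.net']"

try:
    json.loads(parts)
    parsed_directly = True
except ValueError:  # json.JSONDecodeError is a subclass of ValueError
    parsed_directly = False

# Swapping the quote style makes the text valid JSON. This assumes no
# domain itself contains a quote, which holds for the sample data.
parts_a = json.loads(parts.replace("'", '"'))

print(parsed_directly)  # False
print(parts_a)          # ['alchemistrywork.com', 'hidada.net']
```

Note that going through `json` yields bare domain names, without the surrounding quotes shown in the expected result.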
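Answer 2 (score: 2)
You don't need json in this case, since it won't solve your problem. Instead, you can use ast.literal_eval and itertools.repeat inside a list comprehension to create the desired pairs: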
>>> from itertools import repeat
>>> import ast
>>> # s is assumed to hold the file contents as one string, with no trailing newline
>>> sp_l=[(i.split('|')[0],ast.literal_eval(i.split('|')[1])) for i in s.split('\n')]
>>> for k in [zip(repeat(i,len(j)),j) for i,j in sp_l]:
...     for item in k:
...         print('|'.join(item))
...
2.nseasy.com.|azeaonline.com
ns1.iwaay.net.|alchemistrywork.com
ns1.iwaay.net.|dha-evolution.biz
ns1.iwaay.net.|hidada.net
ns1.iwaay.net.|sonifer.biz
ns2.hd28.co.uk.|networksound.co.uk
Answer 3 (score: 2)
Try:
import ast

with open(file, "r") as f:  # file: path to the input file
    # Map each nameserver to its parsed list of domains
    d = {k: ast.literal_eval(v) for k, v in map(lambda s: s.split("|"), f)}
for NS, domains in d.items():
    for domain in domains:
        print("%s|'%s'" % (NS, domain))
Or even just:
import ast

with open('file.xyz') as f:
    for thing in f:
        q, r = thing.split('|')
        r = ast.literal_eval(r)  # Parse the list literal after the |
        for other in r:
            print('{}|{}'.format(q, other))
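For a reusable form of the same idea, here is a minimal sketch that wraps the `ast.literal_eval` approach in a generator. The function name `expand_pairs` and the decision to skip blank lines are assumptions made for illustration, not part of the original answers.

```python
import ast


def expand_pairs(lines):
    """Yield (nameserver, domain) for every domain listed on each line."""
    for line in lines:
        line = line.strip()
        if not line:  # assumption: blank lines are ignored
            continue
        ns, raw_list = line.split('|', 1)
        # literal_eval safely parses the Python-style list literal.
        for domain in ast.literal_eval(raw_list):
            yield ns, domain


# Two lines taken from the sample data (second one shortened).
sample = [
    "2.nseasy.com.|['azeaonline.com']\n",
    "ns1.iwaay.net.|['alchemistrywork.com', 'dha-evolution.biz']\n",
]
pairs = list(expand_pairs(sample))
for ns, domain in pairs:
    print("{}|'{}'".format(ns, domain))
```

Because the function accepts any iterable of lines, an open file object can be passed in directly instead of the `sample` list.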
Answer 4 (score: 1)
Here is a regex solution:
import re

# Named text rather than input to avoid shadowing the built-in
text = '''2.nseasy.com.|['azeaonline.com']
ns1.iwaay.net.|['alchemistrywork.com', 'dha-evolution.biz', 'hidada.net', 'sonifer.biz']
ns2.hd28.co.uk.|['networksound.co.uk']'''
for line in text.split('\n'):
    splitted = line.split('|')
    left = splitted[0]
    # Capture each quoted domain; digits are allowed as well as letters
    right = re.findall(r"'([a-z0-9.-]+)'", splitted[1])
    for domain in right:
        print('{0}|{1}'.format(left, domain))
Output:
2.nseasy.com.|azeaonline.com
ns1.iwaay.net.|alchemistrywork.com
ns1.iwaay.net.|dha-evolution.biz
ns1.iwaay.net.|hidada.net
ns1.iwaay.net.|sonifer.biz
ns2.hd28.co.uk.|networksound.co.uk