I have a txt file with the following contents:
google.com('216.58.200.142', 443)
commonName: *.google.com
issuer: GTS CA 1O1
notBefore: 2020-01-21 08:16:06
notAfter: 2020-04-14 08:16:06
youtube.com('172.217.167.142', 443)
commonName: *.google.com
issuer: GTS CA 1O1
notBefore: 2020-01-21 08:16:06
notAfter: 2020-04-14 08:16:06
How can I convert this txt file into a DataFrame like the following?
Name                                  commonName    issuer      notBefore            notAfter
google.com ('216.58.200.142', 443)    *.google.com  GTS CA 1O1  2020-01-21 08:16:06  2020-04-14 08:16:06
youtube.com ('172.217.167.142', 443)  *.google.com  GTS CA 1O1  2020-01-21 08:16:06  2020-04-14 08:16:06
Answer 0 (score: 0)
data = '''
google.com('216.58.200.142', 443)
commonName: *.google.com
issuer: GTS CA 1O1
notBefore: 2020-01-21 08:16:06
notAfter: 2020-04-14 08:16:06
youtube.com('172.217.167.142', 443)
commonName: *.google.com
issuer: GTS CA 1O1
notBefore: 2020-01-21 08:16:06
notAfter: 2020-04-14 08:16:06
'''
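from io import StringIO

import numpy as np
import pandas as pd

df = (pd.read_csv(StringIO(data),
                  header=None,
                  # use a delimiter that is not present in the text file;
                  # this forces pandas to read each line into one column
                  sep="/",
                  names=['string'])
      # limit the number of splits to 1 so the timestamps stay intact
      .string.str.split(':', n=1, expand=True)
      .rename({0: 'Name', 1: 'temp'}, axis=1)
      .assign(
          # header lines end with a closing bracket:
          # move their text into temp ...
          temp=lambda x: np.where(x.Name.str.strip().str.match(r'.*[)]$'),
                                  x.Name,
                                  x.temp),
          # ... and label them 'Name' (regex=True is required on pandas >= 2.0)
          Name=lambda x: x.Name.str.strip()
                          .str.replace(r'.*[)]$', 'Name', regex=True),
          # every header line starts a new record
          record=lambda x: x.Name.eq('Name').cumsum())
      # remove whitespace around the values
      .assign(temp=lambda x: x.temp.str.strip())
      # one row per record instead of ffill/dropna, which would keep
      # partially filled rows once the first record is complete
      .pivot(index='record', columns='Name', values='temp')
      .reset_index(drop=True)
      .rename_axis(None, axis=1)
      .filter(['Name', 'commonName', 'issuer', 'notBefore', 'notAfter'])
      )
print(df)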
                                  Name    commonName      issuer            notBefore             notAfter
0    google.com('216.58.200.142', 443)  *.google.com  GTS CA 1O1  2020-01-21 08:16:06  2020-04-14 08:16:06
1  youtube.com('172.217.167.142', 443)  *.google.com  GTS CA 1O1  2020-01-21 08:16:06  2020-04-14 08:16:06
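The split limit used in this answer matters because the timestamps contain colons themselves. A minimal plain-Python illustration, with `str.split`'s `maxsplit` argument standing in for the `n=1` of the pandas call (the variable names are only for this sketch):

```python
line = "notBefore: 2020-01-21 08:16:06"

# Unlimited split: the timestamp is cut at every colon
unlimited = line.split(":")

# At most one split: the key and the full value stay in two pieces
key, value = line.split(":", 1)
value = value.strip()

print(unlimited)
print(key, "->", value)
```

The first form leaves only the date and hour in the second element, which is why the limit is needed for the `notBefore` and `notAfter` fields.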
Answer 1 (score: 0)
See below ("input.txt" contains your input).
import pandas as pd

data = []
entry = {}
with open('input.txt') as f:
    for line in f:
        line = line.strip()
        if not line:
            # skip blank lines
            continue
        if ':' not in line:
            # a line without a colon, such as google.com('216.58.200.142', 443),
            # starts a new record
            if entry:
                data.append(entry)
            entry = {'Name': line}
        else:
            # split on the first colon only so the timestamps stay intact
            key, value = line.split(':', 1)
            entry[key.strip()] = value.strip()
# store the last record
if entry:
    data.append(entry)

df = pd.DataFrame(data)
print(df.head())
Output
                                  Name    commonName      issuer            notBefore             notAfter
0    google.com('216.58.200.142', 443)  *.google.com  GTS CA 1O1  2020-01-21 08:16:06  2020-04-14 08:16:06
1  youtube.com('172.217.167.142', 443)  *.google.com  GTS CA 1O1  2020-01-21 08:16:06  2020-04-14 08:16:06
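If the dates are needed as real datetime values rather than strings, the same line-by-line idea can be wrapped in a small function. The following is only a sketch: `parse_records`, `DATE_FIELDS` and `DATE_FORMAT` are names made up for this example, and the date format is assumed from the sample in the question.

```python
from datetime import datetime

# Assumptions taken from the sample input, not from any specification
DATE_FIELDS = {"notBefore", "notAfter"}
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_records(text):
    """Turn the certificate listing into a list of dicts, one per host."""
    records = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if ":" not in line:
            # a line without a colon starts a new record
            records.append({"Name": line})
            continue
        if not records:
            continue  # ignore fields that appear before any header line
        # split on the first colon only so timestamps stay intact
        key, value = (part.strip() for part in line.split(":", 1))
        if key in DATE_FIELDS:
            value = datetime.strptime(value, DATE_FORMAT)
        records[-1][key] = value
    return records


sample = """google.com('216.58.200.142', 443)
commonName: *.google.com
issuer: GTS CA 1O1
notBefore: 2020-01-21 08:16:06
notAfter: 2020-04-14 08:16:06
youtube.com('172.217.167.142', 443)
commonName: *.google.com
issuer: GTS CA 1O1
notBefore: 2020-01-21 08:16:06
notAfter: 2020-04-14 08:16:06
"""

records = parse_records(sample)
for record in records:
    # validity period of each certificate as a timedelta
    print(record["Name"], record["notAfter"] - record["notBefore"])
```

The resulting list of dicts can be passed to `pd.DataFrame(records)` to get one row per host.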