I have a txt file containing these values:
google.com('172.217.163.46', 443)
commonName: *.google.com
issuer: GTS CA 1O1
notBefore: 2020-02-12 11:47:11
notAfter: 2020-05-06 11:47:11
facebook.com('31.13.79.35', 443)
commonName: *.facebook.com
issuer: DigiCert SHA2 High Assurance Server CA
notBefore: 2020-01-16 00:00:00
notAfter: 2020-04-15 12:00:00
How can I convert this into a DataFrame?
I tried the following and got it partially working:
import numpy as np
import pandas as pd
from io import StringIO

with open("out.txt", "r") as f:
    data = f.read()

a = (pd.read_csv(StringIO(data),
                 header=None,
                 # use a delimiter not present in the text file
                 # forces pandas to read data into one column
                 sep="/",
                 names=['string'])
       # limit number of splits to 1
       .string.str.split(':', n=1, expand=True)
       .rename({0: 'Name', 1: 'temp'}, axis=1)
       .assign(temp=lambda x: np.where(x.Name.str.strip()
                                       # look for string that ends
                                       # with a bracket
                                       .str.match(r'(.*[)]$)'),
                                       x.Name,
                                       x.temp),
               Name=lambda x: x.Name.str.replace(r'(.*[)]$)', 'Name', regex=True)
               )
       # remove whitespace
       .assign(Name=lambda x: x.Name.str.strip())
       .pivot(columns='Name', values='temp')
       .ffill()
       .dropna(how='any')
       .reset_index(drop=True)
       .rename_axis(None, axis=1)
       .filter(['Name', 'commonName', 'issuer', 'notBefore', 'notAfter'])
     )
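But this gives me multiple rows, as if each single row were repeated several times.
Answer 0: (score: 1)
The file is not in csv format, so you should not use read_csv
to read it; parse it by hand instead. You could do: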
import pandas as pd

with open("out.txt") as fd:
    cols = {'commonName', 'issuer', 'notBefore', 'notAfter'}  # columns to keep
    rows = []                                                  # list of records
    for line in fd:
        line = line.strip()
        if ':' in line:
            elt = line.split(':', 1)          # data line: parse it
            if elt[0] in cols:
                rec[elt[0]] = elt[1].strip()  # drop the space after the colon
        elif len(line) > 0:
            rec = {'Name': line}              # initial line of a block
            rows.append(rec)
a = pd.DataFrame(rows)  # and build the dataframe from the list of records
It gives:
Name commonName issuer notAfter notBefore
0 google.com('172.217.163.46', 443) *.google.com GTS CA 1O1 2020-05-06 11:47:11 2020-02-12 11:47:11
1 facebook.com('31.13.79.35', 443) *.facebook.com DigiCert SHA2 High Assurance Server CA 2020-04-15 12:00:00 2020-01-16 00:00:00
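The same hand-parsing idea can be sketched as a small function that works on any iterable of lines, which makes it easy to try without out.txt. The names parse_blocks and FIELDS are hypothetical, and the sketch assumes that a non-empty line without a colon starts a new block; unlike the loop above, it skips a field that appears before any header line instead of raising a NameError.

```python
from io import StringIO

FIELDS = ('commonName', 'issuer', 'notBefore', 'notAfter')  # columns to keep


def parse_blocks(lines, fields=FIELDS):
    """Turn an iterable of text lines into a list of dicts, one per block."""
    rows = []
    rec = None                       # no block has been opened yet
    for raw in lines:
        line = raw.strip()
        if not line:
            continue                 # skip blank lines between blocks
        key, sep, value = line.partition(':')
        if sep:
            # data line: keep it only if it is a wanted field inside a block
            if rec is not None and key.strip() in fields:
                rec[key.strip()] = value.strip()
        else:
            rec = {'Name': line}     # a line without ':' opens a new block
            rows.append(rec)
    return rows


sample = StringIO(
    "google.com('172.217.163.46', 443)\n"
    "commonName: *.google.com\n"
    "issuer: GTS CA 1O1\n"
    "notBefore: 2020-02-12 11:47:11\n"
    "notAfter: 2020-05-06 11:47:11\n"
    "facebook.com('31.13.79.35', 443)\n"
    "commonName: *.facebook.com\n"
    "issuer: DigiCert SHA2 High Assurance Server CA\n"
    "notBefore: 2020-01-16 00:00:00\n"
    "notAfter: 2020-04-15 12:00:00\n"
)
rows = parse_blocks(sample)
```

Passing the result to pd.DataFrame(rows), or calling parse_blocks(fd) on the open file, should then build the frame in the same way as the loop above.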
Answer 1: (score: 0)
Try this:
import pandas as pd

# ==============
# read text file
# ==============
with open('in.txt') as file:
    lines = file.readlines()

# ==============
# create a dict
# ==============
mydict = {}
# a step of 6 assumes each record is followed by one blank separator line;
# use 5 if the records are contiguous
for i in range(0, len(lines), 6):
    # ==============
    # add "Name" to dict
    # ==============
    if 'Name' not in mydict:
        mydict['Name'] = []
    mydict['Name'].append(lines[i].strip('\n'))

    # ==============
    # add other cols to dict
    # ==============
    for line in lines[i+1:i+5]:
        key, *value = line.strip().strip('\n').split(':', maxsplit=1)
        if key not in mydict:
            mydict[key] = []
        mydict[key].append(''.join(value).strip())

pd.DataFrame(mydict)
Output:
+----+-----------------------------------+----------------+----------------------------------------+---------------------+---------------------+
| | Name | commonName | issuer | notBefore | notAfter |
|----+-----------------------------------+----------------+----------------------------------------+---------------------+---------------------|
| 0 | google.com('172.217.163.46', 443) | *.google.com | GTS CA 1O1 | 2020-02-12 11:47:11 | 2020-05-06 11:47:11 |
| 1 | facebook.com('31.13.79.35', 443) | *.facebook.com | DigiCert SHA2 High Assurance Server CA | 2020-01-16 00:00:00 | 2020-04-15 12:00:00 |
+----+-----------------------------------+----------------+----------------------------------------+---------------------+---------------------+
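The fixed stride above only works when every record has the same number of lines. A variant that does not depend on the record length can derive a block id from the header lines and pivot on it. This is only a sketch: it assumes, like the regex in the question, that a header line ends with a closing bracket, and the names is_header and long_df are made up for illustration. Pivoting with index='block' is also what the attempt in the question lacked: without an index, every input line becomes its own row, and ffill then repeats the values across those rows.

```python
import pandas as pd

text = """google.com('172.217.163.46', 443)
commonName: *.google.com
issuer: GTS CA 1O1
notBefore: 2020-02-12 11:47:11
notAfter: 2020-05-06 11:47:11
facebook.com('31.13.79.35', 443)
commonName: *.facebook.com
issuer: DigiCert SHA2 High Assurance Server CA
notBefore: 2020-01-16 00:00:00
notAfter: 2020-04-15 12:00:00
"""

# one stripped, non-empty line per element
lines = pd.Series([ln.strip() for ln in text.splitlines() if ln.strip()])

# assumption: a header line is one that ends with ')'
is_header = lines.str.endswith(')')

# split "key: value" once; header lines have no ':' so their value is missing
parts = lines.str.split(':', n=1, expand=True)

long_df = pd.DataFrame({
    'block': is_header.cumsum(),                            # 1, 1, ..., 2, 2, ...
    'key': parts[0].str.strip().where(~is_header, 'Name'),  # header -> 'Name'
    'value': parts[1].str.strip().where(~is_header, lines), # header -> full line
})

df = (long_df.pivot(index='block', columns='key', values='value')
             .reset_index(drop=True)
             .rename_axis(None, axis=1)
             [['Name', 'commonName', 'issuer', 'notBefore', 'notAfter']])
```

Because the rows are grouped by block id rather than by position, records with extra blank lines between them are handled in the same way.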