Convert a text file to a DataFrame

Time: 2020-02-11 11:42:02

Tags: python python-3.x pandas dataframe

I have a txt file with the following contents:

google.com('216.58.200.142', 443)
        commonName: *.google.com
        issuer: GTS CA 1O1
        notBefore: 2020-01-21 08:16:06
        notAfter:  2020-04-14 08:16:06

youtube.com('172.217.167.142', 443)
        commonName: *.google.com
        issuer: GTS CA 1O1
        notBefore: 2020-01-21 08:16:06
        notAfter:  2020-04-14 08:16:06

How can I convert this txt file into a DataFrame like the one below?

Name                                   commonName     issuer       notBefore             notAfter
google.com('216.58.200.142', 443)      *.google.com   GTS CA 1O1   2020-01-21 08:16:06   2020-04-14 08:16:06
youtube.com('172.217.167.142', 443)    *.google.com   GTS CA 1O1   2020-01-21 08:16:06   2020-04-14 08:16:06

2 answers:

Answer 0 (score: 0)

from io import StringIO

import numpy as np
import pandas as pd

data = '''
    google.com('216.58.200.142', 443)
    commonName: *.google.com
    issuer: GTS CA 1O1
    notBefore: 2020-01-21 08:16:06
    notAfter:  2020-04-14 08:16:06

    youtube.com('172.217.167.142', 443)
    commonName: *.google.com
    issuer: GTS CA 1O1
    notBefore: 2020-01-21 08:16:06
    notAfter:  2020-04-14 08:16:06
      '''

df = (
    pd.read_csv(StringIO(data),
                header=None,
                # use a delimiter not present in the text file;
                # this forces pandas to read each line into one column
                sep="/",
                names=['string'])
    # limit the number of splits to 1 so the timestamps stay intact
    .string.str.split(':', n=1, expand=True)
    .rename({0: 'Name', 1: 'temp'}, axis=1)
    # remove surrounding whitespace from the key column
    .assign(Name=lambda x: x.Name.str.strip())
    .assign(
        # a line that ends with a closing bracket is a host line:
        # move its text into temp and label the row 'Name'
        temp=lambda x: np.where(x.Name.str.match(r'.*[)]$'),
                                x.Name,
                                x.temp.str.strip()),
        Name=lambda x: x.Name.str.replace(r'.*[)]$', 'Name', regex=True),
    )
    # every host line starts a new record, so a running count numbers the records
    .assign(record=lambda x: x.Name.eq('Name').cumsum())
    # pivot on the record number so each record becomes exactly one row
    .pivot(index='record', columns='Name', values='temp')
    .reset_index(drop=True)
    .rename_axis(None, axis=1)
    .filter(['Name', 'commonName', 'issuer', 'notBefore', 'notAfter'])
)
print(df)


   Name                                 commonName    issuer      notBefore            notAfter
0  google.com('216.58.200.142', 443)    *.google.com  GTS CA 1O1  2020-01-21 08:16:06  2020-04-14 08:16:06
1  youtube.com('172.217.167.142', 443)  *.google.com  GTS CA 1O1  2020-01-21 08:16:06  2020-04-14 08:16:06
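When key/value lines are pivoted into columns, each host's fields need a shared record number, otherwise values from neighbouring hosts can end up in the same row. The following is a minimal sketch of that numbering step in isolation. The frame `long_df` and its column names `key`, `value` and `record` are made up for illustration, and it assumes every record starts with a `Name` row.

```python
import pandas as pd

# Hypothetical long-format data: one (key, value) pair per row, in file order.
long_df = pd.DataFrame(
    {
        "key": ["Name", "commonName", "issuer", "Name", "commonName", "issuer"],
        "value": [
            "google.com", "*.google.com", "GTS CA 1O1",
            "youtube.com", "*.google.com", "GTS CA 1O1",
        ],
    }
)

# Each "Name" row starts a new record, so a cumulative sum numbers the records.
long_df["record"] = long_df["key"].eq("Name").cumsum()

# Pivot on the record number so every record becomes exactly one row.
wide_df = (
    long_df.pivot(index="record", columns="key", values="value")
    .reset_index(drop=True)
    .rename_axis(None, axis=1)
)
print(wide_df)
```

Because the pivot is keyed on `record`, no forward fill is needed and the number of output rows equals the number of host lines.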

Answer 1 (score: 0)

See below ('input.txt' is assumed to contain your input).

import pandas as pd

data = []
with open('input.txt') as f:
  entry = {}
  for line in f:
    line = line.strip()
    if not line:
      # a blank line ends the current record
      if entry:
        data.append(entry)
      entry = {}
    elif ':' not in line:
      # the host line is the only line without a colon
      entry['Name'] = line
    else:
      # split on the first colon only, so the timestamps stay intact
      key, value = line.split(':', 1)
      entry[key.strip()] = value.strip()
  # add the last record if the file does not end with a blank line
  if entry:
    data.append(entry)
df = pd.DataFrame(data)
print(df.head())

Output

Name                                   commonName     issuer       notBefore             notAfter
google.com('216.58.200.142', 443)      *.google.com   GTS CA 1O1   2020-01-21 08:16:06   2020-04-14 08:16:06
youtube.com('172.217.167.142', 443)    *.google.com   GTS CA 1O1   2020-01-21 08:16:06   2020-04-14 08:16:06
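The values in this file contain colons themselves (the timestamps), so how a line is split matters. Here is a small sketch comparing a split on every colon with a split on the first colon only. The variable names are illustrative, and `str.partition` is just one of several ways to do this.

```python
line = "notBefore: 2020-01-21 08:16:06"

# Splitting on every colon also cuts the time of day apart.
naive_parts = line.split(":")

# partition() splits on the first colon only, so the timestamp survives.
key, _, value = line.partition(":")
key, value = key.strip(), value.strip()

print(naive_parts)       # ['notBefore', ' 2020-01-21 08', '16', '06']
print(key, "->", value)  # notBefore -> 2020-01-21 08:16:06
```

Taking only `naive_parts[1]` as the value would keep `2020-01-21 08` and drop the minutes and seconds.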