Unable to show the text column when splitting text into a table

Time: 2019-04-09 08:07:14

Tags: python regex pandas text

This is my dataset (a single column):

Apr 1 09:14:55 i have apple
Apr 2 08:10:10 i have mango

This is the result I need:

month  date  time      message
Apr    1     09:14:55  i have apple
Apr    2     08:10:10  i have mango

This is what I did:

import pandas as pd

month = []
date = []
time = []
message = []

for line in dns_data:
   month.append(line.split()[0])
   date.append(line.split()[1])
   time.append(line.split()[2])

df = pd.DataFrame(data={'month': month, 'date':date, 'time':time})

This is the output I get:

  month date      time
0   Apr    1  09:14:55
1   Apr    2  08:10:10

How do I show the message column?

3 answers:

Answer 0 (score: 2)

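Use Series.str.split: the parameter n=3 limits splitting to the first three whitespace separators, so the rest of each line stays together as the message, and expand=True returns the pieces as DataFrame columns: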

print (df)
                           col
0  Apr 1 09:14:55 i have apple
1  Apr 2 08:10:10 i have mango

df1 = df['col'].str.split(n=3, expand=True)
df1.columns=['month','date','time','message']
print (df1)
  month date      time       message
0   Apr    1  09:14:55  i have apple
1   Apr    2  08:10:10  i have mango

Another solution, using a list comprehension:

c = ['month','date','time','message']
df1 = pd.DataFrame([x.split(maxsplit=3) for x in df['col']], columns=c)
print (df1)
  month date      time       message
0   Apr    1  09:14:55  i have apple
1   Apr    2  08:10:10  i have mango
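
The same idea can also be applied directly to the raw lines from the question, without first building a one-column DataFrame. This is a minimal sketch, assuming `dns_data` is a list of strings; the question does not show how it is built, so the list below is only a stand-in.

```python
import pandas as pd

# Stand-in for the question's dns_data (assumed: an iterable of log lines).
dns_data = [
    "Apr 1 09:14:55 i have apple",
    "Apr 2 08:10:10 i have mango",
]

# maxsplit=3 stops after the third whitespace split, so the remainder of
# the line (the message, spaces included) stays in the fourth field.
rows = [line.split(maxsplit=3) for line in dns_data]

df = pd.DataFrame(rows, columns=["month", "date", "time", "message"])
print(df)
```

Splitting each line once and building the frame from a list of rows also avoids calling `line.split()` several times per line, as the original loop does.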

Answer 1 (score: 2)

You can use Series.str.extractall with a regular expression containing named groups:

df = pd.DataFrame({'text': {0: 'Apr 1 09:14:55 i have apple', 1: 'Apr 2 08:10:10 i have mango'}})
df_new = (df.text.str
          .extractall(r'^(?P<month>\w{3})\s?(?P<date>\d{1,2})\s?(?P<time>\d{2}:\d{2}:\d{2})\s?(?P<message>.*)$')
          .reset_index(drop=True))
print(df_new)

  month date      time       message
0   Apr    1  09:14:55  i have apple
1   Apr    2  08:10:10  i have mango
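
Since each line here contains exactly one match, Series.str.extract may be a simpler fit than extractall: it returns one row per input row, so the reset_index step is not needed. The sketch below is a variation on the pattern above, not the answerer's code; using `\s+` instead of `\s?` between fields is an assumption on my part that the fields are always separated by at least one whitespace character.

```python
import pandas as pd

df = pd.DataFrame({"text": ["Apr 1 09:14:55 i have apple",
                            "Apr 2 08:10:10 i have mango"]})

# Named groups become the column names of the resulting DataFrame.
pattern = (r"^(?P<month>\w{3})\s+(?P<date>\d{1,2})\s+"
           r"(?P<time>\d{2}:\d{2}:\d{2})\s+(?P<message>.*)$")

# extract keeps the original index and returns the first match per row;
# rows that do not match the pattern would come back as NaN.
df_new = df["text"].str.extract(pattern)
print(df_new)
```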

Answer 2 (score: 0)

This might help you:

(?<Month>\w+)\s(?<Date>\d+)\s(?<Time>[\w:]+)\s(?<Message>.*)

Match 1
Month   Apr
Date    1
Time    09:14:55
Message i have apple
Match 2
Month   Apr
Date    2
Time    08:10:10
Message i have mango

https://rubular.com/r/1S4BcbDxPtlVxE
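
Note that the pattern above uses the `(?<Name>...)` named-group syntax accepted by Rubular (Ruby); Python's `re` module writes named groups as `(?P<Name>...)`. The following is a sketch of how the same pattern might be used from Python and loaded into pandas. The variable names and the sample list are illustrative, not part of the answer.

```python
import re

import pandas as pd

# Same pattern as above, rewritten with Python's (?P<name>...) syntax.
pattern = re.compile(
    r"(?P<Month>\w+)\s(?P<Date>\d+)\s(?P<Time>[\w:]+)\s(?P<Message>.*)"
)

# Illustrative input: the two lines from the question.
lines = [
    "Apr 1 09:14:55 i have apple",
    "Apr 2 08:10:10 i have mango",
]

# Keep only the lines that match; groupdict() maps group names to values.
matches = [m.groupdict() for m in map(pattern.match, lines) if m]

df = pd.DataFrame(matches)
print(df)
```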