Filtering df values inside quotes

Time: 2018-11-19 16:14:08

Tags: python python-3.x pandas dataframe lambda

I am using the following code to build a df from the output of a command-line call:

df_output_lines = [s.split() for s in os.popen("my command linecode").read().splitlines()]
df_output_lines  = list(filter(None, df_output_lines))

and converting it to a DataFrame:

df=pd.DataFrame(df_output_lines)
df

The data has the following format:

abc = pd.DataFrame([['time:"08:59:38.000"', 'instance:"(null)"','id:"3214039276626790405"'],['time:"08:59:38.000"', 'instance:"(Ops-MacBook-Pro.local)"','id:"3214039276626790405"'],['time:"08:59:38.000"', 'instance:"(Ops-MacBook-Pro.local)"','id:"3214039276626790405"']])
abc


I want to filter it so that the text before the : becomes the column name and the text inside the quotes " " becomes the cell value, and likewise for every column.

So far I am doing this the hard way:

abc.rename(columns={0:'time',1:'instance',2:'id'},inplace=True)

and then

abc['time'] = abc['time'].map(lambda x: str(x)[:-1])
abc['time'] = abc['time'].map(lambda x: str(x)[6:])

abc['instance'] = abc['instance'].map(lambda x: str(x)[:-1])
abc['instance'] = abc['instance'].map(lambda x: str(x)[10:])

abc['id'] = abc.id.str.extract(r'(\d+)', expand=True).astype(int)

Any suggestions for doing this with a lambda expression or any kind of one-liner would be welcome.

My raw log output looks like this:

    time:"11:22:20.000" instance:"(null)" id:"723927731576482920" channel:"sip:confctl.com" type:"control" elapsedtime:"0.000631" level:"info" operation:"Init" message:"Initialize (version 4.9.0002.30618) ... "

    time:"11:22:21.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl.com" type:"control" elapsedtime:"0.067122" level:"info" operation:"Connect" message:"Connecting to https://hrpd.www.vivox.com/api2/"

    time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl-.com" type:"control" elapsedtime:"2.685700" level:"info" operation:"Connect" message:"Connected to https://hrpd.www.vivox.com/api2/"

    time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl-.com" type:"control" elapsedtime:"2.814268" level:"info" operation:"Login" message:"Logged in .tester_food."

    time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl-.com" type:"control" elapsedtime:"2.912255" level:"error" operation:"Call" message:".tester_food. failed to join sip:confctl-2@hrpd.vivox.com error:Access token has invalid signature(403)"

    time:"12:30:41.000" instance:"Ops-MacBook-Pro.local" id:"10316899144153251411" channel:"sip:confctl-2@hrpd.vivox.com" type:"media" sampleperiod:"0.000000" incomingpktsreceived:"0" incomingpktsexpected:"0" incomingpktsloss:"0" incomingpktssoutoftime:"0" incomingpktsdiscarded:"0" outgoingpktssent:"0" predictedmos:"3" latencypktssent:"0" latencycount:"0" latencysum:"0.000000" latencymin:"0.000000" latencymax:"0.000000" callid:"2477580077" r_factor:"0.000000"

3 answers:

Answer 0 (score: 0)

Feed a list of dictionaries to pd.DataFrame

The pd.DataFrame constructor accepts a list of dictionaries directly. You can use str.rstrip and str.split within a list comprehension:

res = pd.DataFrame([dict(i.rstrip('"').split(':"') for i in row) for row in abc.values])
print(res)

                    id                 instance          time
0  3214039276626790405                   (null)  08:59:38.000
1  3214039276626790405  (Ops-MacBook-Pro.local)  08:59:38.000
2  3214039276626790405  (Ops-MacBook-Pro.local)  08:59:38.000

It is not clear what logic you use to decide that only certain strings stay enclosed in parentheses.
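The same idea can be written with a small helper that splits on the first colon and removes exactly one pair of surrounding quotes. This is only a sketch: `parse_cell` is a hypothetical name, not part of pandas, and it assumes every cell has the form key:"value". Unlike `rstrip('"')` followed by `split(':"')`, it also copes with an empty value such as msg:"".

```python
import pandas as pd

def parse_cell(cell):
    """Split one key:"value" token into a (key, value) pair (hypothetical helper)."""
    key, _, value = cell.partition(':')   # split on the first colon only
    # Drop exactly one pair of surrounding double quotes, if present.
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        value = value[1:-1]
    return key, value

abc = pd.DataFrame([
    ['time:"08:59:38.000"', 'instance:"(null)"', 'id:"3214039276626790405"'],
    ['time:"08:59:38.000"', 'instance:"(Ops-MacBook-Pro.local)"', 'id:"3214039276626790405"'],
])

# One dict per row; the keys become the column names.
res = pd.DataFrame([dict(parse_cell(c) for c in row) for row in abc.values])
print(res)
```

Because the split happens on the first colon only, the colons inside the time value are left untouched.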

Answer 1 (score: 0)

Although an answer has already been given, I would like to add a regex-based approach that achieves the same result:

>>> abc
                  time                            instance                        id
0  time:"08:59:38.000"                   instance:"(null)"  id:"3214039276626790405"
1  time:"08:59:38.000"  instance:"(Ops-MacBook-Pro.local)"  id:"3214039276626790405"
2  time:"08:59:38.000"  instance:"(Ops-MacBook-Pro.local)"  id:"3214039276626790405"

Just apply a regex replacement to the DataFrame with regex=True.

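Regex explanation:

  • 1st alternative instance: matches the characters instance: literally (case sensitive)
  • 2nd alternative id: matches the characters id: literally (case sensitive)
  • 3rd alternative time: matches the characters time: literally (case sensitive)
  • 4th alternative \" matches the character " literally (case sensitive)
  • 5th alternative [()] matches a single character present in the list ( ) (case sensitive)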
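The explanation above can be turned into a runnable sketch. The pattern below is an assumption: it simply joins the five listed alternatives with |, and the column names are the ones set by the rename in the question. `DataFrame.replace` with `regex=True` then removes every match from every cell.

```python
import pandas as pd

abc = pd.DataFrame(
    [['time:"08:59:38.000"', 'instance:"(null)"', 'id:"3214039276626790405"'],
     ['time:"08:59:38.000"', 'instance:"(Ops-MacBook-Pro.local)"', 'id:"3214039276626790405"']],
    columns=['time', 'instance', 'id'],
)

# Assumed pattern, rebuilt from the five alternatives in the explanation:
# the three key prefixes, the double quote, and either parenthesis.
pattern = r'instance:|id:|time:|"|[()]'

# With regex=True every match inside a cell is replaced by the empty string.
cleaned = abc.replace(pattern, '', regex=True)
print(cleaned)
```

Note that this variant also strips the parentheses from (null), and that the key prefixes are hard-coded, so a log line with other keys would need a longer pattern.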

Answer 2 (score: 0)

Given the following sample input:

time:"11:22:20.000" instance:"(null)" id:"723927731576482920" channel:"sip:confctl.com" type:"control" elapsedtime:"0.000631" level:"info" operation:"Init" message:"Initialize (version 4.9.0002.30618) ... "

time:"11:22:21.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl.com" type:"control" elapsedtime:"0.067122" level:"info" operation:"Connect" message:"Connecting to https://hrpd.www.vivox.com/api2/"

time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl-.com" type:"control" elapsedtime:"2.685700" level:"info" operation:"Connect" message:"Connected to https://hrpd.www.vivox.com/api2/"

time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl-.com" type:"control" elapsedtime:"2.814268" level:"info" operation:"Login" message:"Logged in .tester_food."

time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl-.com" type:"control" elapsedtime:"2.912255" level:"error" operation:"Call" message:".tester_food. failed to join sip:confctl-2@hrpd.vivox.com error:Access token has invalid signature(403)"

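This is what comes out of your os.popen command. We then filter out the blank lines and shlex.split each remaining line, so that spaces inside quoted items are preserved (while the quotes themselves are removed), e.g.: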

import os
import shlex
import pandas as pd

rows = [shlex.split(line) for line in os.popen("my command linecode").read().splitlines() if line.strip()]

This gives you a rows[0] such as:

['time:11:22:20.000',
 'instance:(null)',
 'id:723927731576482920',
 'channel:sip:confctl.com',
 'type:control',
 'elapsedtime:0.000631',
 'level:info',
 'operation:Init',
 'message:Initialize (version 4.9.0002.30618) ... ']
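To see why shlex.split is used here instead of str.split, the following sketch compares the two on a shortened version of the first log line (the shortening is only for illustration):

```python
import shlex

# Shortened copy of the first sample line; the message value contains spaces.
line = 'time:"11:22:20.000" instance:"(null)" level:"info" message:"Initialize (version 4.9.0002.30618) ... "'

naive = line.split()        # splits at every space, even inside the quotes
tokens = shlex.split(line)  # keeps quoted text together, then drops the quotes

print(len(naive), len(tokens))
print(tokens)
```

One caveat: shlex follows shell quoting rules, so a line containing an unmatched quote character (for example an apostrophe inside a message) raises ValueError and would need separate handling.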

You then partition each item on the first : to separate the identifier from its value, and feed the result to pd.DataFrame, e.g.:

df = pd.DataFrame(dict(col.partition(':')[::2] for col in row) for row in rows)

which gives you this df:

            channel elapsedtime                  id               instance  level                                            message operation          time     type
0   sip:confctl.com    0.000631  723927731576482920                 (null)   info           Initialize (version 4.9.0002.30618) ...       Init  11:22:20.000  control
1   sip:confctl.com    0.067122  723927731576482920  Ops-MacBook-Pro.local   info     Connecting to https://hrpd.www.vivox.com/api2/   Connect  11:22:21.000  control
2  sip:confctl-.com    2.685700  723927731576482920  Ops-MacBook-Pro.local   info      Connected to https://hrpd.www.vivox.com/api2/   Connect  11:22:23.000  control
3  sip:confctl-.com    2.814268  723927731576482920  Ops-MacBook-Pro.local   info                            Logged in .tester_food.     Login  11:22:23.000  control
4  sip:confctl-.com    2.912255  723927731576482920  Ops-MacBook-Pro.local  error  .tester_food. failed to join sip:confctl-2@hrp...      Call  11:22:23.000  control
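The raw log in the question also contains "media" lines with a different set of keys. A minimal sketch of how the same approach behaves in that case, using two shortened sample lines in place of the os.popen output (the shortening and the choice of columns are assumptions for illustration):

```python
import shlex
import pandas as pd

# Two shortened lines with different keys, standing in for the command output.
sample = '''
time:"11:22:23.000" id:"723927731576482920" type:"control" elapsedtime:"2.814268" level:"info"

time:"12:30:41.000" id:"10316899144153251411" type:"media" predictedmos:"3" latencysum:"0.000000"
'''

rows = [shlex.split(line) for line in sample.splitlines() if line.strip()]
df = pd.DataFrame([dict(col.partition(':')[::2] for col in row) for row in rows])

# Keys that are absent from a line show up as NaN in that row.
# Every parsed value is a string; convert numeric columns explicitly.
df['elapsedtime'] = pd.to_numeric(df['elapsedtime'], errors='coerce')
print(df)
```

The id column is left as strings here because the second id does not fit in a signed 64-bit integer.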