I currently have a pandas DataFrame whose contents look like this:
0 (dev_id='A', accon_time='B', start_time='C',end_time='D')
1 (dev_id='E', accon_time='F', start_time='G',end_time='H')
2 (dev_id='I', accon_time='J', start_time='K',end_time='L')
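The DataFrame's current shape is (574, 1), but I actually want it to be (574, 4), with the 4 comma-separated values in each row split into 4 separate columns.
Is there any way to do this?
I tried converting the query to a pandas Series first and then using Series.str.split, but the result is the same as the original DataFrame.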
ser = pd.Series(qry)
ser.str.rsplit(pat=",", n=4, expand=True)
print(ser)
df = pd.DataFrame(data=ser)
print(df)
This is what I use to query the data:
class Trip(Base):
    __tablename__ = 'trip'
    dev_id = Column(String(50), primary_key=True)
    accon_time = Column(Integer)
    start_time = Column(Integer)
    end_time = Column(Integer)

    def __repr__(self):
        return "(dev_id='%s', accon_time='%s', start_time='%s',end_time='%s')" \
            % (self.dev_id, self.accon_time, self.start_time, self.end_time)
qry = session.query(Trip).\
    filter(Trip.accon_time.between(20190620000000, 20190621000000)).\
    filter(Trip.start_time <= 20190620145813).\
    filter(Trip.end_time <= 20190620151400).\
    filter(Trip.end_time >= 20190620145600)
This returns a list like the following:
(dev_id='A', accon_time='B', start_time='C',end_time='D'), (dev_id='E', accon_time='F', start_time='G',end_time='H'), (dev_id='I', accon_time='J', start_time='K',end_time='L')
Converting the query result to a pandas DataFrame:
df = pd.DataFrame(data=qry)
print(df)
Answer 0 (score: 0)
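In your parsing example, ser.str.rsplit(pat=",", n=4, expand=True) returns its result as a new object and does not modify ser. You need to capture that return value, otherwise the call has no effect.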
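To illustrate the point about capturing the return value, here is a minimal sketch using three sample strings in the same format as the question (the name `parts` is just an illustrative choice). It only shows that the captured result has four columns; the pieces still contain the parentheses and the `key=` prefixes, so further parsing is still needed.

```python
import pandas as pd

ser = pd.Series([
    "(dev_id='A', accon_time='B', start_time='C',end_time='D')",
    "(dev_id='E', accon_time='F', start_time='G',end_time='H')",
    "(dev_id='I', accon_time='J', start_time='K',end_time='L')",
])

# The result is discarded here, so ser is left unchanged.
ser.str.rsplit(pat=",", n=4, expand=True)

# Capturing the return value gives a DataFrame with one column per piece.
parts = ser.str.rsplit(pat=",", n=4, expand=True)

print(ser.shape)    # (3,)
print(parts.shape)  # (3, 4)
```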
Try the following approach to parse the strings:
qry = ["(dev_id='A', accon_time='B', start_time='C',end_time='D')",
"(dev_id='E', accon_time='F', start_time='G',end_time='H')",
"(dev_id='I', accon_time='J', start_time='K',end_time='L')"]
ser = pd.Series(qry)
df = ser.apply(lambda x: pd.Series([val.split('=')[1] for val in x[1:-1].split(',')]))
df.columns = ['dev_id', 'accon_time', 'start_time', 'end_time']
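For each row of the Series (via .apply()), I take the string and remove the parentheses with x[1:-1], then split it on commas with .split(','), which gives me a list of key-value literals (i.e. ["dev_id='A'", " accon_time='B'", " start_time='C'", "end_time='D'"]). Then, for each literal, I split on '=' and return the second element, which is the actual value, with .split('=')[1].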
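The steps described above can be traced on a single row with plain string operations. This is only a sketch; the variable names (`row`, `inner`, `pairs`, `values`) are illustrative and not part of the original code.

```python
row = "(dev_id='A', accon_time='B', start_time='C',end_time='D')"

# Step 1: drop the opening and closing parentheses.
inner = row[1:-1]

# Step 2: split on commas to get the key=value literals.
pairs = inner.split(",")

# Step 3: split each literal on '=' and keep the second element (the value).
values = [pair.split("=")[1] for pair in pairs]

print(pairs)
print(values)  # the values still carry their single quotes at this point
```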
If you don't want the ' characters in the elements, remove them by adding .strip('\'') at the end:
ser = ser.apply(lambda x:[val.split('=')[1].strip('\'') for val in x[1:-1].split(',')])
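For completeness, here is a sketch of the quote-stripping variant that ends in a four-column DataFrame. Note that it builds the frame from a list of lists rather than through .apply(), which is a swapped-in technique and not what the line above does. The helper name `parse_row` is hypothetical, and the sketch assumes the values themselves never contain a comma or an equals sign.

```python
import pandas as pd

qry = [
    "(dev_id='A', accon_time='B', start_time='C',end_time='D')",
    "(dev_id='E', accon_time='F', start_time='G',end_time='H')",
    "(dev_id='I', accon_time='J', start_time='K',end_time='L')",
]

def parse_row(text):
    # Hypothetical helper: remove the outer parentheses, split on commas,
    # keep the text after '=', then strip whitespace and single quotes.
    return [part.split("=")[1].strip().strip("'") for part in text[1:-1].split(",")]

columns = ["dev_id", "accon_time", "start_time", "end_time"]
df = pd.DataFrame([parse_row(text) for text in qry], columns=columns)

print(df.shape)  # (3, 4)
```

With the 574 strings from the question in place of the three samples, the same code should give a frame of shape (574, 4).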