Is there a way to split row values into separate columns using Pandas?

Time: 2019-07-26 18:54:04

Tags: python pandas

I currently have a pandas DataFrame whose contents look like this:

0   (dev_id='A', accon_time='B', start_time='C',end_time='D')
1   (dev_id='E', accon_time='F', start_time='G',end_time='H')
2   (dev_id='I', accon_time='J', start_time='K',end_time='L')

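The current shape of this DataFrame is (574, 1), but I actually want it to be (574, 4), with the 4 comma-separated values in each row split into 4 separate columns.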

Is there any way to do this?

  • This data comes from a SQLAlchemy query

I tried converting the query to a pandas Series first and then using Series.str.split, but the result is the same as the original DataFrame.

ser = pd.Series(qry)
ser.str.rsplit(pat=",", n=4, expand=True)
print(ser)
df = pd.DataFrame(data=ser)
print(df)

This is what I use to query the data:

class Trip(Base):
    __tablename__ = 'trip'
    dev_id = Column(String(50), primary_key=True)
    accon_time = Column(Integer)
    start_time = Column(Integer)
    end_time = Column(Integer)

    def __repr__(self):
        return ("(dev_id='%s', accon_time='%s', start_time='%s',end_time='%s')"
                % (self.dev_id, self.accon_time, self.start_time, self.end_time))

qry = session.query(Trip).\
        filter(Trip.accon_time.between(20190620000000, 20190621000000)).\
        filter(Trip.start_time <= 20190620145813).\
        filter(Trip.end_time <= 20190620151400).\
        filter(Trip.end_time >= 20190620145600)

This returns a list like the following:

  

(dev_id='A', accon_time='B', start_time='C',end_time='D'), (dev_id='E', accon_time='F', start_time='G',end_time='H'), (dev_id='I', accon_time='J', start_time='K',end_time='L')

Converting the query result to a pandas DataFrame:

df = pd.DataFrame(data=qry)
print(df)

1 Answer:

Answer 0: (score: 0)

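In your parsing example, ser.str.rsplit(pat=",", n=4, expand=True) returns its result as a new object and leaves ser unchanged. You need to capture that output, otherwise the call does nothing.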
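A minimal sketch of what capturing the output looks like. It assumes the rows are already plain strings (the sample list `rows` and the name `parts` are made up for illustration), and it uses str.split rather than rsplit since no limit on the number of splits is needed here:

```python
import pandas as pd

# Stand-in for the query rows, assumed to be plain strings already.
rows = [
    "(dev_id='A', accon_time='B', start_time='C',end_time='D')",
    "(dev_id='E', accon_time='F', start_time='G',end_time='H')",
    "(dev_id='I', accon_time='J', start_time='K',end_time='L')",
]
ser = pd.Series(rows)

# str.split returns a new DataFrame; ser itself is not modified.
parts = ser.str.split(pat=",", expand=True)

print(ser.shape)    # (3,)
print(parts.shape)  # (3, 4)
```

The cells in `parts` still carry the parentheses and the key= prefixes, so further parsing is needed to get just the values.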

Try the following to parse it:

import pandas as pd

# Sample rows standing in for the query results, as strings
qry = ["(dev_id='A', accon_time='B', start_time='C',end_time='D')",
       "(dev_id='E', accon_time='F', start_time='G',end_time='H')",
       "(dev_id='I', accon_time='J', start_time='K',end_time='L')"]
ser = pd.Series(qry)
# Drop the parentheses, split on commas, keep the text after each '='
df = ser.apply(lambda x: pd.Series([val.split('=')[1] for val in x[1:-1].split(',')]))
df.columns = ['dev_id', 'accon_time', 'start_time', 'end_time']

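For each row of ser, .apply() takes the string and removes the parentheses with x[1:-1], then splits on the commas with .split(','), which gives me a list of key-value literals (i.e. ["dev_id='A'", " accon_time='B'", " start_time='C'", "end_time='D'"]). Then, for each literal, I split it on '=' and return the second element, which is the actual value: .split('=')[1]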
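The same steps can be traced on a single row in plain Python. The variable names (`row`, `inner`, `pairs`, `values`) are only for illustration:

```python
row = "(dev_id='A', accon_time='B', start_time='C',end_time='D')"

# Step 1: drop the opening and closing parentheses.
inner = row[1:-1]

# Step 2: split on commas to get the key=value literals.
pairs = inner.split(',')
print(pairs)   # ["dev_id='A'", " accon_time='B'", " start_time='C'", "end_time='D'"]

# Step 3: split each literal on '=' and keep the second element.
values = [pair.split('=')[1] for pair in pairs]
print(values)  # ["'A'", "'B'", "'C'", "'D'"]
```

Note that the values still include their single quotes at this point.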

If you don't want the ' characters in the elements, strip them off at the end with

.strip('\'')

The full call with the quotes stripped:

ser = ser.apply(lambda x:[val.split('=')[1].strip('\'') for val in x[1:-1].split(',')])
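To end up with a four-column DataFrame that has the quotes removed, one option is to parse each row into a list and pass the list of lists to the DataFrame constructor. This is a sketch of a slightly different route than the .apply(pd.Series) call above, not the only way to do it; the helper name `parse_row` is hypothetical, and the rows are again assumed to be plain strings:

```python
import pandas as pd

qry = [
    "(dev_id='A', accon_time='B', start_time='C',end_time='D')",
    "(dev_id='E', accon_time='F', start_time='G',end_time='H')",
    "(dev_id='I', accon_time='J', start_time='K',end_time='L')",
]
ser = pd.Series(qry)


def parse_row(row):
    """Return the unquoted values from one "(key='value', ...)" string."""
    return [pair.split('=')[1].strip("'") for pair in row[1:-1].split(',')]


# Build the frame from a list of lists, one inner list per row.
df = pd.DataFrame(
    ser.map(parse_row).tolist(),
    columns=['dev_id', 'accon_time', 'start_time', 'end_time'],
)

print(df.shape)  # (3, 4)
```

This string parsing would break if a value itself contained a comma or an equals sign, so it only suits data shaped like the sample rows.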