Split a column containing lists of dictionaries into a new dataframe, keeping the old index

Posted: 2019-09-16 19:27:15

Tags: python list dataframe dictionary

I have a dataframe that, after applying some filters, looks like this:

index  A   ....  J  
55     7   .... [{'sqlStatement': 'DELETE FROM Z WHERE D=2000', 'number': 200, 'time':3556, 'timestamp': 'Jun 13, 2017 5:41:22 PM' }, {'sqlStatement': 'DELETE FROM U WHERE Z=100', 'number': 450, 'time':8906, 'timestamp': 'Jun 13, 2017 5:49:22 PM'}, {'sqlStatement': 'DELETE FROM U WHERE Z=150', 'number': 270, 'time':9806, 'timestamp': 'Jun 13, 2017 5:58:45 PM'}]
193    7   .... [{'sqlStatement': 'DELETE FROM T WHERE F=98', 'number': 8043, 'time':463465, 'timestamp': 'Jun 13, 2017 6:01:22 PM' }, {'sqlStatement': 'DELETE FROM F WHERE A=98 AND Z=100 ', 'number': 9890, 'time':487569, 'timestamp': 'Jun 13, 2017 6:09:28 PM'}]

I need to split column J into a new dataframe. To do that, I use the following code:

import pandas as pd
# Each v is a list of dicts: build one frame per row of J and stack them
df_j = pd.concat([pd.DataFrame(v) for v in df['J']], ignore_index=True)

I get:

index  sqlStatement                        number  time    timestamp
1      DELETE FROM Z WHERE D=2000          200     3556    Jun 13, 2017 5:41:22 PM
2      DELETE FROM U WHERE Z=100           450     8906    Jun 13, 2017 5:49:22 PM
3      DELETE FROM U WHERE Z=150           270     9806    Jun 13, 2017 5:58:45 PM
4      DELETE FROM T WHERE F=98            8043    463465  Jun 13, 2017 6:01:22 PM
5      DELETE FROM F WHERE A=98 AND Z=100  9890    487569  Jun 13, 2017 6:09:28 PM

The problem is that I want to add a column holding the index of the original row that each new row came from. What I want to get is:

index  sqlStatement                        number  time    timestamp                old_index
1      DELETE FROM Z WHERE D=2000          200     3556    Jun 13, 2017 5:41:22 PM  55
2      DELETE FROM U WHERE Z=100           450     8906    Jun 13, 2017 5:49:22 PM  55
3      DELETE FROM U WHERE Z=150           270     9806    Jun 13, 2017 5:58:45 PM  55
4      DELETE FROM T WHERE F=98            8043    463465  Jun 13, 2017 6:01:22 PM  193
5      DELETE FROM F WHERE A=98 AND Z=100  9890    487569  Jun 13, 2017 6:09:28 PM  193
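If a loop is acceptable, the label k that `items()` already yields is exactly the old index, so it can be attached to every row built from that cell. A minimal sketch, assuming the index labels 55 and 193 from the tables above and trimming each dict to two keys for brevity:

```python
import pandas as pd

# Assumed reconstruction of the filtered frame: only column J matters here,
# index labels 55 and 193 taken from the question, dicts trimmed to two keys
df = pd.DataFrame(
    {"J": [
        [{"sqlStatement": "DELETE FROM Z WHERE D=2000", "number": 200},
         {"sqlStatement": "DELETE FROM U WHERE Z=100", "number": 450},
         {"sqlStatement": "DELETE FROM U WHERE Z=150", "number": 270}],
        [{"sqlStatement": "DELETE FROM T WHERE F=98", "number": 8043},
         {"sqlStatement": "DELETE FROM F WHERE A=98 AND Z=100 ", "number": 9890}],
    ]},
    index=[55, 193],
)

frames = []
for k, v in df["J"].items():
    # k is the row label in the original frame, v is that row's list of dicts;
    # every row built from v gets tagged with k
    frames.append(pd.DataFrame(v).assign(old_index=k))

# Stack the per-row frames and renumber the rows 0..n-1
result = pd.concat(frames, ignore_index=True)
print(result)
```

This works, but it builds one small frame per original row, so a vectorized approach is preferable on large data.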

Can you help me?

1 Answer

Answer (score: 1)

No loops, please:

Data:

j = [[{'number': 200,
       'sqlStatement': 'DELETE FROM Z WHERE D=2000',
       'time': 3556,
       'timestamp': 'Jun 13, 2017 5:41:22 PM'},
      {'number': 450,
       'sqlStatement': 'DELETE FROM U WHERE Z=100',
       'time': 8906,
       'timestamp': 'Jun 13, 2017 5:49:22 PM'},
      {'number': 270,
       'sqlStatement': 'DELETE FROM U WHERE Z=150',
       'time': 9806,
       'timestamp': 'Jun 13, 2017 5:58:45 PM'}],
     [{'number': 8043,
       'sqlStatement': 'DELETE FROM T WHERE F=98',
       'time': 463465,
       'timestamp': 'Jun 13, 2017 6:01:22 PM'},
      {'number': 9890,
       'sqlStatement': 'DELETE FROM F WHERE A=98 AND Z=100 ',
       'time': 487569,
       'timestamp': 'Jun 13, 2017 6:09:28 PM'}]]

Code:

import pandas as pd
df = pd.DataFrame({'J': j}, index=[55, 193])  # index labels from the question


Explode the lists into separate rows (DataFrame.explode, available since pandas 0.25):

df_explode = df.explode('J')
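To see why this keeps the link to the source rows, here is a small sketch on a trimmed version of the data (each dict reduced to its number key, index labels 55 and 193 assumed from the question). explode repeats the original label once per list element:

```python
import pandas as pd

# Two rows labelled 55 and 193 (labels assumed from the question),
# each holding a list of dicts; dicts trimmed to one key for brevity
df = pd.DataFrame(
    {"J": [[{"number": 200}, {"number": 450}, {"number": 270}],
           [{"number": 8043}, {"number": 9890}]]},
    index=[55, 193],
)

# explode gives each list element its own row and repeats the
# original row label for every element of that list
df_explode = df.explode("J")
print(df_explode.index.tolist())    # [55, 55, 55, 193, 193]
print(df_explode["J"].tolist()[0])  # {'number': 200}
```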


Expand the dicts into columns with pd.Series:

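df_explode = df_explode.J.apply(pd.Series)  # one column per dict key, index kept
df_explode.reset_index(inplace=True)
# rename returns a new frame, so assign the result back
df_explode = df_explode.rename(columns={'index': 'old_index'})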
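A variant worth considering: `apply(pd.Series)` builds one Series per row, which tends to be slow on large frames. The sketch below instead passes the exploded list of dicts straight to the DataFrame constructor, reuses the exploded index, and names that index with `rename_axis` so `reset_index` produces `old_index` directly (data trimmed, labels 55 and 193 assumed from the question):

```python
import pandas as pd

# Trimmed data; index labels 55 and 193 assumed from the question
df = pd.DataFrame(
    {"J": [[{"sqlStatement": "DELETE FROM Z WHERE D=2000", "number": 200},
            {"sqlStatement": "DELETE FROM U WHERE Z=100", "number": 450}],
           [{"sqlStatement": "DELETE FROM T WHERE F=98", "number": 8043}]]},
    index=[55, 193],
)

df_explode = df.explode("J")
# Build all columns from the list of dicts in one call, reusing the
# exploded (repeated) index instead of calling pd.Series on every row
expanded = pd.DataFrame(df_explode["J"].tolist(), index=df_explode.index)
# Name the index first so reset_index turns it into an old_index column
result = expanded.rename_axis("old_index").reset_index()
print(result)
```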


  • The original index is preserved at every step.
  • If you skip reset_index, df_explode.index is the original index.
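Keeping the original index also lets you pull other columns of the source frame back in by label. A minimal sketch, assuming a column A equal to 7 for both rows as in the question's table, with dicts trimmed to one key:

```python
import pandas as pd

# Assumed frame: index 55/193 and A = 7 come from the question's table,
# dicts trimmed to one key for brevity
df = pd.DataFrame(
    {"A": [7, 7],
     "J": [[{"number": 200}, {"number": 450}, {"number": 270}],
           [{"number": 8043}, {"number": 9890}]]},
    index=[55, 193],
)

exploded = df["J"].explode()
# Without reset_index the expanded frame keeps the labels 55 and 193
expanded = pd.DataFrame(exploded.tolist(), index=exploded.index)
# ...so columns of the source frame align back by label (left join)
combined = expanded.join(df[["A"]])
print(combined)
```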