Python: Create a nested list from a long-format pandas DataFrame

Asked: 2018-02-22 13:40:01

Tags: python list pandas dataframe

I have a DataFrame in which "peak_time" is the only column (stimulus and position form the index):

stimulus position peak_time 
1        1        1.0
         2        1.5
2        1        2.0
         2        2.0
3        1        2.5

Now I am trying to collapse the position level and collect the values into lists, so that it looks like this:

stimulus peak_time  
1        [1.0, 1.5]
2        [2.0, 2.0]
3        [2.5]

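This is probably very simple, but I could not find a solution using Google. If someone has already opened a topic on this, I would appreciate a link to it. Thanks for your help!

Code to create the DataFrame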

import random
import numpy as np
import pandas as pd
trial     = [1,1,2,1,1,2,2,1,2]
stimulus  = [1,1,1,2,2,2,2,3,3]
position  = [1,2,1,1,2,1,2,1,1]
peak_time = random.sample(range(1000), 9)
df        = pd.DataFrame({"trial": trial, "stimulus": stimulus, "position": position, "peak_time": peak_time})
# np.nanmedian replaces scipy.nanmedian, a deprecated top-level alias of the NumPy function.
# Note: no column is selected here, so the median is taken over every value in each group.
median_   = df.groupby(['stimulus', 'position']).apply(np.nanmedian)
median_   = pd.DataFrame(median_)
median_.columns = ['peak_time']
median_

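Edit

Since I can only post one question every 90 minutes, I would like to ask a follow-up question under this post. I now have two pandas Series that look like this: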

median_:
stimulus
1    [1.0, 1.5]
2    [2.0, 2.0]
3    [2.0]

quartile_:
stimulus
1    [[1.0, 70.0],  [1.0, 183.25]]
2    [[1.0, 65.75], [2.0, 98.75]]
3    [[1.0, 51.25]]

I want to subtract quartile_ from median_ element-wise, so that I get

distance_: 
stimulus
1   [1-1, 70-1], [1.5-1, 183.25-1.5]
2   [2-1, 65.75-1], [2-2, 98.75-2]
3   [2-1, 51.25-2]

Is there a simple way to do this? abs(median_ - quartile_) does not work.

Code to create the Series:

import random
import numpy as np
import pandas as pd
trial     = [1,1,2,1,1,2,2,1,2]
stimulus  = [1,1,1,2,2,2,2,3,3]
position  = [1,2,1,1,2,1,2,1,1]
peak_time = random.sample(range(1000), 9)
df        = pd.DataFrame({"trial": trial, "stimulus": stimulus, "position": position, "peak_time": peak_time})
# np.nanmedian / np.nanpercentile replace the deprecated top-level scipy aliases of the same NumPy functions
median_   = df.groupby(['stimulus', 'position']).apply(np.nanmedian).groupby(level=0).apply(list)
quartile_ = df.groupby(['stimulus', 'position']).apply(lambda x: np.nanpercentile(x, [25, 75])).groupby(level=0).apply(list)

Solution

Apply groupby(level=0).apply(list) later, after the subtraction, so

import numpy as np  # np.nanmedian / np.nanpercentile replace the deprecated scipy aliases
median_   = df.groupby(['stimulus', 'position']).apply(np.nanmedian)
quartile_ = df.groupby(['stimulus', 'position']).apply(lambda x: np.nanpercentile(x, [25, 75]))

Then I can easily subtract them

distance_ = abs(median_ - quartile_)
distance_ = distance_.groupby(level=0).apply(list)
distance_

stimulus
1   [1-1, 70-1], [1.5-1, 183.25-1.5]
2   [2-1, 65.75-1], [2-2, 98.75-2]
3   [2-1, 51.25-2]
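
As an illustration of the same idea (subtract first, nest afterwards), here is a minimal sketch with fixed, hypothetical values so the result is reproducible. It is not the code from the post: it swaps in pandas' own `median` and `quantile` for `nanmedian` and `nanpercentile`, and keeps the two quartiles as ordinary numeric columns so the subtraction is plain arithmetic. The data values and the name `nested` are assumptions made for the example.

```python
import pandas as pd

# Hypothetical fixed data (the question uses random values).
df = pd.DataFrame({
    "stimulus":  [1, 1, 1, 2, 2],
    "position":  [1, 1, 2, 1, 1],
    "peak_time": [10.0, 20.0, 30.0, 40.0, 80.0],
})
grouped = df.groupby(["stimulus", "position"])["peak_time"]

# Median per (stimulus, position); NaN values are skipped by default.
median_ = grouped.median()
# 25th and 75th percentile per group, unstacked into one column per quantile.
quartile_ = grouped.quantile([0.25, 0.75]).unstack()

# Subtract the median from both quartile columns row by row, then take the absolute value.
distance_ = quartile_.sub(median_, axis=0).abs()

# Only now collapse the position level into one nested list per stimulus.
nested = {
    int(stim): rows.to_numpy().tolist()
    for stim, rows in distance_.groupby(level="stimulus")
}
print(nested)
# {1: [[2.5, 2.5], [0.0, 0.0]], 2: [[10.0, 10.0]]}
```

Keeping the values numeric until the last step avoids arithmetic on Series whose elements are Python lists, which is where abs(median_ - quartile_) breaks down.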

1 answer:

Answer 0 (score: 3)

It is a Series with a MultiIndex, so you need Series.groupby on the first index level followed by apply(list):

import numpy as np

# added column peak_time; np.nanmedian stands in for the deprecated scipy.nanmedian alias
median_   = df.groupby(['stimulus', 'position'])['peak_time'].apply(np.nanmedian)
df        = median_.groupby(level=0).apply(list).reset_index()
print(df)
   stimulus   peak_time
0         1  [1.0, 1.5]
1         2  [2.0, 2.0]
2         3       [2.5]
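
For readers who want to try this without the random setup, the following is a small self-contained sketch that rebuilds the table from the question by hand and applies the same groupby on the outer index level. The index construction and the name `nested` are assumptions for the example; the values are copied from the question's table.

```python
import pandas as pd

# Rebuild the MultiIndex frame shown in the question (values copied from its table).
index = pd.MultiIndex.from_tuples(
    [(1, 1), (1, 2), (2, 1), (2, 2), (3, 1)],
    names=["stimulus", "position"],
)
median_ = pd.DataFrame({"peak_time": [1.0, 1.5, 2.0, 2.0, 2.5]}, index=index)

# Group on the outer level and collect each group's values into a list.
nested = median_["peak_time"].groupby(level="stimulus").apply(list)

print(nested.tolist())
# [[1.0, 1.5], [2.0, 2.0], [2.5]]
```

Calling `nested.reset_index()` afterwards turns the stimulus index back into a regular column, as in the output above.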