I have a DataFrame in which only "peak_time" is an actual column (stimulus and position are index levels):
                   peak_time
stimulus position
1        1               1.0
         2               1.5
2        1               2.0
         2               2.0
3        1               2.5
Now I'm trying to collapse the position level into lists, so that it looks like this:
stimulus peak_time
1 [1.0, 1.5]
2 [2.0, 2.0]
3 [2.5]
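This is probably very simple, but I couldn't find a solution on Google. If someone has already asked about this, I'd also appreciate a link. Thanks for your help!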
Code to create the DataFrame:
import random
import numpy as np
import pandas as pd

trial = [1, 1, 2, 1, 1, 2, 2, 1, 2]
stimulus = [1, 1, 1, 2, 2, 2, 2, 3, 3]
position = [1, 2, 1, 1, 2, 1, 2, 1, 1]
peak_time = random.sample(range(1000), 9)
df = pd.DataFrame({"trial": trial, "stimulus": stimulus, "position": position, "peak_time": peak_time})
# Median peak_time per (stimulus, position). Selecting ['peak_time'] keeps the other
# columns out of the median; np.nanmedian replaces the old scipy.nanmedian alias.
median_ = df.groupby(['stimulus', 'position'])['peak_time'].apply(np.nanmedian)
median_ = median_.to_frame('peak_time')
print(median_)
EDIT
Since I can only post one question every 90 minutes, I'm asking a follow-up question under this post. I now have two pandas Series that look like this:
median_:
stimulus
1 [1.0, 1.5]
2 [2.0, 2.0]
3 [2.0]
quartile_:
stimulus
1 [[1.0, 70.0], [1.0, 183.25]]
2 [[1.0, 65.75], [2.0, 98.75]]
3 [[1.0, 51.25]]
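I want the element-wise absolute difference between median_ and quartile_ (each median paired with its own quartile pair), so that I get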
distance_:
stimulus
1 [[1-1, 70-1], [1.5-1, 183.25-1.5]]
2 [[2-1, 65.75-2], [2-2, 98.75-2]]
3 [[2-1, 51.25-2]]
Is there a simple way to do this? abs(median_ - quartile_) does not work.
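Why the subtraction fails, and one way around it if the list-valued Series are kept: the lists live in object-dtype Series, so median_ - quartile_ ends up applying Python's `-` to pairs of lists, which raises TypeError. A minimal sketch, assuming the Series hold exactly the values shown above (hard-coded here); abs_distance is a hypothetical helper, paired element-wise with quartile_ via Series.combine:

```python
import numpy as np
import pandas as pd

# Hard-coded copies of the list-valued Series shown above (assumed values).
median_ = pd.Series([[1.0, 1.5], [2.0, 2.0], [2.0]], index=[1, 2, 3])
quartile_ = pd.Series([[[1.0, 70.0], [1.0, 183.25]],
                       [[1.0, 65.75], [2.0, 98.75]],
                       [[1.0, 51.25]]], index=[1, 2, 3])

# Object-dtype arithmetic falls back to the Python objects, and list - list is undefined.
raised = False
try:
    abs(median_ - quartile_)
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

def abs_distance(quartiles, medians):
    """Hypothetical helper: |[q25, q75] - median| for every position of one stimulus."""
    q = np.asarray(quartiles, dtype=float)         # shape (n_positions, 2)
    m = np.asarray(medians, dtype=float)[:, None]  # shape (n_positions, 1), broadcasts over q
    return np.abs(q - m).tolist()

# combine() aligns both Series on their index and calls abs_distance per stimulus.
distance_ = quartile_.combine(median_, abs_distance)
print(distance_)
```

Converting each pair of lists to NumPy arrays lets broadcasting subtract every median from both of its quartiles in one step.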
Code to create the Series:
import random
import numpy as np
import pandas as pd

trial = [1, 1, 2, 1, 1, 2, 2, 1, 2]
stimulus = [1, 1, 1, 2, 2, 2, 2, 3, 3]
position = [1, 2, 1, 1, 2, 1, 2, 1, 1]
peak_time = random.sample(range(1000), 9)
df = pd.DataFrame({"trial": trial, "stimulus": stimulus, "position": position, "peak_time": peak_time})
# Median and [25th, 75th] percentiles of peak_time per (stimulus, position),
# then collapsed into one list per stimulus.
median_ = df.groupby(['stimulus', 'position'])['peak_time'].apply(np.nanmedian).groupby(level=0).apply(list)
quartile_ = df.groupby(['stimulus', 'position'])['peak_time'].apply(lambda x: np.nanpercentile(x, [25, 75])).groupby(level=0).apply(list)
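If the downstream code does not strictly need lists, keeping the quartiles as ordinary columns turns the distance into plain vectorized arithmetic. A sketch under that assumption, using hypothetical fixed peak times instead of random.sample; pandas' SeriesGroupBy.quantile interpolates linearly and skips NaN, matching np.nanpercentile's defaults, and the column names dist_q25 / dist_q75 are illustrative:

```python
import pandas as pd

# Hypothetical fixed peak times so the numbers can be checked by hand.
df = pd.DataFrame({
    "stimulus":  [1, 1, 1, 2, 2, 2, 2, 3, 3],
    "position":  [1, 2, 1, 1, 2, 1, 2, 1, 1],
    "peak_time": [10, 20, 30, 40, 50, 60, 70, 80, 90],
})

# One row per (stimulus, position), one column per quantile: 0.25, 0.5 (median), 0.75.
stats = (df.groupby(["stimulus", "position"])["peak_time"]
           .quantile([0.25, 0.5, 0.75])
           .unstack())

# Distances from the median are ordinary column operations; no lists involved.
stats["dist_q25"] = (stats[0.25] - stats[0.5]).abs()
stats["dist_q75"] = (stats[0.75] - stats[0.5]).abs()
print(stats)
```

A list per stimulus can still be produced at the very end with groupby(level=0).apply(list) if it is needed for display.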
SOLUTION
Apply groupby(level=0).apply(list) only at the very end, i.e. keep the per-(stimulus, position) results as they are:
import numpy as np
median_ = df.groupby(['stimulus', 'position'])['peak_time'].apply(np.nanmedian)
quartile_ = df.groupby(['stimulus', 'position'])['peak_time'].apply(lambda x: np.nanpercentile(x, [25, 75]))
Then I can simply subtract them and collapse the result afterwards:
distance_ = abs(median_ - quartile_)
distance_ = distance_.groupby(level=0).apply(list)
distance_
stimulus
1 [[1-1, 70-1], [1.5-1, 183.25-1.5]]
2 [[2-1, 65.75-2], [2-2, 98.75-2]]
3 [[2-1, 51.25-2]]
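The solution can be checked end to end on data small enough to verify by hand. A sketch, assuming hypothetical fixed peak times (10 to 90) in place of random.sample and np.nanmedian / np.nanpercentile in place of the scipy aliases. For example, group (1, 1) holds 10 and 30, so its median is 20 and its quartiles are 15 and 25, giving distances of 5 and 5:

```python
import numpy as np
import pandas as pd

# Hypothetical, fixed peak times so the result is reproducible.
df = pd.DataFrame({
    "trial":     [1, 1, 2, 1, 1, 2, 2, 1, 2],
    "stimulus":  [1, 1, 1, 2, 2, 2, 2, 3, 3],
    "position":  [1, 2, 1, 1, 2, 1, 2, 1, 1],
    "peak_time": [10, 20, 30, 40, 50, 60, 70, 80, 90],
})

grouped = df.groupby(["stimulus", "position"])["peak_time"]

# One scalar median and one [q25, q75] array per (stimulus, position).
median_ = grouped.apply(np.nanmedian)
quartile_ = grouped.apply(lambda x: np.nanpercentile(x, [25, 75]))

# Same MultiIndex on both sides, so rows align; each element is array - scalar.
distance_ = (quartile_ - median_).abs()

# Only now collapse the position level into one list per stimulus.
distance_ = distance_.groupby(level=0).apply(list)
print(distance_)
```

Subtracting while both Series still carry the (stimulus, position) MultiIndex is what makes the operation work: every element is a number or a NumPy array, never a Python list.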
Answer 0 (score: 3)
It is a Series with a MultiIndex, so use Series.groupby(level=0) together with apply(list):
import numpy as np

# added column selection ['peak_time']; np.nanmedian replaces scipy.nanmedian
median_ = df.groupby(['stimulus', 'position'])['peak_time'].apply(np.nanmedian)
df = median_.groupby(level=0).apply(list).reset_index()
print(df)
stimulus peak_time
0 1 [1.0, 1.5]
1 2 [2.0, 2.0]
2 3 [2.5]
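Because the question's setup code draws peak times at random, its output will not match the tables above. A sketch that rebuilds the exact input table from the top of the question (values copied from it, with stimulus and position as index levels) shows the same groupby(level=0).apply(list) step producing the requested result:

```python
import pandas as pd

# The input table from the question: a (stimulus, position) MultiIndex and one column.
idx = pd.MultiIndex.from_tuples(
    [(1, 1), (1, 2), (2, 1), (2, 2), (3, 1)], names=["stimulus", "position"])
medians = pd.DataFrame({"peak_time": [1.0, 1.5, 2.0, 2.0, 2.5]}, index=idx)

# Group on the stimulus level and gather each group's values into a list,
# then turn the stimulus index back into a regular column.
out = medians["peak_time"].groupby(level="stimulus").apply(list).reset_index()
print(out)
```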