pandas groupby: cannot apply iloc to a grouped object

Date: 2018-02-26 11:07:55

Tags: pandas, pandas-groupby

Apologies if my question has been answered before or if the answer is obvious.

Let's say my dataset contains two tasks, each with 20 different trials. I now want to select only the last 6 seconds of each trial for further analysis.

The dataset looks like this (plus more columns). The sample below covers all 20 trials of one task. The index values are the same as in the full dataset, and time is given as a Unix timestamp in milliseconds.

index   time                x           y          Trial_Id
13512   1519227368636.0000  1022.0000   602.0000    1
13513   1519227368683.0000  1019.0000   697.0000    1
13514   1519227368728.0000  966.0000    530.0000    1
13515   1519227368752.0000  961.0000    576.0000    1
13516   1519227368806.0000  1120.0000   631.0000    1
...
17076   1519227518503.0000  804.0000    694.0000    20
17077   1519227518549.0000  789.0000    738.0000    20
17078   1519227518596.0000  809.0000    747.0000    20
17079   1519227518678.0000  806.0000    735.0000    20
17080   1519227518713.0000  823.0000    605.0000    20

At the level of a single trial, iloc does the job. However, when I try to apply iloc to the data grouped by Trial_Id, I get the error:

TypeError: 'DataFrame' objects are mutable, thus they cannot be hashed

The code I used:

Function to keep the last 6 seconds:

def img_trial(data, start):
    data1 = data.iloc[start:-1, :]
    return data1

Applying the function to the data grouped by trial:

data.groupby(['Trial_Nr']).apply(img_trial(data, 80))
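
As a note on the error itself: img_trial(data, 80) is evaluated up front, so apply receives an already-computed DataFrame instead of a function, and pandas then fails when it tries to hash that DataFrame internally. Below is a minimal sketch of passing a callable instead, assuming the grouping column is Trial_Id as in the sample above (the snippet above uses Trial_Nr) and that the goal is to drop the first 80 rows of each trial:

#pass a function/lambda so that groupby.apply calls it once per group
#group_keys=False keeps the original index instead of adding the group key
trimmed = data.groupby('Trial_Id', group_keys=False).apply(lambda g: g.iloc[80:])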

Could you give me a hint about what is going wrong? I am a pandas newbie. Sorry if my question is not clear enough (this is a long-time lurker's first post).

Best regards,

Nat

1 Answer:

Answer 0 (score: 0)

I think you need the maximum datetime value per group and boolean indexing:

print (df)
    index                time       x      y  Trial_Id
8   13515  1519227361052.0000   961.0  576.0         1
7   13514  1519227362028.0000   966.0  530.0         1
5   13512  1519227363636.0000  1022.0  602.0         1
4   13516  1519227364806.0000  1120.0  631.0         1
3   13515  1519227365752.0000   961.0  576.0         1
2   13514  1519227366728.0000   966.0  530.0         1
1   13513  1519227367683.0000  1019.0  697.0         1
9   13516  1519227368906.0000  1120.0  631.0         1
6   13513  1519227369283.0000  1019.0  697.0         1
0   13512  1519227369836.0000  1022.0  602.0         1
10  17076  1519227518503.0000   804.0  694.0        20
11  17077  1519227518549.0000   789.0  738.0        20
12  17078  1519227518596.0000   809.0  747.0        20
13  17079  1519227518678.0000   806.0  735.0        20
14  17080  1519227518713.0000   823.0  605.0        20

#convert column time to datetime
df['time'] = pd.to_datetime(df['time'].astype(float), unit='ms')

#get max date per group
max_per_group = df.groupby('Trial_Id')['time'].transform('max') 
#subtract 6 seconds
diff_6_sec = max_per_group - pd.Timedelta(6, unit='s')
#filter
df = df[diff_6_sec < df['time']]
print (df)
    index                    time       x      y  Trial_Id
4   13516 2018-02-21 15:36:04.806  1120.0  631.0         1
3   13515 2018-02-21 15:36:05.752   961.0  576.0         1
2   13514 2018-02-21 15:36:06.728   966.0  530.0         1
1   13513 2018-02-21 15:36:07.683  1019.0  697.0         1
9   13516 2018-02-21 15:36:08.906  1120.0  631.0         1
6   13513 2018-02-21 15:36:09.283  1019.0  697.0         1
0   13512 2018-02-21 15:36:09.836  1022.0  602.0         1
10  17076 2018-02-21 15:38:38.503   804.0  694.0        20
11  17077 2018-02-21 15:38:38.549   789.0  738.0        20
12  17078 2018-02-21 15:38:38.596   809.0  747.0        20
13  17079 2018-02-21 15:38:38.678   806.0  735.0        20
14  17080 2018-02-21 15:38:38.713   823.0  605.0        20
print (pd.concat([df['time'], max_per_group, diff_6_sec], 
                 axis=1, 
                 keys=('orig', 'max', 'sub_6s')))

                      orig                     max                  sub_6s
8  2018-02-21 15:36:01.052 2018-02-21 15:36:09.836 2018-02-21 15:36:03.836
7  2018-02-21 15:36:02.028 2018-02-21 15:36:09.836 2018-02-21 15:36:03.836
5  2018-02-21 15:36:03.636 2018-02-21 15:36:09.836 2018-02-21 15:36:03.836
4  2018-02-21 15:36:04.806 2018-02-21 15:36:09.836 2018-02-21 15:36:03.836
3  2018-02-21 15:36:05.752 2018-02-21 15:36:09.836 2018-02-21 15:36:03.836
2  2018-02-21 15:36:06.728 2018-02-21 15:36:09.836 2018-02-21 15:36:03.836
1  2018-02-21 15:36:07.683 2018-02-21 15:36:09.836 2018-02-21 15:36:03.836
9  2018-02-21 15:36:08.906 2018-02-21 15:36:09.836 2018-02-21 15:36:03.836
6  2018-02-21 15:36:09.283 2018-02-21 15:36:09.836 2018-02-21 15:36:03.836
0  2018-02-21 15:36:09.836 2018-02-21 15:36:09.836 2018-02-21 15:36:03.836
10 2018-02-21 15:38:38.503 2018-02-21 15:38:38.713 2018-02-21 15:38:32.713
11 2018-02-21 15:38:38.549 2018-02-21 15:38:38.713 2018-02-21 15:38:32.713
12 2018-02-21 15:38:38.596 2018-02-21 15:38:38.713 2018-02-21 15:38:32.713
13 2018-02-21 15:38:38.678 2018-02-21 15:38:38.713 2018-02-21 15:38:32.713
14 2018-02-21 15:38:38.713 2018-02-21 15:38:38.713 2018-02-21 15:38:32.713

For better understanding, the joined output above shows, for each row, the original time, the per-group maximum, and that maximum minus 6 seconds, which is what the filter compares against.
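
As a usage note, the same 6-second filter can also be expressed with groupby().apply, which is closer to the original attempt; a minimal sketch, assuming df already has the datetime time column created above:

#for each trial, keep only rows within 6 seconds of that trial's last timestamp
last_6s = df.groupby('Trial_Id', group_keys=False).apply(
    lambda g: g[g['time'] > g['time'].max() - pd.Timedelta(6, unit='s')])
print (last_6s)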
