Apologies if my question has been answered before, or if the answer is obvious.
Say my dataset contains two tasks, each with 20 different trials. I now want to select only the last 6 seconds of each trial for further analysis.
The dataset looks like this (plus more columns); the sample covers all 20 trials of one task. The index values are the same as in the full dataset, and time is given as a unix timestamp in ms.
index time x y Trial_Id
13512 1519227368636.0000 1022.0000 602.0000 1
13513 1519227368683.0000 1019.0000 697.0000 1
13514 1519227368728.0000 966.0000 530.0000 1
13515 1519227368752.0000 961.0000 576.0000 1
13516 1519227368806.0000 1120.0000 631.0000 1
...
17076 1519227518503.0000 804.0000 694.0000 20
17077 1519227518549.0000 789.0000 738.0000 20
17078 1519227518596.0000 809.0000 747.0000 20
17079 1519227518678.0000 806.0000 735.0000 20
17080 1519227518713.0000 823.0000 605.0000 20
On the level of a single trial, iloc does the job. However, when I try to apply iloc to the data grouped by Trial_Id, I get an error:

TypeError: 'DataFrame' objects are mutable, thus they cannot be hashed

The code I used:

# function to keep the last 6 seconds
def img_trial(data, start):
    data1 = data.iloc[start:-1, :]
    return data1

# apply the function to the data grouped by trial
data.groupby(['Trial_Nr']).apply(img_trial(data, 80))

Could you give me some hints about what is wrong? I am a pandas newbie. Sorry if my question is not clear enough (this is the first post of a long-time lurker).
Best regards,
Nat
Answer (score: 0)
I think you need the max datetime value per group. Starting from this sample data:

print (df)
index time x y Trial_Id
8 13515 1519227361052.0000 961.0 576.0 1
7 13514 1519227362028.0000 966.0 530.0 1
5 13512 1519227363636.0000 1022.0 602.0 1
4 13516 1519227364806.0000 1120.0 631.0 1
3 13515 1519227365752.0000 961.0 576.0 1
2 13514 1519227366728.0000 966.0 530.0 1
1 13513 1519227367683.0000 1019.0 697.0 1
9 13516 1519227368906.0000 1120.0 631.0 1
6 13513 1519227369283.0000 1019.0 697.0 1
0 13512 1519227369836.0000 1022.0 602.0 1
10 17076 1519227518503.0000 804.0 694.0 20
11 17077 1519227518549.0000 789.0 738.0 20
12 17078 1519227518596.0000 809.0 747.0 20
13 17079 1519227518678.0000 806.0 735.0 20
14 17080 1519227518713.0000 823.0 605.0 20
then filter with boolean indexing:
#convert column time to datetime
df['time'] = pd.to_datetime(df['time'].astype(float), unit='ms')
#get max date per group
max_per_group = df.groupby('Trial_Id')['time'].transform('max')
#subtract 6 seconds
diff_6_sec = max_per_group - pd.Timedelta(6, unit='s')
#filter
df = df[diff_6_sec < df['time']]
print (df)
index time x y Trial_Id
4 13516 2018-02-21 15:36:04.806 1120.0 631.0 1
3 13515 2018-02-21 15:36:05.752 961.0 576.0 1
2 13514 2018-02-21 15:36:06.728 966.0 530.0 1
1 13513 2018-02-21 15:36:07.683 1019.0 697.0 1
9 13516 2018-02-21 15:36:08.906 1120.0 631.0 1
6 13513 2018-02-21 15:36:09.283 1019.0 697.0 1
0 13512 2018-02-21 15:36:09.836 1022.0 602.0 1
10 17076 2018-02-21 15:38:38.503 804.0 694.0 20
11 17077 2018-02-21 15:38:38.549 789.0 738.0 20
12 17078 2018-02-21 15:38:38.596 809.0 747.0 20
13 17079 2018-02-21 15:38:38.678 806.0 735.0 20
14 17080 2018-02-21 15:38:38.713 823.0 605.0 20
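A side note on why transform('max') is used here rather than a plain aggregation: transform returns a Series aligned row-for-row with the original DataFrame, so it can be compared element-wise against df['time']. A minimal sketch of that step with made-up toy data (the values are not the question's dataset):

```python
import pandas as pd

# toy frame: two groups with three timestamps each (hypothetical values)
toy = pd.DataFrame({
    'Trial_Id': [1, 1, 1, 2, 2, 2],
    'time': pd.to_datetime([1000, 4000, 9000, 1000, 2000, 8000], unit='ms'),
})

# transform('max') broadcasts each group's maximum back onto every row,
# so the result has the same length and index as the original frame
group_max = toy.groupby('Trial_Id')['time'].transform('max')

# keep only rows within 6 seconds of their own group's latest timestamp
last_6s = toy[toy['time'] > group_max - pd.Timedelta(6, unit='s')]
```

With an aggregation (groupby(...).max()) you would get one row per group and would need a merge back; transform avoids that extra step.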
print (pd.concat([df['time'], max_per_group, diff_6_sec],
axis=1,
keys=('orig', 'max', 'sub_6s')))
orig max sub_6s
8 2018-02-21 15:36:01.052 2018-02-21 15:36:09.836 2018-02-21 15:36:03.836
7 2018-02-21 15:36:02.028 2018-02-21 15:36:09.836 2018-02-21 15:36:03.836
5 2018-02-21 15:36:03.636 2018-02-21 15:36:09.836 2018-02-21 15:36:03.836
4 2018-02-21 15:36:04.806 2018-02-21 15:36:09.836 2018-02-21 15:36:03.836
3 2018-02-21 15:36:05.752 2018-02-21 15:36:09.836 2018-02-21 15:36:03.836
2 2018-02-21 15:36:06.728 2018-02-21 15:36:09.836 2018-02-21 15:36:03.836
1 2018-02-21 15:36:07.683 2018-02-21 15:36:09.836 2018-02-21 15:36:03.836
9 2018-02-21 15:36:08.906 2018-02-21 15:36:09.836 2018-02-21 15:36:03.836
6 2018-02-21 15:36:09.283 2018-02-21 15:36:09.836 2018-02-21 15:36:03.836
0 2018-02-21 15:36:09.836 2018-02-21 15:36:09.836 2018-02-21 15:36:03.836
10 2018-02-21 15:38:38.503 2018-02-21 15:38:38.713 2018-02-21 15:38:32.713
11 2018-02-21 15:38:38.549 2018-02-21 15:38:38.713 2018-02-21 15:38:32.713
12 2018-02-21 15:38:38.596 2018-02-21 15:38:38.713 2018-02-21 15:38:32.713
13 2018-02-21 15:38:38.678 2018-02-21 15:38:38.713 2018-02-21 15:38:32.713
14 2018-02-21 15:38:38.713 2018-02-21 15:38:38.713 2018-02-21 15:38:32.713
For a better understanding, the concatenated printout above shows the original times, the per-group maximum, and the 6-second threshold side by side, so you can check why each row was kept or dropped.
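As for the TypeError in the question: data.groupby(['Trial_Nr']).apply(img_trial(data, 80)) calls img_trial immediately and hands the resulting DataFrame to apply, but apply expects a callable. Passing the function itself (with extra arguments after it, or via a lambda) avoids the error. A sketch with toy data (column names follow the question, values are made up):

```python
import pandas as pd

def img_trial(data, start):
    # slice each group's rows from `start` up to, but excluding, the last row
    return data.iloc[start:-1, :]

toy = pd.DataFrame({
    'Trial_Id': [1] * 5 + [2] * 5,
    'x': range(10),
})

# pass the function itself; positional arguments follow the callable
result = toy.groupby('Trial_Id').apply(img_trial, 1)

# equivalent form: toy.groupby('Trial_Id').apply(lambda g: img_trial(g, 1))
```

Note this only fixes the error; it still slices a fixed number of rows per group rather than the last 6 seconds, which is why the time-based boolean indexing above is the better fit for the question.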