I have a tidy DataFrame, tidy:
>>> import pandas as pd
>>> import uuid
>>> import random
>>> sensors = [uuid.uuid4() for _ in range(5)]  # five sensor IDs, reused for both measure types
>>> # Build all rows at once (DataFrame.append was removed in pandas 2.0)
>>> tidy = pd.DataFrame({'measure_type': 5*['type_a'] + 5*['type_b'],
...                      'sensor': 2*sensors,
...                      'value': [random.random() for _ in range(10)]})
>>> tidy
measure_type sensor value
0 type_a f9726059-1352-49fb-9cc7-cffdf84db325 0.323960
1 type_a f3724608-3c28-49c7-a237-09b02a75694b 0.727934
2 type_a d59d29ec-32cb-4853-b822-8ac9abec07b9 0.357074
3 type_a 4a384d86-6288-49f3-be5d-8d54a811b9bd 0.312051
4 type_a e59f5497-eb25-4084-816a-297d67768891 0.750661
5 type_b f9726059-1352-49fb-9cc7-cffdf84db325 0.424161
6 type_b f3724608-3c28-49c7-a237-09b02a75694b 0.608558
7 type_b d59d29ec-32cb-4853-b822-8ac9abec07b9 0.759485
8 type_b 4a384d86-6288-49f3-be5d-8d54a811b9bd 0.095980
9 type_b e59f5497-eb25-4084-816a-297d67768891 0.382245
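I want to append five more rows to this tidy DataFrame: for each sensor, the difference between its type_a reading and its type_b reading. The measure type of these new rows is type_c.
The way I found to do this is too long and feels wrong. It does work: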
>>> df_a = tidy[tidy['measure_type']=='type_a'] # New df with readings for a only
>>> df_a = df_a.set_index(keys='sensor') # Make sensor ID the key
>>> df_a = df_a.drop('measure_type', axis=1) # Keep only the reading value
>>> df_a
sensor value
f9726059-1352-49fb-9cc7-cffdf84db325 0.323960
f3724608-3c28-49c7-a237-09b02a75694b 0.727934
d59d29ec-32cb-4853-b822-8ac9abec07b9 0.357074
4a384d86-6288-49f3-be5d-8d54a811b9bd 0.312051
e59f5497-eb25-4084-816a-297d67768891 0.750661
Do the same for type_b...
[...]
>>> df_b
sensor value
f9726059-1352-49fb-9cc7-cffdf84db325 0.424161
f3724608-3c28-49c7-a237-09b02a75694b 0.608558
d59d29ec-32cb-4853-b822-8ac9abec07b9 0.759485
4a384d86-6288-49f3-be5d-8d54a811b9bd 0.095980
e59f5497-eb25-4084-816a-297d67768891 0.382245
Now I can subtract the two:
>>> df_c = df_a - df_b
I need to add a column for the measure type:
>>> df_c['measure_type'] = 'type_c'
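Only then can I append, and to do that I have to reset the index:
>>> tidy = pd.concat([tidy, df_c.reset_index()]) # sensor becomes a column again (DataFrame.append was removed in pandas 2.0)
>>> tidy = tidy.reset_index(drop=True) # renumber the rows 0..14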
>>> tidy
measure_type sensor value
0 type_a f9726059-1352-49fb-9cc7-cffdf84db325 0.323960
1 type_a f3724608-3c28-49c7-a237-09b02a75694b 0.727934
2 type_a d59d29ec-32cb-4853-b822-8ac9abec07b9 0.357074
3 type_a 4a384d86-6288-49f3-be5d-8d54a811b9bd 0.312051
4 type_a e59f5497-eb25-4084-816a-297d67768891 0.750661
5 type_b f9726059-1352-49fb-9cc7-cffdf84db325 0.424161
6 type_b f3724608-3c28-49c7-a237-09b02a75694b 0.608558
7 type_b d59d29ec-32cb-4853-b822-8ac9abec07b9 0.759485
8 type_b 4a384d86-6288-49f3-be5d-8d54a811b9bd 0.095980
9 type_b e59f5497-eb25-4084-816a-297d67768891 0.382245
10 type_c f9726059-1352-49fb-9cc7-cffdf84db325 -0.100200
11 type_c f3724608-3c28-49c7-a237-09b02a75694b 0.119377
12 type_c d59d29ec-32cb-4853-b822-8ac9abec07b9 -0.402411
13 type_c 4a384d86-6288-49f3-be5d-8d54a811b9bd 0.216071
14 type_c e59f5497-eb25-4084-816a-297d67768891 0.368416
This surely isn't the simplest way to do it.
Edit
The method should also work with tidy2 and tidy3, where:
tidy2 = tidy.drop(1)
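and
tidy3 = tidy.copy()  # copy, so the assignment below does not also modify tidy
tidy3.loc[0, 'sensor'] = 'some other uuid'
i.e. in cases where the keys, or the number of keys, are not the same for every sensor.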
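To make the requirement in the edit concrete, here is a minimal sketch of what index-aligned subtraction does when a reading is missing. The sensor IDs (`s1`, `s2`, `s3`) and the values are hypothetical, chosen only so the arithmetic is easy to follow.

```python
import pandas as pd

# Hypothetical sensor IDs and readings, for illustration only.
tidy = pd.DataFrame({
    'measure_type': ['type_a', 'type_a', 'type_a', 'type_b', 'type_b', 'type_b'],
    'sensor': ['s1', 's2', 's3', 's1', 's2', 's3'],
    'value': [0.5, 0.75, 0.25, 0.25, 0.5, 1.0],
})

# Like tidy2 in the question: drop row 1 (the type_a reading of s2).
tidy2 = tidy.drop(1)

# One Series per measure type, indexed by sensor ID.
a = tidy2.loc[tidy2['measure_type'] == 'type_a'].set_index('sensor')['value']
b = tidy2.loc[tidy2['measure_type'] == 'type_b'].set_index('sensor')['value']

# Subtraction aligns on the sensor index; s2 has no type_a reading,
# so its difference comes out as NaN instead of raising an error.
diff = a - b
print(diff)

# Keep only the sensors that have both readings.
complete = diff.dropna()
```

Whether the NaN rows should be kept or dropped is a design choice the question leaves open; `dropna()` is shown here only as one option.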
Answer 0 (score: 3)
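# Label-based version: select each reading by measure_type rather than by position
pd.concat([tidy,
           tidy.groupby(['sensor'])
               .apply(lambda x: x.loc[x.measure_type == 'type_a', 'value'].max()
                                - x.loc[x.measure_type == 'type_b', 'value'].min())
               .reset_index().assign(measure_type='type_c')
               .rename(columns={0: 'value'})]).replace(0, float('nan'))  # pd.np is no longer available in recent pandas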
Let's try:
pd.concat([tidy,
           tidy.groupby(['sensor'])
               .apply(lambda x: x.iloc[0, 2] - x.iloc[1, 2])
               .reset_index().assign(measure_type='type_c')
               .rename(columns={0: 'value'})])
Output:
measure_type sensor value
0 type_a 3bbbe393-74bc-4c77-b95c-fbaaac64ed3f 0.638573
1 type_a b9b72088-078a-4dd6-91b5-f9e6643a9d43 0.468320
2 type_a 4f90f177-0ed8-4ff5-b635-f317925aebcc 0.945822
3 type_a 307db09c-6b46-4518-b822-7771ab97fbbe 0.886271
4 type_a 061bf0f3-9870-4426-9327-a9e7d9208923 0.757897
5 type_b 3bbbe393-74bc-4c77-b95c-fbaaac64ed3f 0.922330
6 type_b b9b72088-078a-4dd6-91b5-f9e6643a9d43 0.711345
7 type_b 4f90f177-0ed8-4ff5-b635-f317925aebcc 0.501771
8 type_b 307db09c-6b46-4518-b822-7771ab97fbbe 0.381833
9 type_b 061bf0f3-9870-4426-9327-a9e7d9208923 0.399346
0 type_c 061bf0f3-9870-4426-9327-a9e7d9208923 0.358551
1 type_c 307db09c-6b46-4518-b822-7771ab97fbbe 0.504438
2 type_c 3bbbe393-74bc-4c77-b95c-fbaaac64ed3f -0.283757
3 type_c 4f90f177-0ed8-4ff5-b635-f317925aebcc 0.444052
4 type_c b9b72088-078a-4dd6-91b5-f9e6643a9d43 -0.243025
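Note that `x.iloc[0,2]-x.iloc[1,2]` takes the first row of each group minus the second, so it relies on every sensor having its type_a row before its type_b row and on both rows being present. The sketch below shows a label-based alternative on data where neither holds. The helper name `type_a_minus_type_b`, the sensor IDs, and the values are all hypothetical, and iterating over the groups is just one way to write this, not the answer's own method.

```python
import pandas as pd

def type_a_minus_type_b(group):
    """Return type_a minus type_b for one sensor, or NaN if either is missing."""
    a = group.loc[group['measure_type'] == 'type_a', 'value']
    b = group.loc[group['measure_type'] == 'type_b', 'value']
    if a.empty or b.empty:
        return float('nan')
    return a.iloc[0] - b.iloc[0]

# Hypothetical data: s1's rows are out of order, and s3 has no type_b reading.
tidy = pd.DataFrame({
    'measure_type': ['type_b', 'type_a', 'type_a', 'type_b', 'type_a'],
    'sensor': ['s1', 's1', 's2', 's2', 's3'],
    'value': [0.25, 0.5, 0.75, 0.5, 1.0],
})

# One difference per sensor, looked up by label rather than by row position.
diffs = {sensor: type_a_minus_type_b(group)
         for sensor, group in tidy.groupby('sensor')}

# Assemble the new rows and append those that have a defined difference.
type_c = pd.DataFrame({'measure_type': 'type_c',
                       'sensor': list(diffs),
                       'value': list(diffs.values())})
result = pd.concat([tidy, type_c.dropna(subset=['value'])], ignore_index=True)
print(result)
```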
Answer 1 (score: 2)
tidy.set_index(['sensor', 'measure_type']).value.unstack() \
.eval('type_c = type_a - type_b', inplace=False).stack() \
.sort_index(level='measure_type').swaplevel(0, 1).reset_index(name='value')
measure_type sensor value
0 type_a 4a384d86-6288-49f3-be5d-8d54a811b9bd 0.312051
1 type_a d59d29ec-32cb-4853-b822-8ac9abec07b9 0.357074
2 type_a e59f5497-eb25-4084-816a-297d67768891 0.750661
3 type_a f3724608-3c28-49c7-a237-09b02a75694b 0.727934
4 type_a f9726059-1352-49fb-9cc7-cffdf84db325 0.323960
5 type_b 4a384d86-6288-49f3-be5d-8d54a811b9bd 0.095980
6 type_b d59d29ec-32cb-4853-b822-8ac9abec07b9 0.759485
7 type_b e59f5497-eb25-4084-816a-297d67768891 0.382245
8 type_b f3724608-3c28-49c7-a237-09b02a75694b 0.608558
9 type_b f9726059-1352-49fb-9cc7-cffdf84db325 0.424161
10 type_c 4a384d86-6288-49f3-be5d-8d54a811b9bd 0.216071
11 type_c d59d29ec-32cb-4853-b822-8ac9abec07b9 -0.402411
12 type_c e59f5497-eb25-4084-816a-297d67768891 0.368416
13 type_c f3724608-3c28-49c7-a237-09b02a75694b 0.119376
14 type_c f9726059-1352-49fb-9cc7-cffdf84db325 -0.100201
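The reshape idea above can also be wrapped in a small helper and checked against frames shaped like tidy2 and tidy3 from the question. This is a sketch under assumptions: the function name `with_type_c`, the short sensor IDs, and the values are hypothetical, and it uses `pivot` with an explicit `dropna` rather than the `unstack`/`eval`/`stack` chain. `pivot` raises an error if a sensor has two readings of the same type, so the sketch assumes at most one reading per sensor and type.

```python
import pandas as pd

def with_type_c(tidy):
    """Append type_c = type_a - type_b rows for sensors that have both readings."""
    # Wide layout: one row per sensor, one column per measure type.
    wide = tidy.pivot(index='sensor', columns='measure_type', values='value')
    # Make sure both columns exist even if one type is absent altogether.
    wide = wide.reindex(columns=['type_a', 'type_b'])
    # Sensors missing either reading produce NaN and are dropped here.
    diff = (wide['type_a'] - wide['type_b']).dropna()
    type_c = pd.DataFrame({'measure_type': 'type_c',
                           'sensor': diff.index,
                           'value': diff.to_numpy()})
    return pd.concat([tidy, type_c], ignore_index=True)

# Hypothetical sensor IDs and readings, for illustration only.
tidy = pd.DataFrame({
    'measure_type': ['type_a', 'type_a', 'type_a', 'type_b', 'type_b', 'type_b'],
    'sensor': ['s1', 's2', 's3', 's1', 's2', 's3'],
    'value': [0.5, 0.75, 0.25, 0.25, 0.5, 1.0],
})
tidy2 = tidy.drop(1)                         # s2 loses its type_a reading
tidy3 = tidy.copy()
tidy3.loc[0, 'sensor'] = 'some other uuid'   # s1 loses its type_a reading

out = with_type_c(tidy)
out2 = with_type_c(tidy2)
out3 = with_type_c(tidy3)
print(out3)
```

With this layout the original rows keep their order and the type_c rows are appended at the end, sorted by sensor ID.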