How do I number repeated measurements in a data table?

Posted: 2018-09-22 04:26:52

Tags: python pandas

For example, I have measurements from a production line:

import numpy as np
import pandas as pd

np.random.seed(2)
df = pd.DataFrame()
df['col1_date'] = pd.date_range('2018-09-01', periods=40, freq='D')
df['col2_id'] = ['A', 'B', 'C', 'D'] * 10
df['MeasuredValues'] = np.random.choice(10, 40)
df

Basically, items such as ['A', 'B', 'C', 'D'] each have a parameter measured as the rows progress. I am trying to add a column that gives a running measurement number for each item in col2_id: the first time an item is measured I get 1st measurement, the second time I get 2nd measurement, and so on. In Excel I could do this manually by sorting on col2_id and then on col1_date and building 100 small tables, but that clearly does not scale to thousands of rows. With the numbering in place I could then compare the different measurements across all the items in col2_id. I don't know how to do this in pandas or Python.

Can anyone give me some advice?

2 Answers:

Answer 0 (score: 3)

If the dates are sorted, you can use cumcount:

df['measurement_number'] = 'measurement ' + (df.groupby('col2_id').cumcount() + 1).astype(str)

Or:

df['measurement_number'] = (df.groupby('col2_id').cumcount() + 1).map(lambda x: f'measurement {x}')

Both give you:

>>> df
    col1_date col2_id  MeasuredValues measurement_number
0  2018-09-01       A               8      measurement 1
1  2018-09-02       B               8      measurement 1
2  2018-09-03       C               6      measurement 1
3  2018-09-04       D               2      measurement 1
4  2018-09-05       A               8      measurement 2
5  2018-09-06       B               7      measurement 2
6  2018-09-07       C               2      measurement 2
7  2018-09-08       D               1      measurement 2
8  2018-09-09       A               5      measurement 3
9  2018-09-10       B               4      measurement 3
10 2018-09-11       C               4      measurement 3
11 2018-09-12       D               5      measurement 3
12 2018-09-13       A               7      measurement 4
13 2018-09-14       B               3      measurement 4
14 2018-09-15       C               6      measurement 4
15 2018-09-16       D               4      measurement 4
16 2018-09-17       A               3      measurement 5
17 2018-09-18       B               7      measurement 5
18 2018-09-19       C               6      measurement 5
19 2018-09-20       D               1      measurement 5
20 2018-09-21       A               3      measurement 6
21 2018-09-22       B               5      measurement 6
22 2018-09-23       C               8      measurement 6
23 2018-09-24       D               4      measurement 6
24 2018-09-25       A               6      measurement 7
25 2018-09-26       B               3      measurement 7
26 2018-09-27       C               9      measurement 7
27 2018-09-28       D               2      measurement 7
28 2018-09-29       A               0      measurement 8
29 2018-09-30       B               4      measurement 8
30 2018-10-01       C               2      measurement 8
31 2018-10-02       D               4      measurement 8
32 2018-10-03       A               1      measurement 9
33 2018-10-04       B               7      measurement 9
34 2018-10-05       C               8      measurement 9
35 2018-10-06       D               2      measurement 9
36 2018-10-07       A               9     measurement 10
37 2018-10-08       B               8     measurement 10
38 2018-10-09       C               7     measurement 10
39 2018-10-10       D               1     measurement 10

If your dates are not sorted, sort them first (for example with df.sort_values('col1_date')) and then apply the same operation.
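As a sketch of the unsorted case: the column names follow the question, but the small frame below is made up purely for illustration.

```python
import pandas as pd

# Hypothetical unsorted frame (dates deliberately out of order)
df = pd.DataFrame({
    'col1_date': pd.to_datetime(['2018-09-05', '2018-09-01',
                                 '2018-09-02', '2018-09-06']),
    'col2_id': ['A', 'A', 'B', 'B'],
    'MeasuredValues': [8, 3, 7, 5],
})

# Sort by date first, so cumcount reflects chronological order within each id
df = df.sort_values('col1_date').reset_index(drop=True)
df['measurement_number'] = 'measurement ' + (df.groupby('col2_id').cumcount() + 1).astype(str)
print(df)
```

After sorting, each item's earliest date gets measurement 1 regardless of where it appeared in the original row order.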

Answer 1 (score: 2)

You can use groupby together with rank:

# rank the dates within each col2_id group, format the float rank as an
# integer, and prepend 'meas ' (radd is a right-hand add: 'meas ' + rank)
df.assign(
    meas_num=df.groupby('col2_id').col1_date.rank()
               .apply('{:.0f}'.format)
               .radd('meas ')
)

    col1_date col2_id  MeasuredValues meas_num
0  2018-09-01       A               1   meas 1
1  2018-09-02       B               3   meas 1
2  2018-09-03       C               9   meas 1
3  2018-09-04       D               7   meas 1
4  2018-09-05       A               3   meas 2
5  2018-09-06       B               5   meas 2
6  2018-09-07       C               5   meas 2
7  2018-09-08       D               2   meas 2
8  2018-09-09       A               4   meas 3
9  2018-09-10       B               0   meas 3
10 2018-09-11       C               2   meas 3
11 2018-09-12       D               0   meas 3
12 2018-09-13       A               8   meas 4
13 2018-09-14       B               5   meas 4
14 2018-09-15       C               2   meas 4
15 2018-09-16       D               0   meas 4
16 2018-09-17       A               7   meas 5
17 2018-09-18       B               1   meas 5
18 2018-09-19       C               3   meas 5
19 2018-09-20       D               7   meas 5
20 2018-09-21       A               9   meas 6
21 2018-09-22       B               5   meas 6
22 2018-09-23       C               7   meas 6
23 2018-09-24       D               7   meas 6
24 2018-09-25       A               9   meas 7
25 2018-09-26       B               3   meas 7
26 2018-09-27       C               2   meas 7
27 2018-09-28       D               9   meas 7
28 2018-09-29       A               1   meas 8
29 2018-09-30       B               8   meas 8
30 2018-10-01       C               3   meas 8
31 2018-10-02       D               8   meas 8
32 2018-10-03       A               6   meas 9
33 2018-10-04       B               4   meas 9
34 2018-10-05       C               1   meas 9
35 2018-10-06       D               2   meas 9
36 2018-10-07       A               5  meas 10
37 2018-10-08       B               9  meas 10
38 2018-10-09       C               9  meas 10
39 2018-10-10       D               1  meas 10
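Once the measurement number exists, the comparison the question asks for ("compare the different measurements across all the items in col2_id") can be done by pivoting to one column per item. A sketch, assuming the cumcount-based numbering from answer 0:

```python
import numpy as np
import pandas as pd

# Rebuild the question's example frame
np.random.seed(2)
df = pd.DataFrame()
df['col1_date'] = pd.date_range('2018-09-01', periods=40, freq='D')
df['col2_id'] = ['A', 'B', 'C', 'D'] * 10
df['MeasuredValues'] = np.random.choice(10, 40)

# Numeric measurement counter per item (kept numeric so it sorts correctly)
df['measurement_number'] = df.groupby('col2_id').cumcount() + 1

# One row per measurement number, one column per item id
wide = df.pivot(index='measurement_number', columns='col2_id',
                values='MeasuredValues')
print(wide)
```

Each row of `wide` now holds the n-th measurement of every item side by side, which is what the hundred small Excel tables were approximating.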