Classify availability based on available months and identifier - Python

Date: 2016-12-23 17:22:14

Tags: python regex pandas

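I have a dataset with an identifier (3-character district code_2-character country code_10-digit code) and the month (1 to 12) for which data is available. Each identifier is repeated on a separate row for every month it has data. For example, if the identifier "LAX_CN_0000000000" has 3 months of data, it is listed 3 times, once with each available month. I want to classify these identifiers based on the months available. For example, I have: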

Identifier_Column             Month_Column
LAX_CN_0000000000             1
IAH_MY_1111111111             10 
LAX_CN_0000000000             2
LAX_CN_0000000000             3
IAH_MY_1111111111             8 

I would like to see:

    Identifier_Column      Month_Column    Classification
    LAX_CN_0000000000      1               In sequence but not all 12 months
    IAH_MY_1111111111      10              > 2 month but not in order
    LAX_CN_0000000000      2               In sequence but not all 12 months
    LAX_CN_0000000000      3               In sequence but not all 12 months
    IAH_MY_1111111111      8               > 2 month but not in order

So there would be 4 different types of classification:

1. All 12 months available
2. Only 1 month available
3. In sequence but not all 12 months
4. > 2 month but not in order

1 answer:

Answer 0 (score: 1)

Setup
Includes a few additional cases

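import numpy as np
import pandas as pd

# identifier values as they appear in the printed output further down
id1, id2 = 'LAX_CN_0000000000', 'IAH_MY_1111111111'
id3, id4 = 'SFO_MY_2222222222', 'SEA_CN_3333333333'
df = pd.DataFrame(
    dict(
        Idetifier_Column=[id1] + [id2] + [id1] * 2 + [id3] * 12 + [id4] * 12,
        Month_Column=[1, 10, 2, 3] + list(range(1, 13)) + list(range(12, 0, -1))
    )
)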

Helper functions
There is probably a better way to check whether the months are sorted

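def is_sorted(x):
    # 1 if the group's months already appear in ascending row order, else 0
    return (np.arange(len(x)) == np.argsort(x)).all() * 1

def how_many(x):
    # 1: a single distinct month, 2: fewer than 12 distinct months, 3: all 12
    n = len(np.unique(x))
    return 1 if n == 1 else 2 if n < 12 else 3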
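
The order check can also be written with np.diff, which looks at the gaps between neighbouring months instead of comparing sort positions. The sketch below is only one possible alternative, not part of the original answer; the names is_increasing and is_consecutive are made up for illustration. It also shows that "in ascending order" and "consecutive months" are different conditions, which matters if "in sequence" is meant to exclude gaps such as 1, 3, 5.

```python
import numpy as np

def is_increasing(months):
    """Return 1 if every month is larger than the one before it, else 0."""
    months = np.asarray(months)
    return int((np.diff(months) > 0).all())

def is_consecutive(months):
    """Return 1 if every month is exactly one more than the one before it, else 0."""
    months = np.asarray(months)
    return int((np.diff(months) == 1).all())

print(is_increasing([1, 2, 3]))   # 1
print(is_increasing([10, 8]))     # 0
print(is_increasing([1, 3, 5]))   # 1: ascending, but with gaps
print(is_consecutive([1, 3, 5]))  # 0
```

A group with a single month has no gaps to inspect, so both functions return 1 for it, which matches how is_sorted treats a one-row group.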

Map the tuples I create to descriptive strings

class_map = {
    (1, 1): "Only 1 month available",
    (2, 1): "In sequence but not all 12 months",
    (2, 0): "> 2 month but not in order",
    (3, 1): "All 12 months available",
    (3, 0): "All 12 months available out of order",
}

Magic

grpby = df.groupby('Idetifier_Column').Month_Column
df['Classification'] = \
    df.Idetifier_Column.map(
      # |<------------- creating tuples -------------->|
        grpby.agg([how_many, is_sorted]).apply(tuple, 1).map(class_map))
print(df)

     Idetifier_Column  Month_Column                        Classification
0   LAX_CN_0000000000             1     In sequence but not all 12 months
1   IAH_MY_1111111111            10                Only 1 month available
2   LAX_CN_0000000000             2     In sequence but not all 12 months
3   LAX_CN_0000000000             3     In sequence but not all 12 months
4   SFO_MY_2222222222             1               All 12 months available
5   SFO_MY_2222222222             2               All 12 months available
6   SFO_MY_2222222222             3               All 12 months available
7   SFO_MY_2222222222             4               All 12 months available
8   SFO_MY_2222222222             5               All 12 months available
9   SFO_MY_2222222222             6               All 12 months available
10  SFO_MY_2222222222             7               All 12 months available
11  SFO_MY_2222222222             8               All 12 months available
12  SFO_MY_2222222222             9               All 12 months available
13  SFO_MY_2222222222            10               All 12 months available
14  SFO_MY_2222222222            11               All 12 months available
15  SFO_MY_2222222222            12               All 12 months available
16  SEA_CN_3333333333            12  All 12 months available out of order
17  SEA_CN_3333333333            11  All 12 months available out of order
18  SEA_CN_3333333333            10  All 12 months available out of order
19  SEA_CN_3333333333             9  All 12 months available out of order
20  SEA_CN_3333333333             8  All 12 months available out of order
21  SEA_CN_3333333333             7  All 12 months available out of order
22  SEA_CN_3333333333             6  All 12 months available out of order
23  SEA_CN_3333333333             5  All 12 months available out of order
24  SEA_CN_3333333333             4  All 12 months available out of order
25  SEA_CN_3333333333             3  All 12 months available out of order
26  SEA_CN_3333333333             2  All 12 months available out of order
27  SEA_CN_3333333333             1  All 12 months available out of order
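
For comparison, the same labelling can be sketched with a single function per identifier instead of a tuple lookup. This is a hedged variant rather than the answer's method: the function name classify and the frame name sample are assumptions introduced here, and the data is the five-row example from the question, with its Identifier_Column spelling.

```python
import pandas as pd

def classify(months):
    """Describe one identifier's months, taken in the order the rows appear."""
    values = list(months)
    n_unique = len(set(values))
    # True when each month is larger than the one on the previous row
    in_order = all(a < b for a, b in zip(values, values[1:]))
    if n_unique == 1:
        return "Only 1 month available"
    if n_unique >= 12:
        return ("All 12 months available" if in_order
                else "All 12 months available out of order")
    return ("In sequence but not all 12 months" if in_order
            else "> 2 month but not in order")

sample = pd.DataFrame({
    "Identifier_Column": ["LAX_CN_0000000000", "IAH_MY_1111111111",
                          "LAX_CN_0000000000", "LAX_CN_0000000000",
                          "IAH_MY_1111111111"],
    "Month_Column": [1, 10, 2, 3, 8],
})

# One label per identifier, then broadcast back onto every row of that identifier
labels = sample.groupby("Identifier_Column")["Month_Column"].apply(classify)
sample["Classification"] = sample["Identifier_Column"].map(labels)
print(sample)
```

On this sample, the LAX rows come out as "In sequence but not all 12 months" and the IAH rows as "> 2 month but not in order", matching the table the question asks for.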