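I have a dataset with an identifier (3-character district code_2-character country code_10-digit code) and a month number (1 to 12) for which data is available. The identifier is repeated on a separate row for each month. For example, if the identifier "LAX_CN_0000000000" has 3 months of data, it is listed 3 times, once with each available month. I want to classify these identifiers based on which months are available. For example, I have: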
Identifier_Column Month_Column
LAX_CN_0000000000 1
IAH_MY_1111111111 10
LAX_CN_0000000000 2
LAX_CN_0000000000 3
IAH_MY_1111111111 8
I would like to see:
Identifier_Column Month_Column Classification
LAX_CN_0000000000 1 In sequence but not all 12 months
IAH_MY_1111111111 10 > 2 month but not in order
LAX_CN_0000000000 2 In sequence but not all 12 months
LAX_CN_0000000000 3 In sequence but not all 12 months
IAH_MY_1111111111 8 > 2 month but not in order
So there would be 4 different classifications:
1. All 12 months available
2. Only 1 month available
3. In sequence but not all 12 months
4. > 2 month but not in order
Answer 0 (score: 1)
Setup
Including a few additional cases
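import numpy as np
import pandas as pd

# Identifier values inferred from the printed output further down
id1, id2 = 'LAX_CN_0000000000', 'IAH_MY_1111111111'
id3, id4 = 'SFO_MY_2222222222', 'SEA_CN_3333333333'

df = pd.DataFrame(
    dict(
        Idetifier_Column=[id1] + [id2] + [id1] * 2 + [id3] * 12 + [id4] * 12,
        Month_Column=[1, 10, 2, 3] + list(range(1, 13)) + list(range(12, 0, -1))
    )
)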
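Helper functions
There is probably a better way to check whether the months are sorted
def is_sorted(x):
    # 1 if the months already appear in ascending order, else 0
    return (np.arange(len(x)) == np.argsort(x)).all() * 1

def how_many(x):
    # 1 -> a single distinct month, 2 -> fewer than 12, 3 -> all 12
    n = len(np.unique(x))
    return 1 if n == 1 else 2 if n < 12 else 3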
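On the "better way" point, here is a minimal sketch of an alternative sortedness check based on `np.diff`. It avoids relying on how `argsort` orders tied values. The names `is_sorted_diff` and `is_consecutive` are hypothetical and not part of the original answer. The second helper assumes that "in sequence" might mean strictly consecutive months, which the question does not spell out.

```python
import numpy as np

def is_sorted_diff(x):
    """Return 1 if the values never decrease from one row to the next, else 0."""
    values = np.asarray(x)
    # A single value has no differences, so .all() on the empty array is True
    return int((np.diff(values) >= 0).all())

def is_consecutive(x):
    """Return 1 if every value is exactly one more than the previous, else 0."""
    values = np.asarray(x)
    return int((np.diff(values) == 1).all())
```

Either function could be passed to `agg` in place of `is_sorted`, since both return a plain 0 or 1.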
Map the tuples I create to descriptive strings
# Keys are (how_many, is_sorted) pairs
class_map = {
    (1, 1): "Only 1 month available",
    (2, 1): "In sequence but not all 12 months",
    (2, 0): "> 2 month but not in order",
    (3, 1): "All 12 months available",
    (3, 0): "All 12 months available out of order",
}
The magic
# Group the months by identifier
grpby = df.groupby('Idetifier_Column').Month_Column
# Look up each row's identifier in the per-group classification
df['Classification'] = df.Idetifier_Column.map(
    # |<------------- creating tuples -------------->|
    grpby.agg([how_many, is_sorted]).apply(tuple, axis=1).map(class_map)
)
print(df)
Idetifier_Column Month_Column Classification
0 LAX_CN_0000000000 1 In sequence but not all 12 months
1 IAH_MY_1111111111 10 Only 1 month available
2 LAX_CN_0000000000 2 In sequence but not all 12 months
3 LAX_CN_0000000000 3 In sequence but not all 12 months
4 SFO_MY_2222222222 1 All 12 months available
5 SFO_MY_2222222222 2 All 12 months available
6 SFO_MY_2222222222 3 All 12 months available
7 SFO_MY_2222222222 4 All 12 months available
8 SFO_MY_2222222222 5 All 12 months available
9 SFO_MY_2222222222 6 All 12 months available
10 SFO_MY_2222222222 7 All 12 months available
11 SFO_MY_2222222222 8 All 12 months available
12 SFO_MY_2222222222 9 All 12 months available
13 SFO_MY_2222222222 10 All 12 months available
14 SFO_MY_2222222222 11 All 12 months available
15 SFO_MY_2222222222 12 All 12 months available
16 SEA_CN_3333333333 12 All 12 months available out of order
17 SEA_CN_3333333333 11 All 12 months available out of order
18 SEA_CN_3333333333 10 All 12 months available out of order
19 SEA_CN_3333333333 9 All 12 months available out of order
20 SEA_CN_3333333333 8 All 12 months available out of order
21 SEA_CN_3333333333 7 All 12 months available out of order
22 SEA_CN_3333333333 6 All 12 months available out of order
23 SEA_CN_3333333333 5 All 12 months available out of order
24 SEA_CN_3333333333 4 All 12 months available out of order
25 SEA_CN_3333333333 3 All 12 months available out of order
26 SEA_CN_3333333333 2 All 12 months available out of order
27 SEA_CN_3333333333 1 All 12 months available out of order
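For comparison, the same idea can be sketched with a single function per group instead of the tuple lookup. This is only an illustration run on the question's sample data. `classify` and `demo` are hypothetical names, and "in order" is assumed to mean non-decreasing as the rows appear.

```python
import pandas as pd

def classify(months):
    """Label one identifier's months (hypothetical helper, not from the answer)."""
    n = months.nunique()
    if n == 1:
        return "Only 1 month available"
    in_order = months.is_monotonic_increasing
    if n == 12:
        return ("All 12 months available" if in_order
                else "All 12 months available out of order")
    return ("In sequence but not all 12 months" if in_order
            else "> 2 month but not in order")

# Sample rows from the question
demo = pd.DataFrame({
    "Identifier_Column": ["LAX_CN_0000000000", "IAH_MY_1111111111",
                          "LAX_CN_0000000000", "LAX_CN_0000000000",
                          "IAH_MY_1111111111"],
    "Month_Column": [1, 10, 2, 3, 8],
})

# One label per identifier, then broadcast back to every row
labels = demo.groupby("Identifier_Column")["Month_Column"].apply(classify)
demo["Classification"] = demo["Identifier_Column"].map(labels)
print(demo)
```

On this data the LAX rows come out as "In sequence but not all 12 months" and the IAH rows as "> 2 month but not in order", which matches the output asked for in the question.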