I have a dataframe that looks like this:
POSITION Code_Count
S1 {"[471E;1]"}
S2 {"[471E;1]"}
S3 {"[471E;1]"}
S4 {"[471E;1]"}
S5 {"[471E;1]"}
S6 {"[5812;1]"}
S7 {"[471E;1]"}
S8 {"[471E;1]"}
T1 {"[7A2A;1]"}
T2 {"[471E;1]"}
T3 {"[7C95;1]"}
T4 {"[471E;1]"}
T5 {"[471E;1]"}
T6 {"[471E;1]"}
T7 {"[471E;1]"}
T8 {"[471E;1]"}
In the Code_Count column, the first part is the code and the number after the semicolon is its count. The codes are divided into four categories, A to D, as follows:
Category A contains the following codes: 7749 7783 7784 7786 7A14 7AC5 7C88 7C92 7C93 7C95 C749 C783 C784 C786 CA14 CAC5 CC88 CC92 CC93 CC95 442A 49C2
Category B contains the following codes: 1D 32 430B 4415 448E 4490 4492 457A 457B 496C 4970 778A 7A09 7A2A 7A2C 7C7C 7C80 C78A CA09 CA2A CA2C
Category C contains the following codes: 7A7F 7A80 7C7E CA7F CA80 CAC8 7AC8 C77E 445A 496E 471E 49CA
Category D contains the following codes: 7AF0 7AF1 7AF2 7AF3 CAF0 CAF1 CAF2 CAF3 4616 4617 4618 5812
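Because each category is just a list of codes, one way to look up a code's category is to invert the four lists into a single dictionary. A minimal sketch, assuming the lists are exactly as typed above; the names `categories`, `code_to_category`, `observed` and `unknown` are hypothetical. It also refuses a code that appears under two categories and reports codes with no category, since either would silently give a wrong row in the final table:

```python
# Category lists as given in the question (space-separated codes)
categories = {
    "A": ("7749 7783 7784 7786 7A14 7AC5 7C88 7C92 7C93 7C95 C749 C783 C784 "
          "C786 CA14 CAC5 CC88 CC92 CC93 CC95 442A 49C2").split(),
    "B": ("1D 32 430B 4415 448E 4490 4492 457A 457B 496C 4970 778A 7A09 7A2A "
          "7A2C 7C7C 7C80 C78A CA09 CA2A CA2C").split(),
    "C": "7A7F 7A80 7C7E CA7F CA80 CAC8 7AC8 C77E 445A 496E 471E 49CA".split(),
    "D": "7AF0 7AF1 7AF2 7AF3 CAF0 CAF1 CAF2 CAF3 4616 4617 4618 5812".split(),
}

# Invert to code -> category, refusing any code listed under two categories
code_to_category = {}
for category, codes in categories.items():
    for code in codes:
        if code in code_to_category:
            raise ValueError(
                f"{code} is listed in both {code_to_category[code]} and {category}"
            )
        code_to_category[code] = category

# The distinct codes that occur in the question's Code_Count column
observed = ["471E", "5812", "7A2A", "7C95"]
unknown = [c for c in observed if c not in code_to_category]

print({c: code_to_category[c] for c in observed if c in code_to_category})
print(unknown)  # [] means every observed code has a category
```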
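I want my final dataframe to sort the codes present in the initial dataframe by the category they belong to, putting each code's count into the matching category column for its position. For example, for the dataframe above the output should be: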
POSITION Category A Category B Category C Category D
S1 0 0 1 0
S2 0 0 1 0
S3 0 0 1 0
S4 0 0 1 0
S5 0 0 1 0
S6 0 0 0 1
S7 0 0 1 0
S8 0 0 1 0
T1 0 1 0 0
T2 0 0 1 0
T3 1 0 0 0
T4 0 0 1 0
T5 0 0 1 0
T6 0 0 1 0
T7 0 0 1 0
T8 0 0 1 0
I tried using the str.contains method, but it didn't work. Any help would be greatly appreciated. Thanks in advance!
Answer (score: 1)
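I think you can first extract the code and count with strip and split. Then create one column per category by selecting the Count values with loc (older versions of this answer used ix, which was removed in pandas 1.0), using boolean masks created with isin. Finally, drop the unnecessary columns and replace the missing values with 0 using fillna: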
import pandas as pd

# Code lists for each category
catA = ['7749','7783','7784','7786','7A14','7AC5','7C88','7C92','7C93','7C95','C749','C783','C784','C786','CA14','CAC5','CC88','CC92','CC93','CC95','442A','49C2']
catB = ['1D','32','430B','4415','448E','4490','4492','457A','457B','496C','4970','778A','7A09','7A2A','7A2C','7C7C','7C80','C78A','CA09','CA2A','CA2C']
catC = ['7A7F','7A80','7C7E','CA7F','CA80','CAC8','7AC8','C77E','445A','496E','471E','49CA']
catD = ['7AF0','7AF1','7AF2','7AF3','CAF0','CAF1','CAF2','CAF3','4616','4617','4618','5812']

# Strip the {"[ and ]"} wrappers, then split "471E;1" into code and count
df[['Code','Count']] = df.Code_Count.str.strip('{["]}').str.split(';', expand=True)
df['Count'] = df['Count'].astype(int)

# Copy Count into a category column only where the code belongs to that category (NaN elsewhere)
df['Category A'] = df.loc[df.Code.isin(catA), 'Count']
df['Category B'] = df.loc[df.Code.isin(catB), 'Count']
df['Category C'] = df.loc[df.Code.isin(catC), 'Count']
df['Category D'] = df.loc[df.Code.isin(catD), 'Count']
df.drop(['Code_Count', 'Code', 'Count'], axis=1, inplace=True)

# Replace the NaNs with 0 and restore an integer dtype
cols = ['Category A','Category B','Category C','Category D']
df[cols] = df[cols].fillna(0).astype(int)
print(df)
POSITION Category A Category B Category C Category D
0 S1 0 0 1 0
1 S2 0 0 1 0
2 S3 0 0 1 0
3 S4 0 0 1 0
4 S5 0 0 1 0
5 S6 0 0 0 1
6 S7 0 0 1 0
7 S8 0 0 1 0
8 T1 0 1 0 0
9 T2 0 0 1 0
10 T3 1 0 0 0
11 T4 0 0 1 0
12 T5 0 0 1 0
13 T6 0 0 1 0
14 T7 0 0 1 0
15 T8 0 0 1 0
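If more codes or categories are added later, one isin line per category gets repetitive. An alternative sketch (a swapped-in technique, not the answer's method) maps each code to its category with a dictionary and lets pivot_table build the columns. It assumes every code is present in the lookup: rows whose code is missing get a NaN category and are dropped from the pivot. For brevity it rebuilds only four of the question's rows and uses an abbreviated lookup; the names `code_to_category` and `result` are hypothetical.

```python
import pandas as pd

# A few rows from the question (hypothetical subset for illustration)
df = pd.DataFrame({
    "POSITION": ["S1", "S6", "T1", "T3"],
    "Code_Count": ['{"[471E;1]"}', '{"[5812;1]"}', '{"[7A2A;1]"}', '{"[7C95;1]"}'],
})

# Abbreviated code -> category lookup; in practice build it from all four lists
code_to_category = {"471E": "C", "5812": "D", "7A2A": "B", "7C95": "A"}

# Pull the code and the count out of strings like {"[471E;1]"}
parts = df["Code_Count"].str.extract(r"\[([0-9A-Z]+);(\d+)\]")
df["Code"] = parts[0]
df["Count"] = parts[1].astype(int)
df["Category"] = "Category " + df["Code"].map(code_to_category)

# One column per category; counts are summed if a position has several rows
result = (
    df.pivot_table(index="POSITION", columns="Category", values="Count",
                   aggfunc="sum", fill_value=0)
      .reindex(columns=[f"Category {c}" for c in "ABCD"], fill_value=0)
      .astype(int)
      .reset_index()
)
result.columns.name = None
print(result)
```

The reindex step guarantees all four category columns exist even when no position uses a category. To find codes that would be dropped, `df.loc[df["Code"].map(code_to_category).isna(), "Code"]` lists them.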