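I am trying to write a program that selects one Customer_Address_ID for each GROUP according to the following rules.
If the count of address codes in a Group_ID is >= 2 and the count of CUST# in the Group_ID is 1, select the address code that has the CUST#.
If the count of address codes in a Group_ID is >= 2 and the count of CUST# in the Group_ID is >= 2, consider the Cust# values in that group:
If only 1 CUST# starts with G/#, select the Customer_Address_ID associated with that CUST#.
If 2+ Cust# start with C/#, select the Customer_Address_ID
If no CUST# starts with G/#, check whether any start with letter/letter. If only one starts with letter/letter, select the Customer_Address_ID associated with that CUST#.
If 2+ start with letter/letter, select the Customer_Address_ID associated with any one of them.
If no CUST# starts with G/# or with letters, check whether any start with A/#.
If 2+ start with A/#, select the Customer_Address_ID associated with the lower CUST#.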
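The prefix checks in the rules above could be captured in a small helper. This is only a minimal sketch, assuming that "G/#" means a G followed by a digit, "A/#" means an A followed by a digit, and "letter/letter" means two leading letters; `classify_cust` and its category labels are hypothetical names, not something from the question.

```python
import re

def classify_cust(cust):
    """Return an assumed prefix category for a CUST# value, or None if none applies."""
    if not isinstance(cust, str) or not cust:
        return None            # blank or missing CUST#
    if re.match(r"G\d", cust):
        return "G#"            # assumed meaning of G/#: G followed by a digit
    if re.match(r"A\d", cust):
        return "A#"            # assumed meaning of A/#: A followed by a digit
    if re.match(r"[A-Za-z]{2}", cust):
        return "LETTERS"       # assumed meaning of letter/letter: two leading letters
    return None
```

With this reading, `classify_cust("G99999")` falls in the G/# category, `classify_cust("BRSSSS")` in the letter/letter category, and `classify_cust("A67860")` in the A/# category.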
So far I have the Address_Code and Group_ID counts for each group ID. ... Any help would be greatly appreciated.
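import pandas as pd
import xlrd
import numpy as np

# Read the Excel file into a DataFrame
df = pd.read_excel("C:...........Primary Address ID Assignment Example.xlsx")

# Replace nulls in a specific column with an arbitrary value, e.g. 0
df['CUST#'] = df['CUST#'].fillna(0)

# Count the non-zero values in each column for every unique group ID
df2 = df[['GROUP_ID', 'CUSTOMER_ADDRESS_ID', 'CUST#']].groupby('GROUP_ID').agg(np.count_nonzero)

# Rename the columns
df2.columns = ['CUSTOMER_ADDRESS_ID_COUNT', 'CUST#_COUNT']

# Join the columns of the two DataFrames, matching df's GROUP_ID to df2's index
df3 = df.join(df2, on=['GROUP_ID'], how='left')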
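The same per-group counts could also be attached to every row without the `fillna(0)` step or a separate join. The following is a sketch rather than a drop-in replacement: it assumes blank CUST# cells arrive as missing values (as `read_excel` normally produces for empty cells), and it uses a few rows from the sample table instead of the Excel file.

```python
import pandas as pd

# A few rows from the sample table; None marks a blank CUST#
df = pd.DataFrame({
    "GROUP_ID": [15, 15, 8, 8],
    "CUSTOMER_ADDRESS_ID": [672390, 12491, 712154, 273672],
    "CUST#": [None, "BRSSSS", "G99999", "G2KKKK"],
})

# transform("count") counts the non-null values per group and
# broadcasts that count back onto every row of the group
df["CUSTOMER_ADDRESS_ID_COUNT"] = (
    df.groupby("GROUP_ID")["CUSTOMER_ADDRESS_ID"].transform("count")
)
df["CUST#_COUNT"] = df.groupby("GROUP_ID")["CUST#"].transform("count")
```

Because missing values are skipped by `count`, group 15 ends up with a CUST# count of 1 and group 8 with a count of 2.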
+====================+==========+=====================+===========+
| PRIMARY_ADDRESS_ID | GROUP_ID | CUSTOMER_ADDRESS_ID | CUST#     |
+====================+==========+=====================+===========+
|                    | 15       | 672390              |           |
+--------------------+----------+---------------------+-----------+
|                    | 15       | 12491               | BRSSSS    |
+--------------------+----------+---------------------+-----------+
|                    | 8        | 712154              | G99999    |
+--------------------+----------+---------------------+-----------+
|                    | 8        | 273672              | G2KKKK    |
+--------------------+----------+---------------------+-----------+
|                    | 7        | 649202              | GDLLL     |
+--------------------+----------+---------------------+-----------+
|                    | 7        | 714617              | ERHHHH    |
+--------------------+----------+---------------------+-----------+
|                    | 7        | 398899              |           |
+--------------------+----------+---------------------+-----------+
|                    | 9        | 672390              | A67860    |
+--------------------+----------+---------------------+-----------+
|                    | 9        | 12491               | A67861    |
+--------------------+----------+---------------------+-----------+

EXPECTED OUTPUT

+====================+==========+=====================+===========+
| PRIMARY_ADDRESS_ID | GROUP_ID | CUSTOMER_ADDRESS_ID | CUSTOMER# |
+====================+==========+=====================+===========+
| 12491              | 15       | 672390              |           |
+--------------------+----------+---------------------+-----------+
| 12491              | 15       | 12491               | BRSSSS    |
+--------------------+----------+---------------------+-----------+
| 712154             | 8        | 712154              | G99999    |
+--------------------+----------+---------------------+-----------+
| 712154             | 8        | 273672              | G2KKKK    |
+--------------------+----------+---------------------+-----------+
| 649202             | 7        | 649202              | G2DLLL    |
+--------------------+----------+---------------------+-----------+
| 649202             | 7        | 714617              | ERHHHH    |
+--------------------+----------+---------------------+-----------+
| 649202             | 7        | 398899              |           |
+--------------------+----------+---------------------+-----------+
| 672390             | 9        | 672390              | A67860    |
+--------------------+----------+---------------------+-----------+
| 672390             | 9        | 12491               | A67861    |
+--------------------+----------+---------------------+-----------+
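One possible way to fill PRIMARY_ADDRESS_ID from the rules is sketched below. It rests on several assumptions that the question does not confirm: "G/#" and "A/#" are read as the letter followed by a digit, "letter/letter" as two leading letters, "lower CUST#" as the smaller value in string order, and when several G/# values exist the first one in row order is taken. The rule that mentions C/# is not modelled because its outcome is not stated, and `pick_primary` is a hypothetical helper name. The rows are copied from the input table.

```python
import pandas as pd

# Rows copied from the input table; None marks a blank CUST#
df = pd.DataFrame({
    "GROUP_ID": [15, 15, 8, 8, 7, 7, 7, 9, 9],
    "CUSTOMER_ADDRESS_ID": [672390, 12491, 712154, 273672, 649202,
                            714617, 398899, 672390, 12491],
    "CUST#": [None, "BRSSSS", "G99999", "G2KKKK", "GDLLL",
              "ERHHHH", None, "A67860", "A67861"],
})

def pick_primary(group):
    """Return one CUSTOMER_ADDRESS_ID for a group (hypothetical helper)."""
    rows = group.dropna(subset=["CUST#"])
    if rows.empty:
        return None  # assumption: no CUST# in the group means no pick
    if len(rows) == 1:
        # exactly one CUST#: take the address that carries it
        return rows["CUSTOMER_ADDRESS_ID"].iloc[0]
    cust = rows["CUST#"].astype(str)
    g_rows = rows[cust.str.match(r"G\d")]
    if not g_rows.empty:
        # assumption: with several G/# matches, take the first in row order
        return g_rows["CUSTOMER_ADDRESS_ID"].iloc[0]
    letter_rows = rows[cust.str.match(r"[A-Za-z]{2}")]
    if not letter_rows.empty:
        # the rules allow any one of the letter/letter matches
        return letter_rows["CUSTOMER_ADDRESS_ID"].iloc[0]
    a_rows = rows[cust.str.match(r"A\d")]
    if not a_rows.empty:
        # the lowest CUST# (string order, an assumption) wins among A/# matches
        return a_rows.sort_values("CUST#")["CUSTOMER_ADDRESS_ID"].iloc[0]
    return None  # assumption: nothing matched any rule

# Decide once per group, then broadcast the choice to every row of that group
primary = {gid: pick_primary(g) for gid, g in df.groupby("GROUP_ID")}
df["PRIMARY_ADDRESS_ID"] = df["GROUP_ID"].map(primary)
```

On these sample rows the sketch assigns 12491 to group 15, 712154 to group 8, 649202 to group 7 and 672390 to group 9, which agrees with the expected output table. Building a dictionary per group and mapping it back keeps the rule logic in one plain function, which may be easier to adjust than a chain of vectorized conditions if the rules change.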