Question

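I want to add a "Region" column, where the first two digits of the zip code determine the region:

Zip code: 11 = Region: A
Zip code: 22 = Region: B
Zip code: 44 = Region: C

Main table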

Zip_code      product_id
110034        55454
114242        45445
113564        46454
223434        53533
224535        56455
223435        63535
444345        62435
443535        24353

Output table

Zip_code      product_id   Region
110034        55454        A
114242        45445        A
113564        46454        A
223434        53533        B
224535        56455        B
223435        63535        B
444345        62435        C
443535        24353        C

Answer 1

You can slice the first two characters off each Zip_code and map them to regions with a dictionary:

df['Region'] = df.Zip_code.astype(str).str[:2].map({'11':'A', '22':'B', '44':'C'})

print(df)
   Zip_code  product_id Region
0    110034       55454      A
1    114242       45445      A
2    113564       46454      A
3    223434       53533      B
4    224535       56455      B
5    223435       63535      B
6    444345       62435      C
7    443535       24353      C
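
For reference, here is a self-contained sketch of the same approach that rebuilds part of the question's table so it can be run directly. The extra row (zip code 990001, product_id 11111) and the "Unknown" label are hypothetical additions, included only to show that `map` leaves NaN for prefixes missing from the dictionary and that `fillna` can label them.

```python
import pandas as pd

# Rows taken from the question, plus one hypothetical row (990001)
# whose prefix is deliberately absent from the mapping.
df = pd.DataFrame({
    "Zip_code": [110034, 114242, 223434, 444345, 990001],
    "product_id": [55454, 45445, 53533, 62435, 11111],
})

region_by_prefix = {"11": "A", "22": "B", "44": "C"}

# Convert to string and keep the first two characters of each zip code.
prefix = df["Zip_code"].astype(str).str[:2]

# map() returns NaN for prefixes not in the dict; fillna() replaces
# those with an explicit placeholder (the label is an assumption).
df["Region"] = prefix.map(region_by_prefix).fillna("Unknown")

print(df)
```

If NaN is acceptable for unmapped zip codes, the `fillna` call can simply be dropped.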

Answer 2

You can do the following:

import pandas as pd


# Map the first two digits of Zip_code to a Region
regions_map = {11: "A", 22:"B", 44:"C"}

df["Region"] = df["Zip_code"].apply(lambda x: regions_map[int(str(x)[:2])])

print(df)
#   Zip_code  product_id Region
#0    110034       55454      A
#1    114242       45445      A
#2    113564       46454      A
#3    223434       53533      B
#4    224535       56455      B
#5    223435       63535      B
#6    444345       62435      C
#7    443535       24353      C
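
Note that indexing the dictionary directly raises a KeyError for any zip code whose prefix is not in the mapping. A minimal sketch of a more tolerant variant follows, assuming a fallback value is acceptable; the helper name `region_for` and its `default` parameter are hypothetical and not part of the original answer.

```python
# Same mapping as in the answer: integer prefix -> region.
REGIONS_MAP = {11: "A", 22: "B", 44: "C"}


def region_for(zip_code, default=None):
    """Return the region for a zip code, or `default` if its prefix is unmapped.

    Hypothetical helper; assumes zip_code starts with two digits.
    """
    prefix = int(str(zip_code)[:2])
    # dict.get() returns `default` instead of raising KeyError.
    return REGIONS_MAP.get(prefix, default)


print(region_for(110034))             # mapped prefix 11
print(region_for(990001, "Unknown"))  # unmapped prefix 99
```

It can be applied in the same way as the lambda above, for example `df["Region"] = df["Zip_code"].apply(region_for)`.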

Map a substring of column values
