How do I create a column filled with values mapped by ID from a smaller DataFrame?

Time: 2018-10-06 05:27:35

Tags: python pandas dataframe data-science

I have two DataFrames: one with many rows, in which the CategoryId attribute repeats, and another with only two columns, CategoryId and Category.

print(map)
   CategoryId  Category
1  n013523     Snake
2  n012837     Iguana
3  n092735     Dragon

map.shape
(3, 2)


print(data)
   CategoryId  Size
1  n013523     0.4
2  n013523     0.8
3  n013523     0.15
4  n012837     0.16
5  n012837     0.23
6  n012837     0.42
...

data.shape
(500000, 2)

What I want is to create a column on data whose value is taken from map['Category'] where map['CategoryId'] == data['CategoryId'], so that the output is:

print(data)
   CategoryId  Size  Category
1  n013523     0.4   Snake
2  n013523     0.8   Snake
3  n013523     0.15  Snake
4  n012837     0.16  Iguana
5  n012837     0.23  Iguana
6  n012837     0.42  Iguana
...

1 answer:

Answer 0 (score: 1)

Use the map function, passing it the Category column of map indexed by CategoryId:

data['Category'] = data['CategoryId'].map(map.set_index('CategoryId')['Category'])

IDs that have no match in map become NaN.

Or use merge:

data = data.merge(map, how='left', on='CategoryId')

Or build a dict and use map:

data['Category'] = data.CategoryId.map(dict(map.values))

Or build a dict and use replace. Be careful when an ID has no key in the dict: replace leaves that ID unchanged instead of setting it to NaN, so the new column can end up containing raw IDs.

data['Category'] = data.CategoryId.replace(dict(map.values))
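
The approaches above can be compared on a small sample. This is a minimal sketch, not the asker's real data: the frames are hypothetical, the name mapping stands in for the question's map frame (renamed only to avoid shadowing Python's built-in map), and the ID n000000 is an invented value added to show what happens when an ID has no match.

```python
import pandas as pd

# Hypothetical sample frames mirroring the question.
# `mapping` plays the role of the question's `map` DataFrame.
mapping = pd.DataFrame({
    "CategoryId": ["n013523", "n012837", "n092735"],
    "Category": ["Snake", "Iguana", "Dragon"],
})
data = pd.DataFrame({
    # "n000000" is an invented ID that does not appear in `mapping`.
    "CategoryId": ["n013523", "n013523", "n012837", "n000000"],
    "Size": [0.4, 0.8, 0.16, 0.5],
})

# 1) Series.map with a Series indexed by CategoryId.
lookup = mapping.set_index("CategoryId")["Category"]
via_map = data["CategoryId"].map(lookup)

# 2) Left merge: keeps every row of `data`, in its original order.
via_merge = data.merge(mapping, how="left", on="CategoryId")["Category"]

# 3) Series.map with a plain dict built from the two columns.
id_to_category = dict(mapping.values)
via_dict = data["CategoryId"].map(id_to_category)

# 4) Series.replace with the same dict: unmatched IDs are left as they are.
via_replace = data["CategoryId"].replace(id_to_category)

# Side-by-side view of the four results.
print(pd.DataFrame({
    "map": via_map,
    "merge": via_merge,
    "dict_map": via_dict,
    "replace": via_replace,
}))
```

In this sketch the first three columns hold a missing value for the unmatched row, while the replace column keeps the raw ID, which is why map or merge is usually the safer choice for this kind of lookup.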