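How do I change the values of a categorical column?

Time: 2019-09-29 14:21:24

Tags: python pandas data-science data-cleaning

My DataFrame has a "countries" column, and I am trying to replace its values with "developed" and "developing". The DataFrame looks like this: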

   countries age gender
1  India     21  Male
2  China     22  Female
3  USA       23  Male
4  UK        25  Male

I have the following two lists:

developed = ['USA','UK']
developing = ['India', 'China']

Using these lists, I want to turn the DataFrame into the following:

   countries    age gender
1  developing   21  Male
2  developing   22  Female
3  developed    23  Male
4  developed    25  Male

I tried the following code, but it raises a SettingWithCopyWarning:

df[df['countries'].isin(developed)]['countries'] = 'developed'

I also tried the following code, but it raises a SettingWithCopyWarning as well, and my Jupyter notebook hangs:

for i, x in enumerate(df['countries']):
    if x in developed:
        df['countries'][i] = 'developed'

Is there another way to change the categories in this column?

2 Answers:

Answer 0 (score: 2)

Use np.where (you can also use DataFrame.loc):

import numpy as np

# Rows whose country is in `developed` become 'developed'; all other rows become 'developing'
df['countries'] = np.where(df['countries'].isin(developed), 'developed', 'developing')
print(df)

    countries  age  gender
1  developing   21    Male
2  developing   22  Female
3   developed   23    Male
4   developed   25    Male
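
The DataFrame.loc alternative mentioned in this answer is not shown with code, so the following is only a minimal sketch of what it might look like, not the answerer's original snippet. The sample frame is rebuilt from the question, and the mask names `is_developed` and `is_developing` are made up for illustration.

```python
import pandas as pd

# Sample data mirroring the question's frame (index starts at 1, as shown there)
df = pd.DataFrame(
    {
        "countries": ["India", "China", "USA", "UK"],
        "age": [21, 22, 23, 25],
        "gender": ["Male", "Female", "Male", "Male"],
    },
    index=[1, 2, 3, 4],
)
developed = ["USA", "UK"]
developing = ["India", "China"]

# Build both masks before assigning, so the second mask is not
# affected by the values written by the first assignment
is_developed = df["countries"].isin(developed)
is_developing = df["countries"].isin(developing)

# A single .loc call selects rows and column together, so the assignment
# is applied to df itself rather than to a temporary copy
df.loc[is_developed, "countries"] = "developed"
df.loc[is_developing, "countries"] = "developing"
print(df)
```

Unlike the np.where version, this leaves any country that appears in neither list unchanged instead of labelling it 'developing'. It also avoids the chained indexing (`df[...]['countries'] = ...`) that triggers SettingWithCopyWarning in the question.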

Answer 1 (score: 0)

You can try the replace function, which does not produce the warning.

Updated_DataSet1 = data_set.replace("India", "Developing")
Updated_DataSet2 = Updated_DataSet1.replace("China","Developing")
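
The snippet above covers only India and China and needs one call per country. As a hedged sketch of the same idea, the two lists from the question can be folded into one lookup dictionary and passed to a single replace call. The names `category` and `updated` are hypothetical, the sample frame is rebuilt from the question, and the labels are lowercase to match the desired output there.

```python
import pandas as pd

# Sample data mirroring the question's frame
data_set = pd.DataFrame(
    {
        "countries": ["India", "China", "USA", "UK"],
        "age": [21, 22, 23, 25],
        "gender": ["Male", "Female", "Male", "Male"],
    },
    index=[1, 2, 3, 4],
)
developed = ["USA", "UK"]
developing = ["India", "China"]

# One lookup table mapping each country to its category
category = {country: "developed" for country in developed}
category.update({country: "developing" for country in developing})

# Replace only within the 'countries' column; assign() returns a new frame
# and leaves data_set untouched
updated = data_set.assign(countries=data_set["countries"].replace(category))
print(updated)
```

Restricting the replacement to the one column avoids accidentally rewriting a matching string elsewhere in the frame, which a frame-wide replace would do.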