Return DataFrame columns in a new DataFrame with changed values

Time: 2017-05-16 14:08:15

Tags: python pandas dataframe

import pandas as pd

def get_list_of_university_towns():
   states = {'CA' : 'California', 'SC' : 'South Carolina'}
   # filename points to a CSV with many columns, including 'State' and 'RegionName'
   df = pd.read_csv(filename)
   df_res = df[['State', 'RegionName']]
   return df_res

The function returns a nice list of the information I'm looking for. How can I return the 'State' column with its values replaced, as in:

df_res.loc[:, 'State'].replace(states)

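I tried return [df_res.loc[:, 'State'].replace(states), df['RegionName']], but that returns two separate objects (a list of two Series) instead of a single DataFrame. I know the replacement could be done on the original df, but can I keep df unchanged?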

2 Answers:

Answer 0: (score: 1)

The first solution is to replace the values in the column separately:

def get_list_of_university_towns():
   states = {'CA' : 'California', 'SC' : 'South Carolina'}
   df = pd.read_csv(filename)
   df_res = df[['State', 'RegionName']].copy()  # explicit copy avoids SettingWithCopyWarning
   df_res['State'] = df_res['State'].replace(states)
   return df_res
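
A self-contained sketch of this approach, for readers who want to run it without a CSV file on disk. The CSV text, the Population column, the town names, and the source parameter are all made up for illustration. Taking an explicit .copy() of the column subset makes df_res independent of df, so the assignment leaves the original frame alone.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for the real file (assumption)
CSV_TEXT = """State,RegionName,Population
CA,Berkeley,100
SC,Clemson,200
"""

def get_list_of_university_towns(source):
    states = {'CA': 'California', 'SC': 'South Carolina'}
    df = pd.read_csv(source)
    # Copy the two columns so later assignments do not touch df
    df_res = df[['State', 'RegionName']].copy()
    df_res['State'] = df_res['State'].replace(states)
    # Return both frames only so the original can be inspected afterwards
    return df, df_res

df, df_res = get_list_of_university_towns(io.StringIO(CSV_TEXT))
print(df_res)
print(df)
```

The original df still holds the two-letter codes, while df_res holds the full state names.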

Another solution is to pass replace a nested dict keyed by column name:

def get_list_of_university_towns():
   states = {'CA' : 'California', 'SC' : 'South Carolina'}
   df = pd.read_csv(filename)
   df_res = df[['State', 'RegionName']].replace({'State':states})
   return df_res

Sample:

df = pd.DataFrame({'State':['SC','CA'], 'RegionName':['CA','SC'], 'col':[5,8]})
states = {'CA' : 'California', 'SC' : 'South Carolina'}
df_res = df[['State', 'RegionName']].replace({'State':states})
print (df_res)
            State RegionName
0  South Carolina         CA
1      California         SC

print (df)
  RegionName State  col
0         CA    SC    5
1         SC    CA    8
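
To see why the nested dict matters, the sketch below contrasts it with passing the flat states dict directly. It reuses the sample data above; the names subset, only_state, and everywhere are introduced here just for illustration. With the flat dict, replace looks for matches in every column, so the 'CA' and 'SC' values in RegionName would be rewritten as well.

```python
import pandas as pd

df = pd.DataFrame({'State': ['SC', 'CA'], 'RegionName': ['CA', 'SC'], 'col': [5, 8]})
states = {'CA': 'California', 'SC': 'South Carolina'}

subset = df[['State', 'RegionName']]

# Nested dict: the mapping is applied to the 'State' column only
only_state = subset.replace({'State': states})

# Flat dict: the mapping is applied to matching values in every column
everywhere = subset.replace(states)

print(only_state)
print(everywhere)
```

In both cases replace returns a new DataFrame, so df itself keeps its original values.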

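Answer 1: (score: 1)

I think the key here is to copy the original df and then modify the column, using either reassignment or the inplace parameter. Below is the df definition I used to test my example.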

import pandas as pd

df = pd.DataFrame({'State': ['CA', 'SC', 'CA', 'SC', 'CA', 'SC', 'CA', 'SC'],
                   'RegionName': ['SW', 'NE', 'SW', 'NE', 'SW', 'NE', 'SW', 'NE'],
                   'College': ['College1', 'College2', 'College1', 'College2', 'College1', 'College2', 'College1', 'College2']})

Result:

    College RegionName State
0  College1         SW    CA
1  College2         NE    SC
2  College1         SW    CA
3  College2         NE    SC
4  College1         SW    CA
5  College2         NE    SC
6  College1         SW    CA
7  College2         NE    SC

From there I copied df and used your dictionary states = {'CA': 'California', 'SC': 'South Carolina'} to replace the column values in the new df.

df_res = df.loc[:, ['State', 'RegionName']]
df_res.State.replace(states, inplace=True)

Or, using reassignment:

df_res = df.loc[:, ['State', 'RegionName']]
df_res['State'] = df_res.State.replace(states)

Which results in:

df =

    College RegionName State
0  College1         SW    CA
1  College2         NE    SC
2  College1         SW    CA
3  College2         NE    SC
4  College1         SW    CA
5  College2         NE    SC
6  College1         SW    CA
7  College2         NE    SC

df_res =

            State RegionName
0      California         SW
1  South Carolina         NE
2      California         SW
3  South Carolina         NE
4      California         SW
5  South Carolina         NE
6      California         SW
7  South Carolina         NE
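
As a possible variation on the reassignment idea (not something either answer proposed), the same result can be sketched with DataFrame.assign, which returns a new DataFrame rather than modifying the one it is called on. This is a minimal sketch using a shortened version of the test data above.

```python
import pandas as pd

df = pd.DataFrame({'State': ['CA', 'SC', 'CA', 'SC'],
                   'RegionName': ['SW', 'NE', 'SW', 'NE'],
                   'College': ['College1', 'College2', 'College1', 'College2']})
states = {'CA': 'California', 'SC': 'South Carolina'}

# Select the two columns, then build a new frame with 'State' replaced;
# assign returns a new object, so df is left as it was
df_res = df[['State', 'RegionName']].assign(
    State=lambda d: d['State'].replace(states)
)

print(df_res)
print(df)
```

Because nothing is modified in place, this form fits in a single return statement inside a function.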