Replace strings in a list using a condition

Time: 2019-07-23 12:07:35

Tags: python dataframe

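I have a dataframe with a column called regional_codes. I need to add a new column to the dataframe in which each regional code is replaced by the list of countries in that region.

For example, if regional_codes contains ['asia'], then the new column should contain the Asian countries, such as ['china','japan','india','bangladesh'...]

What I currently do is create a separate list for each region and use code like the following: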

asia_list = ['asia', 'china', 'japan', 'india'...]
output_list = []
output_list += [asia_list for w in regional_codes if w in asia_list]
output_list += [africa_list for w in regional_codes if w in africa_list]

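and so on, until all the regional lists have been covered.

The code above gives me exactly the result I need, and it is also efficient in terms of run time. However, I feel I am taking the long way round, so I am looking for suggestions that would help me shorten the code.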
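To make the pattern concrete, here is a minimal self-contained sketch of it. The sample lists and the `regional_codes` value are made up for illustration, and each region list is assumed to start with its own region name, as `asia_list` does above.

```python
# Hypothetical sample data; the real lists are longer.
asia_list = ['asia', 'china', 'japan', 'india']
africa_list = ['africa', 'morocco', 'nigeria', 'ghana']

# Region codes for a single row of the dataframe (assumed shape).
regional_codes = ['asia', 'africa']

output_list = []
# One near-identical line per region: append the whole region list
# once for every code that appears in it.
output_list += [asia_list for w in regional_codes if w in asia_list]
output_list += [africa_list for w in regional_codes if w in africa_list]

print(output_list)
```

Every additional region needs one more list and one more `output_list +=` line, which is the repetition I would like to get rid of.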

1 Answer:

Answer 0: (score: 0)

One way I found to do this is to create a DataFrame that holds all the required data, that is, the regional codes together with their regional lists.

import pandas as pd
import itertools
import numpy as np
# DF is your dataframe; df2 below is a small stand-in for it
# df is a lookup table associating each regional_code with its regional list
df = pd.DataFrame({'regional_code': ['asia', 'africa', 'europe'], 'ragional_list': [['China', 'Japan'], ['Morocco', 'Nigeria', 'Ghana'], ['France', 'UK', 'Germany', 'Spain']]})
#   regional_code                 ragional_list
# 0          asia                [China, Japan]
# 1        africa     [Morocco, Nigeria, Ghana]
# 2        europe  [France, UK, Germany, Spain]


df2 = pd.DataFrame({'regional_code': [['asia', 'africa'],['africa', 'europe']], 'ragional_list': [1,2]})
#       regional_code  ragional_list
# 0    [asia, africa]              1
# 1  [africa, europe]              2

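# For each row, look up every code in df and collect the matching regional lists.
# Each lookup returns a one-element Series, so chaining them yields a list of lists.
df2['list'] = df2.apply(lambda x: list(itertools.chain.from_iterable((df.loc[df['regional_code']==i, 'ragional_list'] for i in x.loc['regional_code']))), axis=1)
# df2 is now: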
#       regional_code  ragional_list                                               list
# 0    [asia, africa]              1        [[China, Japan], [Morocco, Nigeria, Ghana]]
# 1  [africa, europe]              2  [[Morocco, Nigeria, Ghana], [France, UK, Germa...

Now we flatten df2['list']:
df2['list'] = df2['list'].apply(np.concatenate)  
#       regional_code  ragional_list                                               list
# 0    [asia, africa]              1            [China, Japan, Morocco, Nigeria, Ghana]
# 1  [africa, europe]              2  [Morocco, Nigeria, Ghana, France, UK, Germany,...
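
As a possible alternative to the lookup DataFrame, the same association can be sketched with a plain dictionary. The names `region_to_countries` and `expand_regions` are hypothetical, the sample data simply mirrors `df` above, and unknown codes are assumed to contribute nothing.

```python
# Hypothetical mapping from region code to its countries (mirrors df above).
region_to_countries = {
    'asia': ['China', 'Japan'],
    'africa': ['Morocco', 'Nigeria', 'Ghana'],
    'europe': ['France', 'UK', 'Germany', 'Spain'],
}


def expand_regions(codes):
    """Return one flat list with the countries of every region code in `codes`."""
    # .get(code, []) makes unknown codes contribute nothing (an assumption;
    # use region_to_countries[code] instead to fail loudly on unknown codes).
    return [country
            for code in codes
            for country in region_to_countries.get(code, [])]


# Same rows as df2['regional_code'] above.
rows = [['asia', 'africa'], ['africa', 'europe']]
expanded = [expand_regions(codes) for codes in rows]
print(expanded)
```

On a DataFrame this could be applied per row with `df2['regional_code'].apply(expand_regions)`. It builds the flat list in one pass and yields plain Python lists rather than the NumPy arrays produced by `np.concatenate`.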

I guess this answers your question?