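I have a dataframe with a column called regional_codes. I need to add a new column to the dataframe in which each regional code is replaced by the list of countries in that region.
For example, if regional_codes contains ['asia'], then the new column should contain the Asian countries, such as ['china','japan','india','bangladesh'...].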
Currently I create a separate list for each region and use code like the following:
asia_list= ['asia','china','japan','india'...]
output_list = []
output_list+= [asia_list for w in regional_codes if w in asia_list]
output_list+= [africa_list for w in regional_codes if w in africa_list]
and so on, until every regional list has been covered.
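The code above gives me exactly the result I need, and it runs fast enough. However, I feel I'm taking the long way round, so I'm looking for suggestions on how to shorten the code.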
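One way to shorten the per-region approach, sketched under assumptions: keep every region's countries in a single dict keyed by region code (the hypothetical `REGION_COUNTRIES` below, with only a few countries filled in) and build the new column with one comprehension through `Series.apply`. The column name `countries`, the helper `expand_regions`, and the sample rows are illustrative only.

```python
import pandas as pd

# Hypothetical lookup table: one entry per region code instead of one list variable per region
REGION_COUNTRIES = {
    "asia": ["china", "japan", "india", "bangladesh"],
    "africa": ["morocco", "nigeria", "ghana"],
    "europe": ["france", "uk", "germany", "spain"],
}


def expand_regions(codes):
    """Replace each region code with its countries; unknown codes contribute nothing."""
    return [country for code in codes for country in REGION_COUNTRIES.get(code, [])]


# Sample data: each cell of regional_codes is a list of region codes
df = pd.DataFrame({"regional_codes": [["asia"], ["africa", "europe"], ["mars"]]})
df["countries"] = df["regional_codes"].apply(expand_regions)
print(df)
```

Unlike the original loop, this flattens the result into one list per row. If you prefer one sub-list per region, as `output_list += [...]` produces, use `[REGION_COUNTRIES[c] for c in codes if c in REGION_COUNTRIES]` inside the function instead.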
Answer 0 (score: 0)
One way I found to do this is to create a DataFrame that holds all the required data, i.e. each regional_code together with its regional list:
import pandas as pd
import itertools
import numpy as np
# df2 stands in for your dataframe
# df holds the association between each regional_code and its list of countries
df = pd.DataFrame({'regional_code': ['asia', 'africa', 'europe'],
                   'regional_list': [['China', 'Japan'],
                                     ['Morocco', 'Nigeria', 'Ghana'],
                                     ['France', 'UK', 'Germany', 'Spain']]})
#   regional_code  regional_list
# 0 asia           [China, Japan]
# 1 africa         [Morocco, Nigeria, Ghana]
# 2 europe         [France, UK, Germany, Spain]
# 'id' is a placeholder for the other columns of your dataframe
df2 = pd.DataFrame({'regional_code': [['asia', 'africa'], ['africa', 'europe']], 'id': [1, 2]})
#   regional_code     id
# 0 [asia, africa]    1
# 1 [africa, europe]  2
# For each row, look up the country list of every code in the row and chain the lookups together
df2['list'] = df2.apply(
    lambda x: list(itertools.chain.from_iterable(
        df.loc[df['regional_code'] == i, 'regional_list'] for i in x.loc['regional_code'])),
    axis=1)
# In [95]: df2
# Out[95]:
#   regional_code     id  list
# 0 [asia, africa]    1   [[China, Japan], [Morocco, Nigeria, Ghana]]
# 1 [africa, europe]  2   [[Morocco, Nigeria, Ghana], [France, UK, Germa...
Now flatten each nested list in df2['list'] into a single list of countries:
df2['list'] = df2['list'].apply(np.concatenate)
#   regional_code     id  list
# 0 [asia, africa]    1   [China, Japan, Morocco, Nigeria, Ghana]
# 1 [africa, europe]  2   [Morocco, Nigeria, Ghana, France, UK, Germany,...
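A possible alternative to the row-wise `apply` plus `np.concatenate`, sketched under the assumption that pandas is 0.25 or newer (for `Series.explode`): turn the lookup table into a dict, explode the code lists to one code per row, map each code to its countries, explode again, and group back by the original row index. The sample frames are rebuilt here so the snippet runs on its own; `lookup` and the `id` column are illustrative names. Unlike `np.concatenate`, this yields plain Python lists rather than NumPy arrays.

```python
import pandas as pd

# Lookup table: regional_code -> list of countries
df = pd.DataFrame({'regional_code': ['asia', 'africa', 'europe'],
                   'regional_list': [['China', 'Japan'],
                                     ['Morocco', 'Nigeria', 'Ghana'],
                                     ['France', 'UK', 'Germany', 'Spain']]})
lookup = df.set_index('regional_code')['regional_list'].to_dict()

# Stand-in for your dataframe: each cell of regional_code is a list of codes
df2 = pd.DataFrame({'regional_code': [['asia', 'africa'], ['africa', 'europe']], 'id': [1, 2]})

countries = (
    df2['regional_code']
    .explode()         # one region code per row; the original row index is repeated
    .map(lookup)       # region code -> list of countries (NaN for unknown codes)
    .explode()         # one country per row
    .dropna()          # drop the NaN rows produced by unknown codes
    .groupby(level=0)  # regroup by the original df2 row
    .agg(list)         # collect the countries of each row back into a list
)
# Rows whose codes were all unknown have no group and end up as NaN
df2['list'] = countries.reindex(df2.index)
print(df2)
```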
Does this answer your question?