Question

Below you can see that I have an object named westCountries, and below that a dataframe named countryDF.

westCountries = {'West': ['US', 'CA', 'PR']}

# countryDF

      Country 
0        [US]
1        [PR]
2        [CA]
3        [HK]

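How can I add the westCountries object to the dataframe as a new column named Location? I have tried a merge, but it doesn't really do anything, because, unusually, I need the values in this column to be the names of the keys in the object, as shown below. Note: this output is only an example; I realize it doesn't line up exactly with the data I provided.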

  Country Location
0      US     West
1      CA     West

I was thinking of doing something like:

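Use .isin(), then apply further transformations/calculations to that dataframe to populate mine, but this approach feels a bit vague to me.
Use df.loc[...] to compare the dataframe against the values in the list, after which I can create my own column with whatever values I choose.
Convert the object to a dataframe, create a new column in that temporary dataframe, and then merge on country, so that the Location column ends up in my countryDF dataframe.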
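For reference, the second idea (df.loc plus .isin()) could look roughly like the sketch below. It is a minimal sketch assuming each Country cell holds exactly one code in a one-element list, as in the example frame; it rebuilds countryDF from that example and loops over the regions so the same code would also handle a dict with several keys.

```python
import pandas as pd

westCountries = {'West': ['US', 'CA', 'PR']}

# Rebuild the example frame; each cell holds a one-element list
countryDF = pd.DataFrame({'Country': [['US'], ['PR'], ['CA'], ['HK']]})

# Pull the single code out of each list (assumes exactly one code per cell)
countryDF['Country'] = countryDF['Country'].str[0]

# For each region, label the rows whose code appears in that region's list;
# the first assignment creates the Location column (NaN elsewhere)
for region, codes in westCountries.items():
    countryDF.loc[countryDF['Country'].isin(codes), 'Location'] = region

# Keep only the rows that received a region, as in the desired output
result = countryDF.dropna(subset=['Location'])
print(result)
```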

However, I feel there may be a more polished solution than any of the approaches listed above, which is why I'm asking for help.

Answer 1

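Use pandas.DataFrame.explode to pull the values out of the lists.
Use a list comprehension to match each value against the westCountries value lists and return the matching key.
In this example, the sample dataframe's column values are created as strings and must be converted to lists with ast.literal_eval.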

import pandas as pd
from ast import literal_eval  # only for setting up the test dataframe

# setup the test dataframe
data = {'Country': ["['US']", "['PR']", "['CA']", "['HK']"]}
df = pd.DataFrame(data)
df.Country = df.Country.apply(literal_eval)  # only for the test data

westCountries = {'West': ['US', 'CA', 'PR']}

# remove the values from lists, with explode
df = df.explode('Country')

# create the Loc column using apply: the first region whose list contains the code, else None
df['Loc'] = df.Country.apply(lambda x: next((k for k, v in westCountries.items() if x in v), None))

# drop rows with None
df = df.dropna()

# display(df)
  Country   Loc
0      US  West
1      PR  West
2      CA  West

Option 2 (better):

In the first option, .apply has to scan every key-value pair of westCountries for each row, which is slow.
It's better to use a dict comprehension to reshape westCountries into a flat dict, with each country as a key and its region as the value.
Use pandas.Series.map to map the dict values into a new column.

import pandas as pd
from ast import literal_eval  # only for setting up the test dataframe

# setup the test dataframe
data = {'Country': ["['US']", "['PR']", "['CA']", "['HK']"]}
df = pd.DataFrame(data)
df.Country = df.Country.apply(literal_eval)  # only for the test data

# remove the values from lists, with explode
df = df.explode('Country')

# given
westCountries = {'West': ['US', 'CA', 'PR'], 'East': ['NY', 'NC']}

# invert westCountries: every country becomes a key and its region becomes the value
mapped = {x: k for k, v in westCountries.items() for x in v}

# print(mapped)
# {'US': 'West', 'CA': 'West', 'PR': 'West', 'NY': 'East', 'NC': 'East'}

# map the dict to the column; countries not in the dict (e.g. HK) become NaN
df['Loc'] = df.Country.map(mapped)

# drop the rows that found no region
df = df.dropna()
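One property of the inverted dict worth knowing: if the same code is listed under more than one region, the dict comprehension keeps only the last region it sees, silently. The sketch below uses a hypothetical `regions` dict in which 'PR' appears twice, shows which label wins, and counts codes up front so duplicates can be spotted before mapping.

```python
from collections import Counter

import pandas as pd

# Hypothetical dict where 'PR' is listed under two regions
regions = {'West': ['US', 'CA', 'PR'], 'South': ['PR', 'MX']}

# Same inversion as option 2: for a repeated code, the later key overwrites the earlier one
mapped = {x: k for k, v in regions.items() for x in v}

codes = pd.Series(['US', 'PR', 'HK'])
labels = codes.map(mapped)   # 'PR' ends up as 'South'; 'HK' has no entry, so NaN
print(labels)

# Find codes that belong to more than one region before inverting
counts = Counter(x for v in regions.values() for x in v)
duplicates = sorted(code for code, n in counts.items() if n > 1)
print(duplicates)
```

With a dict like the one in the question, where every code appears once, `duplicates` is empty and the inversion is lossless.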

Answer 2

You can reshape the dict with pd.melt, then explode df with df.explode and join the two with df.merge.

import pandas as pd

# df is the question's dataframe: a Country column of one-element lists
westCountries = {'West': ['US', 'CA', 'PR']}
west = pd.melt(pd.DataFrame(westCountries), var_name='Loc', value_name='Country')

df.explode('Country').merge(west, on='Country')
  Country   Loc
0      US  West
1      PR  West
2      CA  West

Details

pd.DataFrame(westCountries)

#  West
#0   US
#1   CA
#2   PR

# Now melt the above dataframe
pd.melt(pd.DataFrame(westCountries), var_name='Loc', value_name='Country')

#    Loc Country
#0  West      US
#1  West      CA
#2  West      PR

# Now, merge `df` after exploding with `west` on `Country`
df.explode('Country').merge(west, on='Country') # merge uses how='inner' by default, so HK (no match in `west`) is dropped

#  Country   Loc
#0      US  West
#1      PR  West
#2      CA  West
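If you would rather keep countries that have no region (such as HK) instead of dropping them, a left join does that. The sketch below rebuilds the question's frame and contrasts the default inner merge with `how='left'`; `indicator=True` adds a `_merge` column that marks rows found only in the left frame.

```python
import pandas as pd

westCountries = {'West': ['US', 'CA', 'PR']}
df = pd.DataFrame({'Country': [['US'], ['PR'], ['CA'], ['HK']]})
west = pd.melt(pd.DataFrame(westCountries), var_name='Loc', value_name='Country')

exploded = df.explode('Country')

# Default how='inner': HK has no partner in `west`, so its row disappears
inner = exploded.merge(west, on='Country')

# how='left' keeps every country; unmatched ones get NaN in Loc,
# and indicator=True labels them 'left_only' in the _merge column
left = exploded.merge(west, on='Country', how='left', indicator=True)
print(left)
```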

Edit:

If the lists in your westCountries dict are not all the same length, try:

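from itertools import zip_longest

import numpy as np
import pandas as pd

westCountries = {'West': ['US', 'CA', 'PR'], 'East': ['NY', 'NC']}

# pad the shorter lists with NaN so every region becomes a column of equal length
west = pd.DataFrame(zip_longest(*westCountries.values(), fillvalue=np.nan),
                    columns=list(westCountries))
# reshape to long form and drop the padding rows
west = west.melt(var_name='Loc', value_name='Country').dropna()

df.explode('Country').merge(west, on='Country')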

Example of the above:

df
  Country
0    [US]
1    [PR]
2    [CA]
3    [HK]
4    [NY] #--> added `NY` from `East`.

westCountries = {'West': ['US', 'CA', 'PR'], 'East': ['NY', 'NC']}

west = pd.DataFrame(zip_longest(*westCountries.values(),fillvalue = np.nan),
                    columns= westCountries.keys())
west = west.melt(var_name='Loc', value_name='Country').dropna()
df.explode('Country').merge(west, on='Country')

#  Country   Loc
#0      US  West
#1      PR  West
#2      CA  West
#3      NY  East
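As an alternative to padding with zip_longest, a Series built from the dict can be exploded directly, because explode does not care that the lists have different lengths. This is a sketch of a swapped-in technique, not the answer's own code; it rebuilds the example frame above, including the added NY row.

```python
import pandas as pd

westCountries = {'West': ['US', 'CA', 'PR'], 'East': ['NY', 'NC']}
df = pd.DataFrame({'Country': [['US'], ['PR'], ['CA'], ['HK'], ['NY']]})

# Series of lists indexed by region, exploded to one row per country;
# unequal list lengths need no NaN padding
west = (pd.Series(westCountries)
          .explode()
          .rename_axis('Loc')
          .reset_index(name='Country'))

result = df.explode('Country').merge(west, on='Country')
print(result)
```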

Answer 3

This may not be the fastest approach in terms of runtime, but it works.

import pandas as pd

westCountries = {'West': ['US', 'CA', 'PR']}
df = pd.DataFrame(["[US]","[PR]", "[CA]", "[HK]"], columns=["Country"])

df = df.assign(Location="")
for index, row in df.iterrows():
    if any(country in row['Country'] for country in westCountries.get('West')):
        # write through df.at: `row` is a copy, so assigning to it would not change df
        df.at[index, 'Location'] = 'West'

west_df = df[df['Location'] != ""]
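The row loop can be replaced by vectorized string and map operations on the same bracketed strings. A minimal sketch, assuming each string wraps exactly one code in square brackets; `region_of` is a hypothetical name for the inverted dict.

```python
import pandas as pd

westCountries = {'West': ['US', 'CA', 'PR']}
df = pd.DataFrame(["[US]", "[PR]", "[CA]", "[HK]"], columns=["Country"])

# Strip the bracket characters from both ends, so "[US]" becomes "US"
codes = df['Country'].str.strip('[]')

# Invert the dict once, then map every code to its region in one pass
region_of = {code: region for region, members in westCountries.items() for code in members}
df['Location'] = codes.map(region_of)

# Codes with no region map to NaN; keep the rest
west_df = df[df['Location'].notna()]
print(west_df)
```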
