Below you can see that I have an object named westCountries, followed by a dataframe named countryDF.
westCountries = {'West': ['US', 'CA', 'PR']}
# countryDF
Country
0 [US]
1 [PR]
2 [CA]
3 [HK]
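How can I bring the westCountries object into the dataframe as a new column named Location? I have tried a merge, but it did not really do anything because, oddly enough, I need the values in this column to be the name of the key in the object, as shown below. Note: this output is only an example; I understand there is a mismatch between the data I provided and the desired output.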
Country Location
0 US West
1 CA West
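I was considering some manual approaches that operate on the countryDF dataframe. However, I feel there is probably a more polished solution than those approaches, which is why I am asking for help.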
Answer 0 (score: 2)
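Use pandas.DataFrame.explode to remove the values from the lists.
Use a list comprehension to match each value against the westCountries value lists and return the key.
ast.literal_eval is used only to convert the string test data into list type.
import pandas as pd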
from ast import literal_eval # only for setting up the test dataframe
# setup the test dataframe
data = {'Country': ["['US']", "['PR']", "['CA']", "['HK']"]}
df = pd.DataFrame(data)
df.Country = df.Country.apply(literal_eval) # only for the test data
westCountries = {'West': ['US', 'CA', 'PR']}
# remove the values from lists, with explode
df = df.explode('Country')
# create the Loc column using apply; the trailing [0] inspects only the first key, which is enough for this single-key dict
df['Loc'] = df.Country.apply(lambda x: [k if x in v else None for k, v in westCountries.items()][0])
# drop rows with None
df = df.dropna()
# display(df)
Country Loc
0 US West
1 PR West
2 CA West
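Note that the comprehension above takes element `[0]` of its result, so only the first key of westCountries is ever inspected. Here is a minimal sketch of a variant that checks every key. The helper name `find_region`, the name `regions` and the extra `East` entry are assumptions added purely for illustration.

```python
import pandas as pd

# 'East' is added here only to illustrate a dict with more than one key
regions = {'West': ['US', 'CA', 'PR'], 'East': ['NY', 'NC']}

def find_region(code, lookup):
    """Return the first key whose list contains `code`, or None (hypothetical helper)."""
    return next((key for key, values in lookup.items() if code in values), None)

df = pd.DataFrame({'Country': [['US'], ['PR'], ['NY'], ['HK']]})
df = df.explode('Country')
# still row-by-row via apply, but every key is checked rather than only the first
df['Loc'] = df['Country'].apply(lambda code: find_region(code, regions))
df = df.dropna()
print(df)
```

This is still a per-row Python call, so it shares the speed limitations of `.apply`.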
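Using .apply has to iterate through every key-value pair in westCountries with [k if x in v else None for k, v in westCountries.items()], which is slow.
A faster option is to reshape the westCountries dict into a flat dict with a dict comprehension, so that each state is a key and its region is the value.
Then use pandas.Series.map to map the dict values into the new column.
import pandas as pd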
from ast import literal_eval # only for setting up the test dataframe
# setup the test dataframe
data = {'Country': ["['US']", "['PR']", "['CA']", "['HK']"]}
df = pd.DataFrame(data)
df.Country = df.Country.apply(literal_eval) # only for the test data
# remove the values from lists, with explode
df = df.explode('Country')
# given
westCountries = {'West': ['US', 'CA', 'PR'], 'East': ['NY', 'NC']}
# invert westCountries so that every list value becomes a key and its key becomes the value
mapped = {x: k for k, v in westCountries.items() for x in v}
# print(mapped)
# {'US': 'West', 'CA': 'West', 'PR': 'West', 'NY': 'East', 'NC': 'East'}
# map the dict to the column
df['Loc'] = df.Country.map(mapped)
# dropna
df = df.dropna()
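As a sketch of what `map` does with codes that are missing from the lookup: they become NaN, which is why the `dropna` step removes `HK`. If those rows should be kept instead, they can be given a placeholder label. The label `'Other'` and the variable names `loc` and `unmatched` below are assumptions for illustration, not part of the original answer.

```python
import pandas as pd

westCountries = {'West': ['US', 'CA', 'PR'], 'East': ['NY', 'NC']}
# flat lookup: country code -> region
mapped = {code: region for region, codes in westCountries.items() for code in codes}

df = pd.DataFrame({'Country': [['US'], ['PR'], ['CA'], ['HK']]}).explode('Country')

loc = df['Country'].map(mapped)                      # codes not in `mapped` become NaN
unmatched = df.loc[loc.isna(), 'Country'].tolist()   # codes that found no region
df['Loc'] = loc.fillna('Other')                      # 'Other' is a hypothetical placeholder
print(unmatched)
print(df)
```

Listing the unmatched codes before dropping or relabelling them is a cheap way to spot typos in the lookup dict.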
Answer 1 (score: 1)
You can use pd.melt to reshape westCountries, then explode df with df.explode and join the two with df.merge:
westCountries = {'West': ['US', 'CA', 'PR']}
west = pd.melt(pd.DataFrame(westCountries), var_name='Loc', value_name='Country')
df.explode('Country').merge(west, on='Country')
Country Loc
0 US West
1 PR West
2 CA West
pd.DataFrame(westCountries)
# West
#0 US
#1 CA
#2 PR
# Now melt the above dataframe
pd.melt(pd.DataFrame(westCountries), var_name='Loc', value_name='Country')
# Loc Country
#0 West US
#1 West CA
#2 West PR
# Now, merge `df` after exploding with `west` on `Country`
df.explode('Country').merge(west, on='Country') # how='inner' by default in merge, so unmatched rows such as HK are dropped
# Country Loc
#0 US West
#1 PR West
#2 CA West
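`DataFrame.merge` performs an inner join unless `how` is given, which is why `HK` is missing from the result. A small sketch comparing it with `how='left'`, which keeps every row of the exploded frame; the variable names `inner` and `left` are illustrative only.

```python
import pandas as pd

westCountries = {'West': ['US', 'CA', 'PR']}
west = pd.melt(pd.DataFrame(westCountries), var_name='Loc', value_name='Country')
df = pd.DataFrame({'Country': [['US'], ['PR'], ['CA'], ['HK']]}).explode('Country')

inner = df.merge(west, on='Country')              # default how='inner': only matching rows
left = df.merge(west, on='Country', how='left')   # all rows of df; unmatched Loc is NaN

print(inner)
print(left)
```

Choose `how='left'` when rows without a region still need to appear in the output.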
If the lists in your westCountries dict are of unequal length, try:
from itertools import zip_longest
import numpy as np  # needed for np.nan below
westCountries = {'West': ['US', 'CA', 'PR'], 'East': ['NY', 'NC']}
west = pd.DataFrame(zip_longest(*westCountries.values(),fillvalue = np.nan),
columns= westCountries.keys())
west = west.melt(var_name='Loc', value_name='Country').dropna()
df.explode('Country').merge(west, on='Country')
Example of the above:
df
Country
0 [US]
1 [PR]
2 [CA]
3 [HK]
4 [NY] #--> added `NY` from `East`.
westCountries = {'West': ['US', 'CA', 'PR'], 'East': ['NY', 'NC']}
west = pd.DataFrame(zip_longest(*westCountries.values(),fillvalue = np.nan),
columns= westCountries.keys())
west = west.melt(var_name='Loc', value_name='Country').dropna()
df.explode('Country').merge(west, on='Country')
# Country Loc
#0 US West
#1 PR West
#2 CA West
#3 NY East
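An alternative sketch that avoids padding the lists altogether is to build the lookup frame directly from (region, code) pairs. This is a swapped-in technique rather than the answer's own method, and the variable name `result` is illustrative.

```python
import pandas as pd

westCountries = {'West': ['US', 'CA', 'PR'], 'East': ['NY', 'NC']}

# one (Loc, Country) row per code; lists of different lengths need no padding
west = pd.DataFrame(
    [(region, code) for region, codes in westCountries.items() for code in codes],
    columns=['Loc', 'Country'],
)

df = pd.DataFrame({'Country': [['US'], ['PR'], ['CA'], ['HK'], ['NY']]})
result = df.explode('Country').merge(west, on='Country')
print(result)
```

Since no NaN padding is introduced, the `dropna` step after `melt` is not needed here.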
Answer 2 (score: 0)
In terms of run time this may not be the fastest approach, but it works.
import pandas as pd
westCountries = {'West': ['US', 'CA', 'PR']}
df = pd.DataFrame(["[US]","[PR]", "[CA]", "[HK]"], columns=["Country"])
df = df.assign(Location="")
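for index, row in df.iterrows():
    # substring check: each Country value is a string such as "[US]"
    if any(country in row['Country'] for country in westCountries['West']):
        # write through df.at; assigning to `row` may only modify a copy
        df.at[index, 'Location'] = 'West'
west_df = df[df['Location'] != ""]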
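Since the loop may be slow on large frames, here is a loop-free sketch of the same idea. It assumes, as in this answer's test data, that each Country value is a string such as "[US]" holding a single code; the intermediate name `codes` is illustrative.

```python
import pandas as pd

westCountries = {'West': ['US', 'CA', 'PR']}
df = pd.DataFrame(["[US]", "[PR]", "[CA]", "[HK]"], columns=["Country"])

# strip the literal brackets, then test membership for the whole column at once
codes = df['Country'].str.strip('[]')
df['Location'] = ''
df.loc[codes.isin(westCountries['West']), 'Location'] = 'West'

west_df = df[df['Location'] != '']
print(west_df)
```

Unlike the substring test in the loop, `isin` requires an exact match on the code, which avoids accidental hits on longer strings.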