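I have a dictionary whose values appear in a pandas Series. I want to create a new Series that looks up each value of the existing Series and returns the associated key. For example: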
import pandas as pd
df = pd.DataFrame({'season' : ['Nor 2014', 'Nor 2013', 'Nor 2013', 'Norv 2013',
                               'Swe 2014', 'Swe 2014', 'Swe 2013',
                               'Swe 2013', 'Sven 2013', 'Sven 2013', 'Norv 2014']})
nmdict = {'Norway' : [s for s in list(set(df.season)) if 'No' in s],
          'Sweden' : [s for s in list(set(df.season)) if 'S' in s]}
Desired result, with df['country'] as the new column name:
       season country
0    Nor 2014  Norway
1    Nor 2013  Norway
2    Nor 2013  Norway
3   Norv 2013  Norway
4    Swe 2014  Sweden
5    Swe 2014  Sweden
6    Swe 2013  Sweden
7    Swe 2013  Sweden
8   Sven 2013  Sweden
9   Sven 2013  Sweden
10  Norv 2014  Norway
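Because of the nature of my data, I have to build nmdict manually, as shown. I tried this but could not invert nmdict, because the arrays have different lengths.
More importantly, I suspect my approach may be wrong. I come from Excel and am thinking in terms of a vlookup solution, but according to this answer, I should not be using a dictionary this way.
Any answers are appreciated.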
Answer 0 (score: 1)
IIUC, I would do the following:
df['country'] = df['season'].apply(lambda x: 'Norway' if 'No' in x else 'Sweden' if 'S' in x else x)
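The one-liner above calls a Python lambda once per row. As a hedged alternative, the same ordered checks can be written with vectorized string tests and numpy.select. This is a minimal sketch, not the answerer's method. It assumes the df from the question, and it returns the placeholder 'Unknown' for unmatched rows, whereas the lambda returns the original string.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'season': ['Nor 2014', 'Nor 2013', 'Nor 2013', 'Norv 2013',
                              'Swe 2014', 'Swe 2014', 'Swe 2013',
                              'Swe 2013', 'Sven 2013', 'Sven 2013', 'Norv 2014']})

season = df['season']

# Conditions are evaluated in order, mirroring the if/else chain in the lambda
conditions = [
    season.str.contains('No', regex=False).to_numpy(dtype=bool),
    season.str.contains('S', regex=False).to_numpy(dtype=bool),
]
choices = ['Norway', 'Sweden']

# 'Unknown' is a placeholder chosen for this sketch (not in the original answer)
df['country'] = np.select(conditions, choices, default='Unknown')
```

With only two countries the lambda is arguably easier to read; the vectorized form mainly pays off on larger frames.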
Answer 1 (score: 1)
I have done this in a verbose way so that you can follow along.
First, let's define a function that determines the value of 'country'
In [4]: def get_country(s):
   ...:     if 'Nor' in s:
   ...:         return 'Norway'
   ...:     if 'S' in s:
   ...:         return 'Sweden'
   ...:     # return 'Default Country'  # if you get unmatched values
In [5]: get_country('Sven')
Out[5]: 'Sweden'
In [6]: get_country('Norv')
Out[6]: 'Norway'
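We can use map to run get_country on each row. pandas also provides apply(), which works similarly*. On Python 3, map returns a lazy iterator, so wrap it in list() to see the values.
In [7]: list(map(get_country, df['season']))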
Out[7]:
['Norway',
'Norway',
'Norway',
'Norway',
'Sweden',
'Sweden',
'Sweden',
'Sweden',
'Sweden',
'Sweden',
'Norway']
Now we assign that result to a column named 'country'
In [8]: df['country'] = list(map(get_country, df['season']))
Let's look at the final result:
In [9]: df
Out[9]:
       season country
0    Nor 2014  Norway
1    Nor 2013  Norway
2    Nor 2013  Norway
3   Norv 2013  Norway
4    Swe 2014  Sweden
5    Swe 2014  Sweden
6    Swe 2013  Sweden
7    Swe 2013  Sweden
8   Sven 2013  Sweden
9   Sven 2013  Sweden
10  Norv 2014  Norway
* With apply(), it looks like this:
In [16]: df['country'] = df['season'].apply(get_country)
In [17]: df
Out[17]:
       season country
0    Nor 2014  Norway
1    Nor 2013  Norway
2    Nor 2013  Norway
3   Norv 2013  Norway
4    Swe 2014  Sweden
5    Swe 2014  Sweden
6    Swe 2013  Sweden
7    Swe 2013  Sweden
8   Sven 2013  Sweden
9   Sven 2013  Sweden
10  Norv 2014  Norway
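The commented-out default in get_country hints at unmatched values. As written, the function falls through and returns None for them. The sketch below shows one way to handle that case with Series.map followed by fillna. The value 'Den 2014' and the label 'Unknown' are made up for illustration and do not come from the question's data.

```python
import pandas as pd

def get_country(s):
    if 'Nor' in s:
        return 'Norway'
    if 'S' in s:
        return 'Sweden'
    return None  # no rule matched

# 'Den 2014' is a hypothetical value used only to trigger the unmatched case
seasons = pd.Series(['Nor 2014', 'Sven 2013', 'Den 2014'])

# map() applies the function element-wise; fillna() replaces the missing result
countries = seasons.map(get_country).fillna('Unknown')
```

Returning the default directly from get_country, as the commented line suggests, works just as well. Using fillna keeps the choice of default out of the matching logic.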
Pseudocode only :)
# Modify this as needed
country_matchers = {
    'Norway': ['Nor', 'Norv'],
    'Sweden': ['S', 'Swed'],
}
def get_country(s):
    """
    Run the passed string s against the "matchers" for each country.
    Return the first matched country (None if nothing matches).
    """
    for country, matchers in country_matchers.items():
        for matcher in matchers:
            if matcher in s:
                return country
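To see the matcher dictionary in action, here is a minimal runnable sketch that applies it to the df from the question. It repeats the definitions above so that it is self-contained, and it assumes Python 3.7+, where dictionaries keep insertion order, so 'Norway' is tested before 'Sweden'.

```python
import pandas as pd

df = pd.DataFrame({'season': ['Nor 2014', 'Nor 2013', 'Nor 2013', 'Norv 2013',
                              'Swe 2014', 'Swe 2014', 'Swe 2013',
                              'Swe 2013', 'Sven 2013', 'Sven 2013', 'Norv 2014']})

# Substrings that identify each country; extend as needed
country_matchers = {
    'Norway': ['Nor', 'Norv'],
    'Sweden': ['S', 'Swed'],
}

def get_country(s):
    """Return the first country with a matcher contained in s, else None."""
    for country, matchers in country_matchers.items():
        for matcher in matchers:
            if matcher in s:
                return country
    return None

df['country'] = df['season'].map(get_country)
```

Adding a country then only means adding a dictionary entry. Broad matchers such as 'S' should be kept after more specific ones if any overlap.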
Answer 2 (score: 1)
You can create a country dictionary using a dictionary comprehension:
country_id = df.season.str.split().str.get(0).drop_duplicates()
country_dict = {c: ('Norway' if c.startswith('N') else 'Sweden') for c in country_id.values}
which gives:
{'Nor': 'Norway', 'Swe': 'Sweden', 'Sven': 'Sweden', 'Norv': 'Norway'}
This works for two countries; otherwise you can apply a self-defined function in a similar way:
def country_dict(country_id):
    if country_id.startswith('S'):
        return 'Sweden'
    elif country_id.startswith('N'):
        return 'Norway'
    elif country_id.startswith('XX'):
        return ...
    else:
        return 'default'
Either way, use map to apply the dictionary (or function) to the country_id part of the season column, extracted with the pandas string methods:
df['country'] = df.season.str.split().str.get(0).map(country_dict)
       season country
0    Nor 2014  Norway
1    Nor 2013  Norway
2    Nor 2013  Norway
3   Norv 2013  Norway
4    Swe 2014  Sweden
5    Swe 2014  Sweden
6    Swe 2013  Sweden
7    Swe 2013  Sweden
8   Sven 2013  Sweden
9   Sven 2013  Sweden
10  Norv 2014  Norway