我有一个数据框,其中包含一个名为'States'
的列,其中列出了美国各州。我需要使用区域说明符'Atlantic Coast'
创建另一个列我有属于各个区域的状态列表,所以如果df['States']
中的状态匹配列表'Atlantic_states'
中的状态,则说明符'Atlantic Coast'
已插入新列df['region specifier']
,下面的代码显示了我想要将我的数据框值与df['States']
列的输出进行比较的列表。
#list of states
Atlantic_states = ['Virginia',
'Massachusetts',
'Maine',
'New York',
'Rhode Island',
'Connecticut',
'New Hampshire',
'Maryland',
'Delaware',
'New Jersey',
'North Carolina',
'South Carolina',
'Georgia',
'Florida']
print(df['States'])
Out:
States
0 Virginia
1 Massachusetts
2 Maine
3 New York
4 Rhode Island
5 Connecticut
6 New Hampshire
7 Maryland
8 Delaware
9 New Jersey
10 North Carolina
11 South Carolina
12 Georgia
13 Florida
14 Wisconsin
15 Michigan
16 Ohio
17 Pennsylvania
18 Illinois
19 Indiana
20 Minnesota
21 New York
22 Washington
23 Oregon
24 California
答案 0 :(得分:3)
虽然安迪的回答有效,但这并不是最有效的方法。有一个方便的方法可以在几乎所有 pandas类似系列的对象上调用:.isin()
。对此的条目可以是列表,dicts和pandas系列。
df = pd.DataFrame(['Virginia','Massachusetts','Maine','New York','Rhode Island',
'Connecticut','New Hampshire','Maryland', 'Delaware',
'New Jersey','North Carolina', 'South Carolina','Georgia','Florida',
'Wisconsin','Michigan', 'Ohio','Pennsylvania','Illinois',
'Indiana','Minnesota','New York','Washington','Oregon',
'California'],
columns=['States'])
Atlantic_states = ['Virginia', 'Massachusetts', 'Maine', 'New York','Rhode Island',
'Connecticut', 'New Hampshire', 'Maryland', 'Delaware',
'New Jersey', 'North Carolina', 'South Carolina', 'Georgia',
'Florida']
df['Coast'] = np.where(df['States'].isin(Atlantic_states), 'Atlantic Coast',
'Unknown')
df.head()
Out[1]:
States Coast
0 Virginia Atlantic Coast
1 Massachusetts Atlantic Coast
2 Maine Atlantic Coast
3 New York Atlantic Coast
4 Rhode Island Atlantic Coast
以下是一些用于将字母表的前10个字母映射到一些随机int
数字的时序:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(low=0, high=26, size=(1000000,1)),
columns=['numbers'])
letters = dict(zip(list(range(0, 10)), [i for i in 'abcdefghij']))
申请
%%timeit
def is_atlantic(state):
return True if state in letters else False
df.numbers.apply(is_atlantic)
Out[]: 1 loops, best of 3: 432 ms per loop
现在为JohnE建议的地图
%%timeit
df.numbers.map(letters)
Out[]: 10 loops, best of 3: 56.9 ms per loop
最后是isin(也由Nickil Maveli建议)
%%timeit
df.numbers.isin(letters)
Out[]: 10 loops, best of 3: 20.9 ms per loop
因此,我们发现.isin()
比.apply()
快得多,速度是.map()
的两倍。
注意:apply
和isin
只返回布尔掩码和map
填充所需的字符串。即便如此,当分配到另一列isin
时,map
的时间约为2/3。
答案 1 :(得分:2)
你有几个选择。首先,直接回答提出的问题:
创建一个函数,返回状态是否在大西洋地区
def is_atlantic(state):
return "Atlantic" if state in Atlantic_states else "Unknown"
现在,您使用.apply()
并获取结果(并将其返回到新列)
df['Region'] = df['State'].apply(is_atlantic)
这将返回如下所示的数据框:
State Region
0 Virginia Atlantic
1 Massachusetts Atlantic
2 Maine Atlantic
3 New York Atlantic
4 Rhode Island Atlantic
5 Connecticut Atlantic
6 New Hampshire Atlantic
7 Maryland Atlantic
8 Delaware Atlantic
9 New Jersey Atlantic
10 North Carolina Atlantic
11 South Carolina Atlantic
12 Georgia Atlantic
13 Florida Atlantic
14 Wisconsin Unknown
15 Michigan Unknown
16 Ohio Unknown
17 Pennsylvania Unknown
18 Illinois Unknown
19 Indiana Unknown
20 Minnesota Unknown
21 New York Atlantic
22 Washington Unknown
23 Oregon Unknown
24 California Unknown
如果您要检查多个列表,则第一个选项会变得很麻烦。我建议创建一个以State为键,区域为值的字典,而不是多个列表。只有50个值,这应该很容易维护。
regions = {
'Virginia': 'Atlantic',
'Massachusetts': 'Atlantic',
'Maine': 'Atlantic',
'New York': 'Atlantic',
'Rhode Island': 'Atlantic',
'Connecticut': 'Atlantic',
'New Hampshire': 'Atlantic',
'Maryland': 'Atlantic',
'Delaware': 'Atlantic',
'New Jersey': 'Atlantic',
'North Carolina': 'Atlantic',
'South Carolina': 'Atlantic',
'Georgia': 'Atlantic',
'Florida': 'Atlantic',
'Wisconsin': 'Midwest',
'Michigan': 'Midwest',
'Ohio': 'Midwest',
'Pennsylvania': 'Midwest',
'Illinois': 'Midwest',
'Indiana': 'Midwest',
'Minnesota': 'Midwest',
'New York': 'Atlantic',
'Washington': 'West',
'Oregon': 'West',
'California': 'West'
}
您可以再次使用.apply()
,稍加修改一下功能:
def get_region(state):
return regions[state]
df['Region'] = df['State'].apply(get_region)
这次您的数据框如下所示:
State Region
0 Virginia Atlantic
1 Massachusetts Atlantic
2 Maine Atlantic
3 New York Atlantic
4 Rhode Island Atlantic
5 Connecticut Atlantic
6 New Hampshire Atlantic
7 Maryland Atlantic
8 Delaware Atlantic
9 New Jersey Atlantic
10 North Carolina Atlantic
11 South Carolina Atlantic
12 Georgia Atlantic
13 Florida Atlantic
14 Wisconsin Midwest
15 Michigan Midwest
16 Ohio Midwest
17 Pennsylvania Midwest
18 Illinois Midwest
19 Indiana Midwest
20 Minnesota Midwest
21 New York Atlantic
22 Washington West
23 Oregon West
24 California West