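In a string column of a pandas DataFrame, I want to derive a new column from the value in a marker row and carry that value forward until the next marker row appears. What is the most efficient way to do this?
Input DataFrame: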
import pandas as pd
df = pd.DataFrame({'neighborhood':['Chicago City', 'Wicker Park', 'Bucktown','Lincoln Park','West Loop','River North','Milwaukee City','Bay View','East Side','South Side','Bronzeville','North Side','New York City','Harlem','Midtown','Chinatown']})
My desired output DataFrame is:
    neighborhood    city
0   Chicago City    Chicago
1   Wicker Park     Chicago
2   Bucktown        Chicago
3   Lincoln Park    Chicago
4   West Loop       Chicago
5   River North     Chicago
6   Milwaukee City  Milwaukee
7   Bay View        Milwaukee
8   East Side       Milwaukee
9   South Side      Milwaukee
10  Bronzeville     Milwaukee
11  North Side      Milwaukee
12  New York City   New York
13  Harlem          New York
14  Midtown         New York
15  Chinatown       New York
Answer 0 (score: 3)
Use .str.extract + ffill:
df['city'] = df['neighborhood'].str.extract(r'(.*)\sCity', expand=False).ffill()
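To make it clearer why this works, here is a minimal sketch that splits the one-liner into its two steps. The shortened sample data and the intermediate name `extracted` are my own, chosen for illustration; it also assumes every city row ends in " City", as in the question.

```python
import pandas as pd

# Shortened sample data (an assumption for illustration; the question's frame is longer)
df = pd.DataFrame({'neighborhood': ['Chicago City', 'Wicker Park', 'Bucktown',
                                    'Milwaukee City', 'Bay View',
                                    'New York City', 'Harlem']})

# Step 1: capture the text before " City"; rows that do not match become NaN.
# expand=False makes str.extract return a Series instead of a one-column DataFrame.
extracted = df['neighborhood'].str.extract(r'(.*)\sCity', expand=False)

# Step 2: forward-fill, so each NaN takes the most recent city name above it.
df['city'] = extracted.ffill()

print(df)
```

If the first rows of the column came before any city row, they would stay NaN after ffill, since there is nothing above them to carry forward.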
Answer 1 (score: 0)
You can map a custom function that remembers the most recent city:
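city = None

def generate(s):
    # Remember the most recent city row and return it for every row that follows
    global city
    if 'City' in s:
        city = s.replace('City', '').strip()  # strip the space left behind by 'City'
    return city

df['city'] = df['neighborhood'].map(generate)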
This produces the expected output.
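One caveat with the approach above is that it relies on a module-level global, so the result depends on what `city` held before the call. As a possible alternative, here is a sketch that gets the same result without shared state, using a boolean mask with `where` and `ffill`. The name `is_city_row` and the shortened sample data are assumptions for illustration, and it assumes city rows end in " City".

```python
import pandas as pd

# Shortened sample data (an assumption for illustration)
df = pd.DataFrame({'neighborhood': ['Chicago City', 'Wicker Park', 'Bucktown',
                                    'Milwaukee City', 'Bay View',
                                    'New York City', 'Harlem']})

# Mark the rows that name a city (hypothetical helper name)
is_city_row = df['neighborhood'].str.endswith(' City')

df['city'] = (
    df['neighborhood']
    .where(is_city_row)                         # keep city rows, NaN everywhere else
    .str.replace(r'\s*City$', '', regex=True)   # drop the trailing " City"
    .ffill()                                    # carry each city down to the rows below it
)

print(df)
```

This stays vectorized and can be rerun safely, since nothing is kept between calls.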