Derive a new pandas column from a row's value and apply it until the next such value appears

Asked: 2019-03-12 00:36:25

Tags: python pandas

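I have a string column in a pandas DataFrame. I want to derive a new column from the value of certain rows and carry that value down until the next such row appears. What is the most efficient way to do this?

Input DataFrame: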

import pandas as pd

df = pd.DataFrame({'neighborhood':['Chicago City', 'Wicker Park', 'Bucktown','Lincoln Park','West Loop','River North','Milwaukee City','Bay View','East Side','South Side','Bronzeville','North Side','New York City','Harlem','Midtown','Chinatown']})

My desired output DataFrame is:

      neighborhood city
0     Chicago City Chicago
1      Wicker Park Chicago
2         Bucktown Chicago
3     Lincoln Park Chicago
4        West Loop Chicago
5      River North Chicago
6   Milwaukee City Milwaukee
7         Bay View Milwaukee
8        East Side Milwaukee
9       South Side Milwaukee
10     Bronzeville Milwaukee
11      North Side Milwaukee
12   New York City New York
13          Harlem New York
14         Midtown New York
15       Chinatown New York

2 Answers:

Answer 0 (score: 3)

Use .str.extract + ffill

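# Raw string avoids the invalid-escape warning for \s; expand=False returns a Series.
df['city'] = df['neighborhood'].str.extract(r'(.*)\sCity', expand=False).ffill()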
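To see why this works, it can help to split the one-liner into its two steps. The following is a minimal sketch on a shortened version of the sample data; the intermediate name `extracted` and the `^`/`$` anchors are additions for illustration and are not part of the original answer.

```python
import pandas as pd

# Shortened sample data (a subset of the question's rows).
df = pd.DataFrame({'neighborhood': ['Chicago City', 'Wicker Park', 'Bucktown',
                                    'Milwaukee City', 'Bay View',
                                    'New York City', 'Harlem']})

# Step 1: rows ending in " City" yield the captured city name;
# every other row yields NaN because the pattern does not match.
extracted = df['neighborhood'].str.extract(r'^(.*)\sCity$', expand=False)

# Step 2: forward-fill carries each city name down to the rows below it
# until the next non-missing value appears.
df['city'] = extracted.ffill()

print(df)
```

Note that any rows appearing before the first "... City" row would stay NaN, since there is nothing above them to fill from.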

Answer 1 (score: 0)

You can map a custom function over the column that does what you need:

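city = None  # most recently seen city name, carried across calls

def generate(s):
    global city
    # A row containing 'City' starts a new group; drop the suffix and stray spaces.
    if 'City' in s:
        city = s.replace('City', '').strip()
    return city

df['city'] = df['neighborhood'].map(generate)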

This returns the expected output.
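If relying on a module-level variable is a concern (the function gives different results depending on what it was called with before), the same idea can probably be expressed without state. This is only a sketch, assuming city rows always end with the literal suffix " City"; the names `suffix` and `is_header` are hypothetical.

```python
import pandas as pd

# Shortened sample data (a subset of the question's rows).
df = pd.DataFrame({'neighborhood': ['Chicago City', 'Wicker Park', 'Bucktown',
                                    'Milwaukee City', 'Bay View',
                                    'New York City', 'Harlem']})

suffix = ' City'  # assumed marker for rows that start a new group

# Boolean mask: True on rows that name a city.
is_header = df['neighborhood'].str.endswith(suffix)

# Keep the value only on header rows (NaN elsewhere), slice off the suffix,
# then forward-fill down to the next header row.
df['city'] = df['neighborhood'].where(is_header).str[:-len(suffix)].ffill()

print(df)
```

Unlike the `'City' in s` check, `endswith` would not treat a neighborhood such as "City Center" as the start of a new group.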