Create a new column based on a string

Time: 2019-01-10 18:46:33

Tags: python string pandas numpy substring

I have a data frame and want to create a column based on the strings in column1_sport.


The data contains:

1       Glen 24
2       Jos 32
3       Jasmien 25

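I want to look for certain strings ("ball" or "box") and create a new column depending on whether the column contains that word. If a value contains neither word, the new column should say "other". See below.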

import pandas as pd

df = pd.read_csv('C:/Users/test/dataframe.csv', encoding='iso-8859-1')

4 answers:

Answer 0 (score: 1)

For multiple conditions, I suggest using np.select. For example:

import numpy as np

values = ['ball', 'box']
conditions = list(map(df['column1_sport'].str.contains, values))

df['column2_type'] = np.select(conditions, values, 'other')

print(df)

#   column1_sport column2_type
# 0      baseball         ball
# 1    basketball         ball
# 2        tennis        other
# 3        boxing          box
# 4          golf        other
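The np.select approach above can be tried without the asker's CSV. The following is a minimal sketch: the sample data frame is built inline to mirror the printed output (with one missing value added), and the `na=False` and `regex=False` arguments are additions not found in the original answer. They are assumed here so that missing values fall through to 'other' and the search terms are treated as literal substrings.

```python
import numpy as np
import pandas as pd

# Sample data assumed for illustration; it mirrors the output shown above,
# plus one missing value to show how NaN/None is handled.
df = pd.DataFrame(
    {'column1_sport': ['baseball', 'basketball', 'tennis', 'boxing', 'golf', None]}
)

values = ['ball', 'box']

# One boolean mask per substring. na=False makes missing values count as
# "no match"; regex=False treats each value as a plain substring.
conditions = [
    df['column1_sport'].str.contains(v, regex=False, na=False).to_numpy()
    for v in values
]

# For each row, np.select uses the first condition that is True, so earlier
# entries in `values` take precedence; rows with no match get the default.
df['column2_type'] = np.select(conditions, values, default='other')

print(df)
```

Because the first matching condition wins, the order of `values` decides the label for any string that happens to contain more than one of the search terms.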

Answer 1 (score: 0)

You can use nested np.where calls:

import numpy as np

cond1 = df.column1_sport.str.contains('ball')
cond2 = df.column1_sport.str.contains('box')
df['column2_type'] = np.where(cond1, 'ball', np.where(cond2, 'box', 'other'))

    column1_sport   column2_type
0   baseball        ball
1   basketball      ball
2   tennis          other
3   boxing          box
4   golf            other

Answer 2 (score: 0)

df["column2_type"] = df.column1_sport.apply(lambda x: "ball" if "ball" in x else ("box" if "box" in x else "Other"))
df

    column1_sport   column2_type
0        baseball           ball
1      basketball           ball
2          tennis          Other
3          boxing            box
4            golf          Other

If you have more complex conditions, put them in a function:

def func(a):
    if "ball" in a.lower():
        return "ball"
    elif "box" in a.lower():
        return "box"
    else:
        return "Other"

df["column2_type"] = df.column1_sport.apply(func)
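One thing the function-based version does not cover is missing data: calling `.lower()` on a NaN or None cell raises an error. A hedged variant is sketched below. The function name `classify`, the sample data, and the choice to label non-string cells "Other" are all assumptions made for illustration and are not part of the original answer.

```python
import pandas as pd


def classify(value):
    """Return 'ball', 'box' or 'Other' for a single cell value."""
    # Non-string cells (e.g. NaN/None from missing data) have no .lower(),
    # so they are routed to "Other" here (an assumed policy).
    if not isinstance(value, str):
        return "Other"
    text = value.lower()  # case-insensitive matching, as in func() above
    if "ball" in text:
        return "ball"
    if "box" in text:
        return "box"
    return "Other"


# Sample data assumed for illustration.
df = pd.DataFrame({"column1_sport": ["Baseball", "boxing", "golf", None]})
df["column2_type"] = df["column1_sport"].apply(classify)

print(df)
```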

Answer 3 (score: 0)

For this simple case, you can create a custom dictionary and use it to map the Series df.column1_sport:

d = {'basketball':'ball', 'boxing':'box', 'baseball':'ball'}
df['column2_type'] = df.column1_sport.map(d).fillna('other')

    column1_sport column2_type
0      baseball         ball
1    basketball         ball
2        tennis        other
3        boxing          box
4          golf        other
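Note that the dictionary lookup matches whole values only, so a sport that is not a key of `d` (for example 'football') would be labelled 'other' even though it contains "ball". If substring matching is wanted in a single pandas call, one possible alternative, which is not part of the original answer, is `Series.str.extract` with a regex alternation. The sample data and the extra 'football' row are assumptions for illustration.

```python
import re

import pandas as pd

# Sample data assumed for illustration; 'football' is added to show a value
# that a whole-value dictionary lookup would miss.
df = pd.DataFrame(
    {'column1_sport': ['baseball', 'basketball', 'tennis', 'boxing', 'golf', 'football']}
)

values = ['ball', 'box']

# Build a single capture group such as "(ball|box)"; re.escape guards
# against regex metacharacters in the search terms.
pattern = '(' + '|'.join(re.escape(v) for v in values) + ')'

# expand=False returns a Series with the captured text per row,
# or NaN when nothing matches; fillna supplies the fallback label.
df['column2_type'] = df['column1_sport'].str.extract(pattern, expand=False).fillna('other')

print(df)
```

Unlike np.select, the label here comes from whichever search term appears first in the string itself, not from the order of `values`, so the two approaches can differ for strings that contain several of the terms.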