I have a DataFrame and want to create a new column based on the strings in column1_sport.
The data contains:
1 Glen 24
2 Jos 32
3 Jasmien 25
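I want to look for certain strings ("ball" or "box") and create a new column based on whether column1_sport contains that word. If a value contains neither word, the new column should say "other". See below.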
import pandas as pd
df = pd.read_csv('C:/Users/test/dataframe.csv', encoding = 'iso-8859-1')
Answer 0 (score: 1)
For multiple conditions, I suggest using np.select. For example:
import numpy as np
values = ['ball', 'box']
# one boolean mask per keyword; np.select uses the first mask that matches
conditions = list(map(df['column1_sport'].str.contains, values))
df['column2_type'] = np.select(conditions, values, 'other')
print(df)
# column1_sport column2_type
# 0 baseball ball
# 1 basketball ball
# 2 tennis other
# 3 boxing box
# 4 golf other
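The snippet above relies on a df loaded from the CSV. As a self-contained sketch, the version below builds the sample column by hand (the values are taken from the printed output, plus one missing value added purely for illustration) and passes na=False so that missing entries fall through to the default. Treat it as one possible variation on the approach, not the answerer's exact code.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the printed output; the trailing None is an
# illustrative assumption to show how missing values behave.
df = pd.DataFrame(
    {"column1_sport": ["baseball", "basketball", "tennis", "boxing", "golf", None]}
)

values = ["ball", "box"]

# One boolean mask per keyword. na=False turns missing values into False
# so they end up in the default bucket instead of producing NaN masks.
conditions = [df["column1_sport"].str.contains(v, na=False) for v in values]

# np.select walks the conditions in order and takes the first one that is True,
# so the order of `values` decides which label wins when several match.
df["column2_type"] = np.select(conditions, values, default="other")

print(df)
```

Because the first matching condition wins, a value containing both keywords would be labelled "ball" with this ordering.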
Answer 1 (score: 0)
You can use nested np.where:
cond1 = df.column1_sport.str.contains('ball')
cond2 = df.column1_sport.str.contains('box')
df['column2_type'] = np.where(cond1, 'ball', np.where(cond2, 'box', 'other'))
column1_sport column2_type
0 baseball ball
1 basketball ball
2 tennis other
3 boxing box
4 golf other
Answer 2 (score: 0)
df["column2_type"] = df.column1_sport.apply(lambda x: "ball" if "ball" in x else ("box" if "box" in x else "Other"))
df
column1_sport column2_type
0 baseball ball
1 basketball ball
2 tennis Other
3 boxing box
4 golf Other
If you have more complex conditions, you can move the logic into a named function:
def func(a):
    # case-insensitive substring checks, evaluated in order
    if "ball" in a.lower():
        return "ball"
    elif "box" in a.lower():
        return "box"
    else:
        return "Other"
df["column2_type"] = df.column1_sport.apply(func)
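If the list of keywords grows, the if/elif chain can be written as a loop. The sketch below is one way to do that; the names classify and KEYWORDS are hypothetical helpers introduced here, the mixed-case sample values are made up for illustration, and the non-string guard is an added assumption for columns that may contain missing values.

```python
import pandas as pd

# Hypothetical keyword list; order matters because the first hit wins.
KEYWORDS = ["ball", "box"]


def classify(text, keywords=KEYWORDS, default="Other"):
    """Return the first keyword found in text (case-insensitive), else default."""
    if not isinstance(text, str):
        # Missing or non-string values get the default label.
        return default
    lowered = text.lower()
    for keyword in keywords:
        if keyword in lowered:
            return keyword
    return default


# Illustrative data with mixed case to exercise the lower() call.
df = pd.DataFrame(
    {"column1_sport": ["Baseball", "basketball", "tennis", "BOXING", "golf"]}
)
df["column2_type"] = df["column1_sport"].apply(classify)

print(df)
```

Note that apply calls the function once per row in Python, so on large frames the vectorized string methods are likely to be faster.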
Answer 3 (score: 0)
For this simple case, you can create a custom dictionary and use it to map the Series df.column1_sport:
d = {'basketball':'ball', 'boxing':'box', 'baseball':'ball'}
df['column2_type'] = df.column1_sport.map(d).fillna('other')
column1_sport column2_type
0 baseball ball
1 basketball ball
2 tennis other
3 boxing box
4 golf other
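Series.map with a dictionary looks up whole values only, so every sport has to be listed explicitly. When the goal is a substring match, as in the question, one alternative is Series.str.extract with a regular expression. This swaps in a different technique from the dictionary approach above and is only a sketch using the sample values from the printed output.

```python
import pandas as pd

df = pd.DataFrame(
    {"column1_sport": ["baseball", "basketball", "tennis", "boxing", "golf"]}
)

# The capture group returns whichever keyword matches first in each string;
# rows with no match come back as missing and are filled with "other".
# expand=False yields a Series rather than a one-column DataFrame.
df["column2_type"] = (
    df["column1_sport"].str.extract(r"(ball|box)", expand=False).fillna("other")
)

print(df)
```

With this pattern the label is the keyword that appears earliest in the string, which differs from the ordered priority that np.select or nested np.where provide.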