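I am trying to remove the entries of the array
positions = ['CF','ST','RW','LW','CB','RB','LB','CM','CAM','CDM','RM','LM','RWB','LWB']
from the Name column of my dataframe of football players. An example of the dataframe is shown below.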
Player dataframe
Can anyone help me remove these strings? I have tried str.replace, but it does not work.
Thanks
Answer 0 (score: 1)
The following approach matches only whole values separated by whitespace.
import pandas as pd

df = pd.DataFrame({'Player': ['ABC CF ST RW', 'DEF LB CM', 'GHI RM', 'JKL']})
rem = ['CF','ST','RW','LW','CB','RB','LB',
       'CM','CAM','CDM','RM','LM','RWB','LWB']
rem_set = set(rem)  # a set gives fast membership tests

def remover(p):
    # keep only the whitespace-separated tokens that are not position codes
    return ' '.join([x for x in p.split() if x not in rem_set])

df['Player'] = df['Player'].map(remover)

#   Player
# 0    ABC
# 1    DEF
# 2    GHI
# 3    JKL
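To illustrate what whole-token matching means in practice, here is a small sketch of the same filter in plain Python, without pandas. The sample names are invented for illustration and `remove_positions` is just a hypothetical name for the `remover` function above.

```python
positions = {'CF', 'ST', 'RW', 'LW', 'CB', 'RB', 'LB',
             'CM', 'CAM', 'CDM', 'RM', 'LM', 'RWB', 'LWB'}

def remove_positions(name):
    """Drop whitespace-separated tokens that are exactly a position code."""
    return ' '.join(tok for tok in name.split() if tok not in positions)

# Hypothetical sample names, chosen to show the edge cases
samples = [
    'J. STONES CB',          # 'ST' inside 'STONES' is left alone
    'RM Academy Player RM',  # a matching token is removed wherever it occurs
    'Tom  Smith   ST',       # runs of whitespace collapse to single spaces
]
cleaned = [remove_positions(s) for s in samples]
print(cleaned)
```

Two side effects are worth keeping in mind: a position code is removed anywhere in the string, not only at the end, and the original spacing is normalised by the split/join round trip.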
Performance benchmarking
import pandas as pd

df = pd.DataFrame({'Player': ['ABC CF ST RW', 'DEF LB CM', 'GHI RM', 'JKL']})
rem = ['CF','ST','RW','LW','CB','RB','LB',
       'CM','CAM','CDM','RM','LM','RWB','LWB']
rem_set = set(rem)
df = pd.concat([df]*20000)  # 80,000 rows

# note: both functions modify df in place, so repeated timing runs
# operate on data that has already been cleaned

def jez(df):
    # one regex replacement per position code
    d = {r'(\b){}(\b)'.format(x): r'' for x in rem_set}
    df['Player'] = df['Player'].replace(d, regex=True)
    return df

def jp(df):
    # split each value and filter tokens against the set
    def remover(p):
        return ' '.join([x for x in p.split() if x not in rem_set])
    df['Player'] = df['Player'].map(remover)
    return df

%timeit jez(df)  # 1.24 s
%timeit jp(df)   # 86 ms
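The regex variant in the benchmark builds one pattern per position code. A possible middle ground, sketched below as an assumption rather than something measured in this thread, is to combine all codes into a single compiled alternation. The helper name `strip_positions` is hypothetical, and no timing is claimed for it.

```python
import re

positions = ['CF', 'ST', 'RW', 'LW', 'CB', 'RB', 'LB',
             'CM', 'CAM', 'CDM', 'RM', 'LM', 'RWB', 'LWB']

# Longest codes first; with the \b anchors this ordering is not strictly
# required, but it keeps the intent obvious ('RWB' before 'RW').
alternation = '|'.join(sorted(map(re.escape, positions), key=len, reverse=True))

# Optional leading whitespace, then one whole-word position code
pattern = re.compile(r'\s*\b(?:' + alternation + r')\b')

def strip_positions(name):
    """Remove every whole-word position code, then trim the ends."""
    return pattern.sub('', name).strip()

print([strip_positions(n) for n in ['ABC CF ST RW', 'DEF LB CM', 'GHI RM', 'JKL']])
```

On a dataframe this could be applied with `df['Player'].map(strip_positions)`. Whether it beats the split-and-filter version would need to be timed on your own data.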
Answer 1 (score: 0)
I think, if it is sufficient for your data, you can remove the text after the last whitespace:
df['Name'] = df['Name'].str.rsplit(n=1).str[0]
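To see what that one-liner does to individual values, here is a plain-Python sketch using `str.rsplit`, which as far as I know splits the same way per element as the pandas `.str.rsplit(n=1)` accessor. The sample strings are borrowed from the first answer.

```python
names = ['ABC CF ST RW', 'DEF LB CM', 'GHI RM', 'JKL']

# Split once from the right on whitespace and keep the left-hand part
trimmed = [n.rsplit(None, 1)[0] for n in names]
print(trimmed)
```

Only the final token is dropped, so this fits data with exactly one trailing position per name. A value with no whitespace at all is returned unchanged.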
Or, if you need to remove only the values listed in positions (using jpp's DataFrame):
d = {r'\s+(\b){}(\b)'.format(x): r'' for x in positions}
df['Name'] = df['Name'].replace(d, regex=True)
print(df)
  Name
0  ABC
1  DEF
2  GHI
3  JKL
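The `\s+` at the start of each pattern means a code is removed only when whitespace precedes it. The sketch below approximates the dictionary replacement with the standard `re` module, applying one pattern after another. It is an illustration of the pattern's behaviour, not a description of how pandas applies the dictionary internally, and `strip_after_space` is a hypothetical helper name.

```python
import re

positions = ['CF', 'ST', 'RW', 'LW', 'CB', 'RB', 'LB',
             'CM', 'CAM', 'CDM', 'RM', 'LM', 'RWB', 'LWB']

# One compiled pattern per code: whitespace, then the code as a whole word
patterns = [re.compile(r'\s+\b{}\b'.format(re.escape(p))) for p in positions]

def strip_after_space(name):
    """Apply every pattern in turn, removing ' CODE' occurrences."""
    for pat in patterns:
        name = pat.sub('', name)
    return name

print(strip_after_space('ABC CF ST RW'))  # trailing codes removed
print(strip_after_space('CF ABC ST'))     # leading 'CF' has no whitespace before it
```

Because the preceding whitespace is removed together with the code, no trailing spaces are left behind. A value that starts with a position code keeps that code.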
Answer 2 (score: 0)
You may find that it is enough to remove all trailing uppercase entries of 2 or 3 characters, as follows:
import pandas as pd
data = [
    ['Name', 'Overall', 'Club'],
    ['L. Messi CF ST RW', 94, 'FC Barcelona'],
    ['Cristiano Ronaldo LW LM ST RM', 92, 'Real Madrid CF']]
df = pd.DataFrame(data[1:], columns=data[0])
df['Name'] = df['Name'].replace(r'((\s+[A-Z]{2,3}))+$', '', regex=True)
print(df)
This gives you:
                Name  Overall            Club
0           L. Messi       94    FC Barcelona
1  Cristiano Ronaldo       92  Real Madrid CF
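The generic `[A-Z]{2,3}` pattern also strips any other short uppercase suffix. The sketch below compares it with a stricter variant that only accepts the known position codes. The stricter pattern is my own assumption, not part of this answer, and the 'Neymar JR LW' value is an invented example.

```python
import re

positions = ['CF', 'ST', 'RW', 'LW', 'CB', 'RB', 'LB',
             'CM', 'CAM', 'CDM', 'RM', 'LM', 'RWB', 'LWB']

# Pattern from the answer: any trailing run of 2-3 letter uppercase tokens
generic = re.compile(r'(\s+[A-Z]{2,3})+$')

# Stricter variant (assumption): trailing run of known position codes only
known = re.compile(r'(?:\s+(?:' + '|'.join(map(re.escape, positions)) + r'))+$')

names = ['L. Messi CF ST RW', 'Neymar JR LW', 'Cristiano Ronaldo LW LM ST RM']
generic_result = [generic.sub('', n) for n in names]
known_result = [known.sub('', n) for n in names]
print(generic_result)
print(known_result)
```

With pandas, the stricter pattern string (`known.pattern`) could be passed to `df['Name'].replace(..., regex=True)` in the same way as the original pattern.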
Answer 3 (score: -1)
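import pandas as pd

df = pd.DataFrame({"Name": ["James kon CF ST RW", "Rom CAM"], "Overall": [23, 65], "Club": ["a", "b"]})
positions = set(['CF','ST','RW','LW','CB','RB','LB','CM','CAM','CDM','RM','LM','RWB','LWB'])

def f(name, position):
    # filter word by word: a plain set difference would lose the word
    # order and collapse repeated words
    return " ".join(w for w in name.split(" ") if w not in position)

# assign the result back, otherwise the column is left unchanged
df["Name"] = df["Name"].map(lambda x: f(x, positions))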