Remove the contents of an array from a column in pandas

Time: 2018-03-15 13:18:24

Tags: python pandas dataframe jupyter-notebook

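I have a dataframe of football players, and I am trying to remove the contents of the array positions = ['CF','ST','RW','LW','CB','RB','LB','CM','CAM','CDM','RM','LM','RWB','LWB'] from its Name column. An example of the dataframe is below. [image: Player dataframe]

Can anyone help me remove these strings? I have already tried str.replace, but it doesn't work.

Thanks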

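4 Answers:

Answer 0: (score: 1)

The following approach matches only whole, whitespace-separated values, so a position code inside a longer word is left alone.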

import pandas as pd

df = pd.DataFrame({'Player': ['ABC CF ST RW', 'DEF LB CM', 'GHI RM', 'JKL']})

rem = ['CF','ST','RW','LW','CB','RB','LB',
       'CM','CAM','CDM','RM','LM','RWB','LWB']

rem_set = set(rem)

def remover(p):
    return ' '.join([x for x in p.split() if x not in rem_set])

df['Player'] = df['Player'].map(remover)

#   Player
# 0    ABC
# 1    DEF
# 2    GHI
# 3    JKL
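
If the column can contain missing values, `remover` as written would fail on them, because `NaN` has no `split` method. A minimal sketch of one way around that, assuming a hypothetical sample with a missing name, is to pass `na_action='ignore'` to `Series.map` so that missing entries are passed through untouched:

```python
import numpy as np
import pandas as pd

rem_set = {'CF', 'ST', 'RW', 'LW', 'CB', 'RB', 'LB',
           'CM', 'CAM', 'CDM', 'RM', 'LM', 'RWB', 'LWB'}

def remover(p):
    # Keep only the whitespace-separated tokens that are not position codes.
    return ' '.join(x for x in p.split() if x not in rem_set)

# Hypothetical sample data: the second entry is missing.
players = pd.Series(['ABC CF ST RW', np.nan, 'JKL'])

# na_action='ignore' propagates missing values without calling remover on them.
cleaned = players.map(remover, na_action='ignore')
print(cleaned)
```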

Performance benchmarking

df = pd.DataFrame({'Player': ['ABC CF ST RW', 'DEF LB CM', 'GHI RM', 'JKL']})

rem = ['CF','ST','RW','LW','CB','RB','LB',
       'CM','CAM','CDM','RM','LM','RWB','LWB']

rem_set = set(rem)

df = pd.concat([df]*20000)

def jez(df):
    d = {r'(\b){}(\b)'.format(x):r'' for x in rem_set}
    df['Player'] = df['Player'].replace(d, regex=True)
    return df

def jp(df):
    def remover(p):
        return ' '.join([x for x in p.split() if x not in rem_set])

    df['Player'] = df['Player'].map(remover)
    return df

%timeit jez(df)  # 1.24s
%timeit jp(df)   # 86ms
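
`%timeit` is an IPython magic, so the lines above only run in a notebook or IPython shell. Outside of one, a rough equivalent can be sketched with the standard `timeit` module. The function names `regex_replace` and `split_filter` are placeholders chosen here, the frame is smaller than in the benchmark above, and the timings will vary by machine and pandas version:

```python
import timeit

import pandas as pd

rem = ['CF', 'ST', 'RW', 'LW', 'CB', 'RB', 'LB',
       'CM', 'CAM', 'CDM', 'RM', 'LM', 'RWB', 'LWB']
rem_set = set(rem)

# Smaller than the 80,000-row frame above, to keep the sketch quick.
players = pd.Series(['ABC CF ST RW', 'DEF LB CM', 'GHI RM', 'JKL'] * 2000)

def regex_replace(s):
    # One regex per position code, applied one after another.
    d = {r'\b{}\b'.format(x): '' for x in rem}
    return s.replace(d, regex=True)

def split_filter(s):
    # Split on whitespace and drop tokens found in the set.
    return s.map(lambda p: ' '.join(x for x in p.split() if x not in rem_set))

t_regex = timeit.timeit(lambda: regex_replace(players), number=3)
t_split = timeit.timeit(lambda: split_filter(players), number=3)
print('regex_replace: {:.3f}s, split_filter: {:.3f}s'.format(t_regex, t_split))
```

Note that the two approaches are not quite equivalent: the regex version removes the codes but leaves the surrounding spaces behind, so its output may need a `.str.strip()` afterwards.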

Answer 1: (score: 0)

If it is enough to drop everything after the last whitespace (that is, only the final token), I think you can use:

df['Name'] = df['Name'].str.rsplit(n=1).str[0]
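
To make the behaviour of that one-liner concrete, here is a small sketch on made-up names. It suggests the approach only fits rows that end in exactly one position code: with several codes only the last one is dropped, and with none the last word of the name itself is lost.

```python
import pandas as pd

# Made-up names: three codes, one code, and no code at all.
names = pd.Series(['ABC CF ST RW', 'GHI RM', 'L. Messi'])

# Split once from the right, then keep the part before the last whitespace.
trimmed = names.str.rsplit(n=1).str[0]
print(trimmed.tolist())
```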

Or, if you need to remove only the values listed in positions (using jpp's DataFrame, with the column named Name):

d = {r'\s+(\b){}(\b)'.format(x):r'' for x in positions}
df['Name'] = df['Name'].replace(d, regex=True)
print (df)
  Name
0  ABC
1  DEF
2  GHI
3  JKL
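
Since the question mentions that `str.replace` did not work, here is a hedged sketch of how the same idea might be written with `Series.str.replace`, joining the codes into a single alternation instead of building one regex per code. The variable name `pattern` and the extra 'MNO RWB LWB' row are illustrative, and `regex=True` is passed explicitly because recent pandas versions treat the pattern as a literal string by default:

```python
import re

import pandas as pd

positions = ['CF', 'ST', 'RW', 'LW', 'CB', 'RB', 'LB',
             'CM', 'CAM', 'CDM', 'RM', 'LM', 'RWB', 'LWB']

# Leading whitespace followed by any one position code as a whole word.
pattern = r'\s+\b(?:' + '|'.join(map(re.escape, positions)) + r')\b'

df = pd.DataFrame({'Name': ['ABC CF ST RW', 'DEF LB CM', 'GHI RM',
                            'JKL', 'MNO RWB LWB']})
df['Name'] = df['Name'].str.replace(pattern, '', regex=True)
print(df)
```

The word boundaries keep 'RW' from matching the start of 'RWB', so the order of the codes in the alternation should not matter here.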

Answer 2: (score: 0)

You may find it is enough to remove all uppercase entries of 2 or 3 characters at the end of the name, as follows:

import pandas as pd

data = [
    ['Name', 'Overall', 'Club'], 
    ['L. Messi CF ST RW', 94, 'FC Barcelona'],
    ['Cristiano Ronaldo LW LM ST RM', 92, 'Real Madrid CF']]

df = pd.DataFrame(data[1:], columns=data[0])    
df['Name'] = df['Name'].replace(r'((\s+[A-Z]{2,3}))+$', '', regex=True)

print(df)

This would give you:

                Name  Overall            Club
0           L. Messi       94    FC Barcelona
1  Cristiano Ronaldo       92  Real Madrid CF
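
One caveat with the generic `[A-Z]{2,3}` pattern is that it cannot tell a position code from any other short uppercase token at the end of a name. As a sketch only, the pattern could be restricted to the known codes while staying anchored at the end of the string. The second name below is a made-up example, and `generic` and `anchored` are names chosen here:

```python
import re

import pandas as pd

positions = ['CF', 'ST', 'RW', 'LW', 'CB', 'RB', 'LB',
             'CM', 'CAM', 'CDM', 'RM', 'LM', 'RWB', 'LWB']

# The second entry is hypothetical: 'JR' is part of the name, not a position.
names = pd.Series(['L. Messi CF ST RW', 'Neymar JR LW'])

# Generic pattern from the answer: any trailing 2-3 letter uppercase tokens.
generic = names.replace(r'(\s+[A-Z]{2,3})+$', '', regex=True)

# Restricted pattern: only trailing tokens that are known position codes.
alternation = '|'.join(map(re.escape, positions))
anchored = names.replace(r'(?:\s+(?:{}))+$'.format(alternation), '', regex=True)

print(generic.tolist())
print(anchored.tolist())
```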

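Answer 3: (score: -1)

import pandas as pd

df = pd.DataFrame({"Name": ["James kon CF ST RW", "Rom CAM"], "Overall": [23,65], "Club": ["a", "b"]})

positions = set(['CF','ST','RW','LW','CB','RB','LB','CM','CAM','CDM','RM','LM','RWB','LWB'])

def f(name, position):
    # Filter word by word so the name keeps its original order;
    # a plain set difference would not preserve word order.
    return " ".join(w for w in name.split(" ") if w not in position)

# map returns a new Series, so assign it back to the column.
df["Name"] = df["Name"].map(lambda x: f(x, positions))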