Logic to populate a column in pandas based on certain conditions

Time: 2020-10-04 12:33:01

Tags: python-3.x pandas dataframe

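I am looking for help finding good logic for adding a new column to a pandas DataFrame by applying some conditions.

Column "O" (min_played) will be created based on these conditions:

  1. If a player is not substituted (subs), his playing time is the value in column "G" (time).
  2. If he is substituted (subs), his playing time is the value in column "N" (subs), and the remaining time is added, in the same column "O" (min_played), to the player who replaced him.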
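The two rules can be sketched on a tiny frame. This is only an illustration: the `sub_minute` and `remaining` columns are hypothetical names, and the sketch assumes the substitution minute has already been parsed out of the subs column into a number.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: "time" is the match length, "sub_minute" is the minute
# a starter was taken off (NaN if he played the whole match).
df = pd.DataFrame({
    "name": ["Antonio M.", "Bowen J."],
    "time": [94.54, 94.54],
    "sub_minute": [np.nan, 89.0],
})

# Rule 1: not substituted -> full match time.
# Rule 2: substituted -> the minute of the substitution.
df["min_played"] = df["sub_minute"].fillna(df["time"])

# The remainder is what should be credited to the replacement.
df["remaining"] = (df["time"] - df["min_played"]).round(2)
print(df)
```

`fillna` covers both rules in one step because a missing substitution minute is exactly the "not substituted" case.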

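Example:

  1. Player "Antonio M." in column "J" (name) was not substituted (column "N" (subs)) => min_played = time (94.54 = 94.54)

  2. Player "Bowen J." in column "J" (name) was replaced by "Anderson F.":
     "Bowen J." min_played = 89 (value taken from column "N" (subs))
     "Anderson F." min_played = 94.54 (value taken from column "G" (time)) minus 89 (value taken from column "N" (subs)) => total minutes = 5.54, and this value should be added to the "min_played" column in row 13.

    Why row 13: because his name is there (row 13, column "J" (name)).

I have to perform this process for each round (column "B", the match type).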

Sample Data

import numpy as np
import pandas as pd

# Convert End Time to float
def convert_to_float(x):
    # Drop spaces, turn ":" into ".", then add up any "+"-separated parts.
    cleaned = x.replace(' ', '').replace(':', '.')
    return sum(float(part) for part in cleaned.split('+'))

df['time'] = df['time'].apply(convert_to_float)

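# Convert Sub-Out Time to Float
def min_played(x):
    try:
        # The minute is the first token of the subs string, without the trailing '.
        minute = x.split(" ")[0].replace("'", "")
        return convert_to_float(minute)
    except (AttributeError, ValueError):
        # Missing (NaN) or unparsable entries give NaN instead of a silent None.
        return np.nan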

df['min_played'] = df['subs'].apply(min_played)

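# Note: this loop assumes df has the default 0..n-1 integer index.
for indx, x in enumerate(df['status']):
    # Starters who were never substituted played the whole match.
    if x == 'line-up' and pd.isna(df.loc[indx, 'subs']):
        df.loc[indx, 'min_played'] = df.loc[indx, 'time']

    # Anyone who is neither a starter nor a substitute did not play.
    if x != 'line-up' and x != 'sub':
        df.loc[indx, 'min_played'] = 0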

filtr = (df['status'] == 'line-up')
df.loc[filtr, 'sub_min_played'] = df.loc[filtr, 'time'] - df.loc[filtr, 'min_played']

filtr = (df['status'] != 'line-up') & (df['status'] != 'sub')
df.loc[filtr, 'sub_min_played'] = 0

df['name'] = df['name'].apply(lambda x: x.replace(" (C)",""))

df.to_csv('q.csv')
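As a quick check of what the conversion helper above produces, here is a small sketch with made-up input strings. Note that replacing ":" with "." keeps the digits after the colon as hundredths; if the source values are really minutes:seconds, a decimal-minute conversion would give a different number. The `to_decimal_minutes` helper is hypothetical and shown only for comparison.

```python
def convert_to_float(x):
    # Same logic as in the question: ":" becomes ".", "+" parts are summed.
    cleaned = x.replace(" ", "").replace(":", ".")
    return sum(float(part) for part in cleaned.split("+"))


def to_decimal_minutes(x):
    # Hypothetical alternative: treat "mm:ss" as minutes plus seconds / 60.
    minutes, seconds = x.replace(" ", "").split(":")
    return int(minutes) + int(seconds) / 60


print(convert_to_float("94:54"))    # ":" treated as a decimal point
print(convert_to_float("90 + 3"))   # stoppage time is added
print(to_decimal_minutes("94:54"))  # minutes:seconds interpretation
```

Which interpretation is right depends on how the time column was recorded; the question's expected values (94.54, 5.54) follow the first one.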

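1 Answer:

Answer 0: (score: 0)

Depending on the use case, there may be a few edge cases in the full data that still need handling, but this should be a good starting point.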

import numpy as np
import pandas as pd

df = pd.read_csv('SOSample.csv')
def convert_to_float(x):
    remove_char = lambda x: x.replace(' ','').replace(':','.')
    temp_list = remove_char(x).split('+')
    return sum([float(i) for i in temp_list])

df['time'] = df['time'].apply(convert_to_float)

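def min_played(time,subs,status):
    if status == 'line-up':
        if isinstance(subs,str):
            # The minute is everything before the first apostrophe.
            t = subs.split("'")[0]
            # Handle cases like `(90+3)` -> 93.0 by summing the parts,
            # which avoids calling eval() on data read from a file.
            return convert_to_float(t.strip("()"))
        else:
            # Starter who was never substituted: full match time.
            return time
    return np.nan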
    
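def sub_min_played(time,status,min_played):
    # Time left for the replacement. `status` is currently unused.
    if time != min_played:
        return time-min_played
    # The player finished the match, so nothing is left over.
    return np.nan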
        
df['min_played'] = df.apply(lambda x: min_played(x.time,x.subs,x.status),axis=1)
df['sub_min_played'] = df.apply(lambda x: sub_min_played(x.time,x.status,x.min_played),axis=1)

df

Output: (image not available)
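The code above computes the leftover time but stops short of crediting it to the replacement's own row. One possible way to finish that step is sketched below. The sample rows, the `round` column name, the helper columns `sub_minute`, `sub_name` and `credit`, and the `89' Anderson F.` layout of the subs column are all assumptions made for illustration. The sketch also does not handle a substitute who is himself replaced later.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the "89' Anderson F." layout of "subs" and the
# "round" column name are assumptions made for this sketch.
df = pd.DataFrame({
    "round": ["R1", "R1", "R1"],
    "name": ["Antonio M.", "Bowen J.", "Anderson F."],
    "status": ["line-up", "line-up", "sub"],
    "time": [94.54, 94.54, 94.54],
    "subs": [np.nan, "89' Anderson F.", np.nan],
})

# Split "subs" into the substitution minute and the replacement's name.
pattern = r"^\s*(?P<minute>\d+(?:\+\d+)?)'\s*(?P<sub_name>.+?)\s*$"
parts = df["subs"].str.extract(pattern)

# "90+3" style minutes are summed; missing values stay NaN.
df["sub_minute"] = parts["minute"].map(
    lambda m: np.nan if pd.isna(m) else sum(float(p) for p in m.split("+"))
)
df["sub_name"] = parts["sub_name"]

# Starters play until they are taken off, or the whole match otherwise.
starters = df["status"] == "line-up"
df["min_played"] = 0.0
df.loc[starters, "min_played"] = df["sub_minute"].fillna(df["time"])

# Time left after each substitution, summed per (round, replacement name).
replaced = df[starters & df["sub_name"].notna()]
credit = (
    replaced.assign(credit=replaced["time"] - replaced["sub_minute"])
    .groupby(["round", "sub_name"], as_index=False)["credit"]
    .sum()
    .rename(columns={"sub_name": "name"})
)

# Add the credit to the replacement's own row in the same round.
df = df.merge(credit, on=["round", "name"], how="left")
df["min_played"] = (df["min_played"] + df["credit"].fillna(0)).round(2)
print(df[["round", "name", "min_played"]])
```

Merging on both the round and the name keeps the credit inside the same round, which matches the requirement to repeat the process per round. It relies on the names in the subs column matching the name column exactly, so suffixes such as " (C)" need to be stripped first, as the question already does.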