Pandas: populate a new column based on existing column values

Time: 2020-06-19 21:04:59

Tags: python pandas

I have the following dataframe, df_shots:

              TableIndex  MatchID  GameWeek           Player  ...      ShotPosition    ShotSide      Close             Position
ShotsDetailID                                                 ...                                                              
6                      5    46605         1  Roberto Firmino  ...  very close range         N/A      close  very close rangeN/A
8                      7    46605         1  Roberto Firmino  ...           the box  the centre  not close    the boxthe centre
10                     9    46605         1  Roberto Firmino  ...           the box    the left  not close      the boxthe left
17                    16    46605         1  Roberto Firmino  ...           the box  the centre      close    the boxthe centre
447                  446    46623         2  Roberto Firmino  ...           the box  the centre      close    the boxthe centre
...                  ...      ...       ...              ...  ...               ...         ...        ...                  ...
6656                6662    46870        27  Roberto Firmino  ...  very close range         N/A      close  very close rangeN/A
6666                6672    46870        27  Roberto Firmino  ...           the box   the right  not close     the boxthe right
6674                6680    46870        27  Roberto Firmino  ...           the box  the centre  not close    the boxthe centre
6676                6682    46870        27  Roberto Firmino  ...           the box    the left  not close      the boxthe left
6679                6685    46870        27  Roberto Firmino  ...   outside the box         N/A  not close   outside the boxN/A

For clarity, all possible "Position" values are:

positions = ['a difficult anglethe left',
             'a difficult anglethe right',
             'long rangeN/A',
             'long rangethe centre',
             'long rangethe left',
             'long rangethe right',
             'outside the boxN/A',
             'penaltyN/A',
             'the boxthe centre',
             'the boxthe left',
             'the boxthe right',
             'the six yard boxthe left',
             'the six yard boxthe right',
             'very close rangeN/A']

Now I want to map the following x/y values to each "Position" name and store the value in a new "Position X/Y" column:

    the_boxthe_center = {'y':random.randrange(25,45), 'x':random.randrange(0,6)}
    the_boxthe_left = {'y':random.randrange(41,54), 'x':random.randrange(0,16)}
    the_boxthe_right = {'y':random.randrange(14,22), 'x':random.randrange(0,16)}
    very_close_rangeNA = {'y':random.randrange(25,43), 'x':random.randrange(0,4)}
    six_yard_boxthe_left = {'y':random.randrange(33,43), 'x':random.randrange(4,6)}
    six_yard_boxthe_right = {'y':random.randrange(25,33), 'x':random.randrange(4,6)}
    a_diffcult_anglethe_left = {'y':random.randrange(43,54), 'x':random.randrange(0,6)}
    a_diffcult_anglethe_right = {'y':random.randrange(14,25), 'x':random.randrange(0,6)}
    penaltyNA = {'y':random.randrange(36), 'x':random.randrange(8)}
    outside_the_boxNA = {'y':random.randrange(14,54), 'x':random.randrange(16,28)}
    long_rangeNA = {'y':random.randrange(0,68), 'x':random.randrange(40,52)}
    long_rangethe_centre = {'y':random.randrange(0,68), 'x':random.randrange(28,40)}
    long_rangethe_right = {'y':random.randrange(0,14), 'x':random.randrange(0,24)}
    long_rangethe_left = {'y':random.randrange(54,68), 'x':random.randrange(0,24)}

I tried:

if df_shots['Position']=='very close rangeN/A':
        df_shots['Position X/Y']==very_close_rangeNA
...# and so on

But I get:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

How can I do this?

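3 answers:

Answer 0: (score: 1)

Keeping so many related variables outside a container is bad form. Put them in a dictionary keyed by position name, then map that dictionary onto your dataframe.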

data_dict = {'the boxthe centre': {'y':random.randrange(25,45)...}, ...}


df['Position'] = df['Position'].map(data_dict)

print(df['Position'])
6        {'y': 35, 'x': 2}
8        {'y': 32, 'x': 1}
10      {'y': 44, 'x': 11}
17       {'y': 32, 'x': 1}
447      {'y': 32, 'x': 1}
...                    NaN
6656     {'y': 35, 'x': 2}
6666    {'y': 15, 'x': 11}
6674     {'y': 32, 'x': 1}
6676    {'y': 44, 'x': 11}
6679    {'y': 37, 'x': 16}
Name: Position, dtype: object
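
For a self-contained illustration of the approach above, here is a minimal sketch. The three-row df_shots and the two-entry data_dict are made-up stand-ins (assumptions, not the asker's real data), and the result is written to a new 'Position X/Y' column instead of overwriting 'Position'.

```python
import random

import pandas as pd

# Hypothetical miniature of df_shots; only the 'Position' column matters here.
df_shots = pd.DataFrame(
    {"Position": ["the boxthe centre", "very close rangeN/A", "the boxthe centre"]},
    index=[8, 6, 17],
)

# Keys must match the strings in 'Position' exactly (spaces, not underscores).
data_dict = {
    "the boxthe centre": {"y": random.randrange(25, 45), "x": random.randrange(0, 6)},
    "very close rangeN/A": {"y": random.randrange(25, 43), "x": random.randrange(0, 4)},
}

# Series.map looks each value up in the dict; values without a key become NaN.
df_shots["Position X/Y"] = df_shots["Position"].map(data_dict)

print(df_shots)
```

Because each randrange call runs once, when the dictionary is built, every row with the same position receives the same x/y pair.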

Answer 1: (score: 0)

The code below may do the trick.

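First create a list of all the "Position" values, like

positions

along with a corresponding list, in the same order:

position_xy = [the_boxthe_center,the_boxthe_left,....,long_rangethe_left] #and so on...

Then I suggest building a dictionary so that each position maps to its corresponding x/y value

dict_positionxy = dict(zip(positions, position_xy))

Then create a new column in the dataframe where you want to store the x,y values according to position

 df_shots['Position X/Y'] = 0.

Now you can go through all the rows one by one and fill in the new column from the dictionary.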

This should do the trick :)
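
The row-by-row step is not spelled out in this answer, so here is one way it might look. This is only a sketch under stated assumptions: the coordinates are fixed, made-up numbers in place of the asker's random values, the tiny df_shots is hypothetical, and the looked-up values are collected in a list and assigned once at the end, so the 0. placeholder column is not needed.

```python
import pandas as pd

# Hypothetical fixed coordinates standing in for the asker's random values.
the_boxthe_center = {"y": 30, "x": 3}
the_boxthe_left = {"y": 45, "x": 10}

positions = ["the boxthe centre", "the boxthe left"]
position_xy = [the_boxthe_center, the_boxthe_left]  # same order as positions
dict_positionxy = dict(zip(positions, position_xy))

# Hypothetical miniature of df_shots.
df_shots = pd.DataFrame(
    {"Position": ["the boxthe left", "the boxthe centre", "penaltyN/A"]}
)

# Go through the rows one by one and look up each position.
xy_values = []
for position in df_shots["Position"]:
    # dict.get returns None for a position that has no entry in the dictionary.
    xy_values.append(dict_positionxy.get(position))

# Assign the collected values as the new column in a single step.
df_shots["Position X/Y"] = xy_values
print(df_shots)
```

An explicit Python loop is easy to follow but slower than Series.map on large frames, so it is mainly useful when each row needs extra logic.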

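Answer 2: (score: 0)

Here is some sample code that does what you are asking. I created a basic mock-up of df_shots, but the same code should work on a larger DataFrame. I also stored some of those free variables in a dict to simplify filtering.

Note that because the random values in positions_xy are computed up front, every shot from a given position gets the same x/y values. This may not be what you want.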

import pandas as pd
import random

# Sample df_shots
df_shots = pd.DataFrame({'Position': ['the_boxthe_center', 'the_boxthe_left']})

# Store position/xy pairs in dict
positions_xy = {'the_boxthe_center': {'y': random.randrange(25, 45), 'x': random.randrange(0, 6)},
                'the_boxthe_left': {'y': random.randrange(41, 54), 'x': random.randrange(0, 16)}}

# Create new column
df_shots['Position XY'] = ''

# Iterate over all position/xy pairs
for position, xy in positions_xy.items():
    # Determine indices of all players that match
    matches = df_shots['Position'] == position
    matches_indices = matches[matches].index
    # Update matching rows in df_shots with xy
    for idx in matches_indices:
        df_shots.at[idx, 'Position XY'] = xy

print(df_shots)

Output:

            Position        Position XY
0  the_boxthe_center  {'y': 36, 'x': 2}
1    the_boxthe_left  {'y': 44, 'x': 0}
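
If every shot should get its own random coordinates rather than one shared pair per position, one option is to store the ranges and draw inside a function that runs once per row. The sketch below is an illustration only: position_ranges and draw_xy are hypothetical names, the ranges are copied from two of the question's randrange calls, and the three-row df_shots is made up.

```python
import random

import pandas as pd

# Hypothetical miniature of df_shots.
df_shots = pd.DataFrame(
    {"Position": ["the boxthe centre", "the boxthe centre", "the boxthe left"]}
)

# Store the (start, stop) ranges instead of pre-drawn numbers.
position_ranges = {
    "the boxthe centre": {"y": (25, 45), "x": (0, 6)},
    "the boxthe left": {"y": (41, 54), "x": (0, 16)},
}


def draw_xy(position):
    """Draw a fresh random x/y pair for one shot position (hypothetical helper)."""
    ranges = position_ranges[position]
    return {axis: random.randrange(start, stop) for axis, (start, stop) in ranges.items()}


# Series.map calls draw_xy once per row, so each shot gets its own draw.
df_shots["Position XY"] = df_shots["Position"].map(draw_xy)
print(df_shots)
```

As written, draw_xy raises a KeyError for a position that is missing from position_ranges, so the dictionary would need an entry for every value that occurs in the column.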