Pairwise differences in a column in pandas

Time: 2018-11-05 04:32:53

Tags: pandas numpy

I have a dataset with 3 columns: DriverID, Race, Place.

 DriverID Race  Place
    83    1      1
    18    1      2
    20    1      3
    48    1      4
    53    1      5

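For each race, I want to compute a matrix (NumPy array) of the pairwise differences in the Place column across DriverIDs. The problem is that not every DriverID is represented in every Race. So I decided to first create a full cross-join table with every combination of DriverID and Race, as follows (reproducible example):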

import pandas as pd

url = "http://personal.psu.edu/drh20/code/btmatlab/nascar2002.txt"
races_trimmed = pd.read_table(url, sep=" ")

# Create a cartesian product of unique drivers and races to get every combination
unq_drivers = sorted(races_trimmed["DriverID"].unique())
unq_drivers = [x for x in unq_drivers if str(x) != 'nan']
unq_races = sorted(races_trimmed["Race"].unique())
unq_races = [x for x in unq_races if str(x) != 'nan']

# Get a dataframe 
unq_drivers_df = pd.DataFrame(unq_drivers, columns=["DriverID"])
unq_races_df = pd.DataFrame(unq_races, columns=["Race"])

# Let's cross join the columns to get all unique combinations of drivers and races
all_driver_race_combs = unq_drivers_df.assign(foo=1).merge(unq_races_df.assign(foo=1), on='foo').drop(columns='foo')
all_driver_race_combs = all_driver_race_combs.sort_values(by=['Race', 'DriverID'])
all_driver_race_mg = pd.merge(all_driver_race_combs, races_trimmed,  how='left', 
                              left_on=['DriverID','Race'], right_on = ['DriverID','Race'])
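
For illustration, the same driver-by-race grid can be sketched on a toy frame with `pd.MultiIndex.from_product` and `reindex`, which skips the dummy `foo` key. The toy values and the names `toy`, `full_index` and `toy_full` are made up for this sketch and are not part of the NASCAR file; it also assumes each (DriverID, Race) pair occurs at most once.

```python
import pandas as pd

# Hypothetical stand-in for races_trimmed: driver 18 did not run race 2
toy = pd.DataFrame({
    "DriverID": [83, 18, 20, 83, 20],
    "Race": [1, 1, 1, 2, 2],
    "Place": [1, 2, 3, 2, 1],
})

# Every (DriverID, Race) combination, whether or not it appears in the data
full_index = pd.MultiIndex.from_product(
    [sorted(toy["DriverID"].unique()), sorted(toy["Race"].unique())],
    names=["DriverID", "Race"],
)

# Reindexing onto the full grid leaves NaN in Place for missing combinations
toy_full = (
    toy.set_index(["DriverID", "Race"])
       .reindex(full_index)
       .reset_index()
       .sort_values(["Race", "DriverID"])
       .reset_index(drop=True)
)
print(toy_full)
```

As with the left merge above, combinations that are absent from the data end up with a missing Place rather than being dropped.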

Now, to get the pairwise differences, I proceed as follows (using the approach from the question linked in the comment):

# Now let's do a pairwise difference in finish across drivers for a
# single race
# based on https://stackoverflow.com/questions/46266633/pandas-creating-difference-matrix-from-data-frame
race_num = 2.0
race_res = all_driver_race_mg[all_driver_race_mg["Race"] == race_num]
race_res = race_res.sort_values(by=['DriverID'])
arr = (race_res['Place'].values - race_res['Place'].values[:, None])
new_race_1 = pd.concat((race_res['DriverID'], pd.DataFrame(arr, columns=race_res['DriverID'])), axis=1)
# Remove the first column - it has the DriverID in the pairwise matrix
new_race_1 = new_race_1.values[:, 1:]
new_race_1.shape

As you can see, this outputs a (166, 83) array rather than an (83, 83) array. For race_num = 1.0 it works, but for all the other races it does not. Can anyone explain how to correct the calculation, i.e. output an 83 * 83 matrix for each valid race_num? I suspect some of the values are the cause, but I am not sure how to fix it.
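
A toy reproduction may help narrow this down. `pd.concat(..., axis=1)` aligns its inputs on their row labels, and a freshly built `pd.DataFrame(arr, ...)` gets labels 0..n-1, while the rows selected for a later race keep their original labels. The sketch below assumes that mismatch is what doubles the row count; the three-driver data and the names `misaligned`, `aligned` and `diff_matrix` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical rows for a second race: note the row labels start at 3, not 0
race_res = pd.DataFrame(
    {"DriverID": [18, 20, 83], "Place": [2.0, 1.0, 3.0]},
    index=[3, 4, 5],
)

place = race_res["Place"].to_numpy()
arr = place - place[:, None]  # arr[i, j] = place[j] - place[i]

# The new DataFrame is labelled 0..2, so concat takes the union of labels
misaligned = pd.concat(
    (race_res["DriverID"], pd.DataFrame(arr, columns=race_res["DriverID"])),
    axis=1,
)
print(misaligned.shape)  # twice as many rows as drivers

# Giving the difference frame the same row labels keeps the rows lined up
aligned = pd.concat(
    (race_res["DriverID"],
     pd.DataFrame(arr, columns=race_res["DriverID"], index=race_res.index)),
    axis=1,
)
diff_matrix = aligned.to_numpy()[:, 1:]  # drop the DriverID column
print(diff_matrix.shape)
```

If only the NumPy matrix is needed, `arr` already has the square shape, so the `concat` step can be skipped altogether.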

0 answers:

No answers yet