Question

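I have a CSV file with data on every match of the 2017/18 Premier League season. I want to write a loop that splits this data into test and training datasets. The first test dataset would contain all data from the first 10 rounds. The next would contain all data from the first 11 rounds, and so on. Basically, the test dataset grows by one round each time, up to the final round. There are 38 rounds in total.

The CSV file looks like this:

I wrote the following code: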

import pandas as pd

def build_temp_model(dataset, match_round):
   test_dataset = dataset[dataset['Round'] <= match_round]
   if len(test_dataset) == 0:
      return 0
   file_name = str(match_round) + '.csv'
   train_dataset.to_csv(file_name, index=None)

EPL = pd.DataFrame()
EPL = pd.read_csv('/Users/HJA/Desktop/Betting/understatV0.01/test.csv')
EPL = EPL.sort_values(by='Round')

if __name__ == '__main__':
    get_total_score = [build_temp_model(EPL, round) for rounds in range(11, 39, 1)]

However, I get an error on the following line:

test_dataset = dataset[dataset['Round'] <= match_round]

The error message is: TypeError: '<=' not supported between instances of 'int' and 'builtin_function_or_method'

Can someone explain what I am doing wrong? Thanks in advance.

Answer 1

You have a typo. Pylint would help catch this kind of mistake.

get_total_score = [build_temp_model(EPL, round) for rounds in range(11, 39, 1)]

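round() is a built-in function; rounds is the variable you are iterating over. Because the comprehension passes round instead of rounds, match_round receives the built-in function itself, and comparing the 'Round' column against it raises the TypeError.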
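Here is a minimal sketch of the corrected loop. A few things in it are assumptions rather than part of the original post: the in-memory DataFrame with a hypothetical 'Match' column stands in for the CSV, the function returns the subset instead of writing a file, and the range starts at 10 to match the description in the question (the posted code starts at 11). Note also that the posted function calls train_dataset.to_csv, but train_dataset is never defined; this sketch assumes test_dataset was meant.

```python
import pandas as pd


def build_temp_model(dataset, match_round):
    """Return every row whose 'Round' is at or before match_round."""
    test_dataset = dataset[dataset['Round'] <= match_round]
    if len(test_dataset) == 0:
        return None
    return test_dataset


# Hypothetical stand-in for the CSV: two matches per round, rounds 1 to 38.
EPL = pd.DataFrame({
    'Round': [r for r in range(1, 39) for _ in range(2)],
    'Match': list(range(76)),
})
EPL = EPL.sort_values(by='Round')

# Pass the loop variable `rounds`, not the built-in function `round`.
subsets = {rounds: build_temp_model(EPL, rounds) for rounds in range(10, 39)}
```

To write each subset to disk as in the question, something like subsets[rounds].to_csv(str(rounds) + '.csv', index=False) inside a loop over subsets should work.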
