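This is what I'm trying to do with my code: I have an existing CSV file of tennis players' names, and I want to add new players to it as soon as they appear in the rankings. My script goes through the rankings and builds an array, then imports the names from the CSV file. It should find which names are missing from the file and then scrape those players' data online. Finally, I only want to append the new rows to the end of the old CSV file. My problem is that the new rows are indexed by the player's name instead of continuing the old file's index. Any idea why this happens? And why is an unnamed column being added?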
def get_all_players():
    # imports names of players currently in the atp rankings
    current_atp_ranking = check_atp_rankings()
    current_player_list = current_atp_ranking['Player']
    # clean up names in case of white spaces
    for i in range(0, len(current_player_list)):
        current_player_list[i] = current_player_list[i].strip()
    # reads the main file and makes a dataframe out of it
    current_file = 'ATP_stats_new.csv'
    df = pd.read_csv(current_file)
    # gets all the names within the main file to see which current ones aren't there
    names_on_file = list(df['Player'])
    # cleans up in case of any white spaces
    for i in range(0, len(names_on_file)):
        names_on_file[i] = names_on_file[i].strip()
    # Removing Nadal for testing purposes
    names_on_file.remove("Rafael Nadal")
    # creating a list of players in current_player_list but not in names_on_file
    new_player_list = [x for x in current_player_list if x not in names_on_file]
    # loop through new_player_list
    for player in new_player_list:
        # delay to avoid stopping
        time.sleep(2)
        # finding the player's atp link for profile based on their name
        atp_link = current_atp_ranking.loc[current_atp_ranking['Player'] == player, 'ATP_Link']
        atp_link = atp_link.iloc[0]
        # make a basic dictionary with just the player's name and link
        player_dict = [{'Name': player, 'ATP_Link': atp_link}]
        # enter the new dictionary into the existing main file
        df.append(player_dict, ignore_index=True)
    # print dataframe to see how it looks before exporting
    print(df)
    # export dataframe into current file
    df.to_csv(current_file)
The file initially looks like this:
Unnamed: 0 Player ... Coach Turned_Pro
0 0 Novak Djokovic ... NaN NaN
1 1 Rafael Nadal ... Carlos Moya, Francisco Roig 2001.0
2 2 Roger Federer ... Ivan Ljubicic, Severin Luthi 1998.0
3 3 Daniil Medvedev ... NaN NaN
4 4 Dominic Thiem ... NaN NaN
... ... ... ... ... ...
1976 1976 Brian Bencic ... NaN NaN
1977 1977 Boruch Skierkier ... NaN NaN
1978 1978 Majed Kilani ... NaN NaN
1979 1979 Quentin Gueydan ... NaN NaN
1980 1980 Preston Brown ... NaN NaN
And this is what the new rows look like:
1977 1977.0 ... NaN
1978 1978.0 ... NaN
1979 1979.0 ... NaN
1980 1980.0 ... NaN
Rafael Nadal NaN ... 2001
Answer 0 (score: 0)
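Your code is missing some key parts that would be needed to answer the question precisely. Based on what you posted, I have two ideas:
Importing the CSV file
Your old CSV file was probably saved together with its index. Make sure the CSV file does not contain, as its first column, the DataFrame index from the last time it was saved (in your printout, that is the "Unnamed: 0" column). When saving, do this: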
file.to_csv('file.csv', index=False)
When you then load the file like this:
pandas.read_csv('file.csv')
pandas assigns index numbers automatically, and no duplicate index column appears.
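Here is a minimal round trip that illustrates this point. It uses a small hypothetical two-row frame and in-memory CSV text instead of a real file. It also shows `index_col=0`, which is one way to read a file that was already saved with its index.

```python
import io
import pandas as pd

# Hypothetical stand-in for ATP_stats_new.csv
df = pd.DataFrame({"Player": ["Novak Djokovic", "Rafael Nadal"],
                   "Turned_Pro": [None, 2001]})

# to_csv() with no path returns the CSV text as a string.
with_index = df.to_csv()                # index written as an unnamed first column
without_index = df.to_csv(index=False)  # index not written

reloaded = pd.read_csv(io.StringIO(with_index))
print(list(reloaded.columns))  # ['Unnamed: 0', 'Player', 'Turned_Pro']

clean = pd.read_csv(io.StringIO(without_index))
print(list(clean.columns))     # ['Player', 'Turned_Pro']

# For a file that already contains a saved index, index_col=0 uses that
# column as the index instead of loading it as an extra data column.
fixed = pd.read_csv(io.StringIO(with_index), index_col=0)
print(list(fixed.columns))     # ['Player', 'Turned_Pro']
```

If you write with the default `index=True` and read without `index_col`, you get one more "Unnamed: 0" column on every save/load cycle.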
Wrong column order
I'm not sure what information atp_link returns, or in what order. From what you provided, it seems to return two columns: "Coach" and "Turned_Pro".
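The positional approach below depends on column order. As an alternative, you can build each new row as a dictionary whose keys exactly match the CSV's column names. `pd.concat` then aligns values by name. Note that the question's dictionary uses the key 'Name' while the file's column is 'Player', so that name would land in a new 'Name' column.

The question's `df.append(...)` call also discards its result. `DataFrame.append` returns a new frame rather than modifying `df`, and it was removed in pandas 2.0. For that reason, this sketch uses `pd.concat` and assigns the result back. The three columns and their values come from the question's printout; the real file has more columns.

```python
import pandas as pd

# Existing rows (values from the question's printout; the real file has more columns).
df = pd.DataFrame({"Player": ["Novak Djokovic", "Roger Federer"],
                   "Coach": [None, "Ivan Ljubicic, Severin Luthi"],
                   "Turned_Pro": [None, 1998]})

# New player, keyed by the CSV's exact column names ("Player", not "Name").
# The key order deliberately differs from the file's column order.
new_rows = [{"Turned_Pro": 2001,
             "Player": "Rafael Nadal",
             "Coach": "Carlos Moya, Francisco Roig"}]

# pd.concat returns a NEW DataFrame, so the result must be assigned back.
# ignore_index=True discards the old row labels and renumbers all rows 0..n-1.
df = pd.concat([df, pd.DataFrame(new_rows)], ignore_index=True)
print(df)

# Save without the index so no "Unnamed: 0" column appears on the next read:
# df.to_csv("ATP_stats_new.csv", index=False)
```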
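After extracting the information from atp_link, I suggest building a list (rather than a dictionary) for each new player you want to add. Put the values in the same order as the DataFrame's columns. For example, Nadal's info list would look like this: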
info_list = ['Rafael Nadal', '','2001']
Then add the list to the DataFrame like this:
df.loc[len(df),:] = info_list
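Here is a quick check of the list approach. It uses a hypothetical one-row frame whose three columns match info_list, with values taken from the question's printout. `df.loc[len(df)] = ...` is the same assignment as above, written without the trailing `, :`.

Two caveats apply. First, the list needs exactly one value per column, in column order. Second, `len(df)` is only guaranteed to be an unused label when the index is the default 0..n-1 range. After rows have been dropped, it can hit an existing label and overwrite that row.

```python
import pandas as pd

# One existing row (values from the question's printout), columns in info_list order.
df = pd.DataFrame({"Player": ["Roger Federer"],
                   "Coach": ["Ivan Ljubicic, Severin Luthi"],
                   "Turned_Pro": ["1998"]})

info_list = ["Rafael Nadal", "", "2001"]

# With the default 0..n-1 index, len(df) is the next unused row label,
# so this assignment appends a row (setting with enlargement).
# The list is matched to the columns by position.
df.loc[len(df)] = info_list
print(df)
```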
Hope this helps.