Pandas append is adding new rows without index numbers

Time: 2020-04-06 14:37:21

Tags: python pandas csv dataframe

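Here is what I am trying to do with my code: I have an existing csv file containing the names of tennis players, and I want to add new players to it as soon as they show up in the rankings. My script goes through the rankings and builds an array, then imports the names from the csv file. It should work out which names are not in the latter and then pull the online data for those names. After that, I just want the new rows appended to the end of the old csv file. My problem is that the new rows are indexed by the player's name instead of continuing the index of the old file. Any idea why that happens? Also, why is an unnamed column being added?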


def get_all_players():

    # imports names of players currently in the atp rankings
    current_atp_ranking = check_atp_rankings()
    current_player_list = current_atp_ranking['Player']

    # clean up names in case of white spaces
    for i in range(0, len(current_player_list)):
        current_player_list[i] = current_player_list[i].strip()

    # reads the main file and makes a dataframe out of it
    current_file = 'ATP_stats_new.csv'
    df = pd.read_csv(current_file)

    # gets all the names within the main file to see which current ones aren't there
    names_on_file = list(df['Player'])
    # cleans up in case of any white spaces
    for i in range(0, len(names_on_file)):
        names_on_file[i] = names_on_file[i].strip()

    # Removing Nadal for testing purposes
    names_on_file.remove("Rafael Nadal")

    # creating a list of players in current_players_list but not in names_on_file
    new_player_list = [x for x in current_player_list if x not in names_on_file]

    # loop through new_player_list
    for player in new_player_list:

        # delay to avoid stopping
        time.sleep(2)

        # finding the player's atp link for profile based on their name
        atp_link = current_atp_ranking.loc[current_atp_ranking['Player'] == player, 'ATP_Link']
        atp_link = atp_link.iloc[0]

        # make a basic dictionary with just the player's name and link
        player_dict = [{'Name': player, 'ATP_Link': atp_link}]

        # enter the new dictionary into the existing main file
        df.append(player_dict, ignore_index=True)

    # print dataframe to see how it looks before exporting
    print(df)

    # export dataframe into current file
    df.to_csv(current_file)

The file originally looks like this:

      Unnamed: 0            Player  ...                         Coach Turned_Pro
0              0    Novak Djokovic  ...                           NaN        NaN
1              1      Rafael Nadal  ...   Carlos Moya, Francisco Roig     2001.0
2              2     Roger Federer  ...  Ivan Ljubicic, Severin Luthi     1998.0
3              3   Daniil Medvedev  ...                           NaN        NaN
4              4     Dominic Thiem  ...                           NaN        NaN
...          ...               ...  ...                           ...        ...
1976        1976      Brian Bencic  ...                           NaN        NaN
1977        1977  Boruch Skierkier  ...                           NaN        NaN
1978        1978      Majed Kilani  ...                           NaN        NaN
1979        1979   Quentin Gueydan  ...                           NaN        NaN
1980        1980     Preston Brown  ...                           NaN        NaN

This is what the new rows look like:

1977              1977.0  ...        NaN
1978              1978.0  ...        NaN
1979              1979.0  ...        NaN
1980              1980.0  ...        NaN
Rafael Nadal         NaN  ...       2001

1 Answer:

Answer 0: (score: 0)

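Your code is missing some key parts that are needed to answer the question accurately. Based on what you posted, here are two thoughts:

Importing the CSV file

Your previous csv file was probably saved together with its index. Make sure the csv file does not carry the DataFrame index in its first column from the last time it was written. When saving, do this: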

file.to_csv('file.csv', index=False)

Then, when you load the file like this

pandas.read_csv('file.csv')

it will assign index numbers automatically, and no duplicate "Unnamed: 0" column will appear.
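The round trip can be checked with a small sketch. The two-row frame below is illustrative (names and years taken from the table in the question), and in-memory buffers stand in for 'ATP_stats_new.csv'. The `index_col=0` variant is an assumption about how one might read a file that was already saved with its index; it is not something the original code does.

```python
import io

import pandas as pd

# Illustrative data only; the real file has many more columns.
df = pd.DataFrame({
    "Player": ["Rafael Nadal", "Roger Federer"],
    "Turned_Pro": [2001, 1998],
})

# Default to_csv() writes the index as a first column with an empty header.
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)
reloaded_with_index = pd.read_csv(with_index)

# index=False leaves the index out of the file.
without_index = io.StringIO()
df.to_csv(without_index, index=False)
without_index.seek(0)
reloaded_clean = pd.read_csv(without_index)

# If the file already contains the index column, index_col=0 reads it
# back as the index instead of as a data column.
with_index.seek(0)
reloaded_as_index = pd.read_csv(with_index, index_col=0)

print(list(reloaded_with_index.columns))
print(list(reloaded_clean.columns))
print(list(reloaded_as_index.columns))
```

Each save/load cycle without `index=False` adds another unnamed column, which matches the "Unnamed: 0" column visible in the question's output.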

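Wrong column ordering

I am not sure what information atp_link brings in, or in what order. From what you have provided, it looks like it returns two columns: 'Coach' and 'Turned_Pro'.

I would suggest that, after extracting the information from atp_link, you build a list (rather than a dictionary) for each new player you want to add. So if you were adding Nadal, you would build one list holding his information, with the values in the same order as the DataFrame columns. Nadal's list would look like this: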

info_list = ['Rafael Nadal', '','2001']

Then add the list to the dataframe like this:

df.loc[len(df),:] = info_list
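A minimal sketch of that row append, assuming a three-column frame (Player, Coach, Turned_Pro) so that it lines up with the three-item list above; the real file has more columns. It also shows a possible alternative with `pd.concat` and `ignore_index=True`, which is not part of the original suggestion. `DataFrame.append` was removed in pandas 2.0, and even in older versions it returned a new frame rather than modifying `df` in place, so its result had to be assigned back. Note that the dictionary keys must match the existing column names: the question's code uses 'Name' while the file's column is 'Player'.

```python
import pandas as pd

# Illustrative frame; values copied from the table in the question.
df = pd.DataFrame({
    "Player": ["Roger Federer"],
    "Coach": ["Ivan Ljubicic, Severin Luthi"],
    "Turned_Pro": ["1998"],
})

# Option 1: assign a list to the next integer label.
# The list must have one value per column, in column order.
info_list = ["Rafael Nadal", "", "2001"]
df.loc[len(df)] = info_list

# Option 2 (an alternative, not from the answer): build a frame from
# dictionaries and concatenate. Keys must match the column names;
# columns missing from a dictionary are filled with NaN.
new_rows = pd.DataFrame([{"Player": "Dominic Thiem"}])
df = pd.concat([df, new_rows], ignore_index=True)

print(df)
```

With either option the new rows continue the integer index (0, 1, 2, ...) instead of being labelled with the player's name.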

Hope this helps.