How do we add data frames that share the same ID?

Asked: 2019-04-26 03:12:01

Tags: python-3.x pandas

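I am a beginner in data science. After going through the pandas topic, I came across a task here and cannot work out what is going wrong. Let me explain the problem.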

I have three data frames:

    import pandas as pd

    gold = pd.DataFrame({'Country': ['USA', 'France', 'Russia'],
                         'Medals': [15, 13, 9]})
    silver = pd.DataFrame({'Country': ['USA', 'Germany', 'Russia'],
                           'Medals': [29, 20, 16]})
    bronze = pd.DataFrame({'Country': ['France', 'USA', 'UK'],
                           'Medals': [40, 28, 27]})

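I need the total medal count per country in one column, with the countries in the other. When I add the frames, the result shows NaN. I filled the NaN values with zero, but I still do not get the output I expect.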

Code:

    gold.set_index('Country', inplace = True)
    silver.set_index('Country',inplace = True)
    bronze.set_index('Country', inplace = True)
    Total = silver.add(gold,fill_value = 0)
    Total = bronze.add(silver,fill_value = 0)
    Total = gold + silver + bronze
    print(Total)

Actual output:

                Medals
    Country        
     France      NaN
     Germany     NaN
     Russia      NaN
     UK          NaN
     USA        72.0

Expected:

               Medals
     Country        
     USA        72.0
     France     53.0
     UK         27.0
     Russia     25.0
     Germany    20.0

Please let me know what is wrong.

2 Answers:

Answer 0: (score: 2)

Just use concat, then groupby and sum:

pd.concat([gold,silver,bronze]).groupby('Country').sum()
Out[1306]: 
         Medals
Country        
France       53
Germany      20
Russia       25
UK           27
USA          72
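
If the frames still have Country as a regular column (that is, before any set_index call), the same concat/groupby/sum idea might look like the sketch below. The variable name total and the final sort_values step are illustrative additions to match the ordering shown under "Expected" in the question; they are not part of the one-liner above.

```python
import pandas as pd

gold = pd.DataFrame({'Country': ['USA', 'France', 'Russia'],
                     'Medals': [15, 13, 9]})
silver = pd.DataFrame({'Country': ['USA', 'Germany', 'Russia'],
                       'Medals': [29, 20, 16]})
bronze = pd.DataFrame({'Country': ['France', 'USA', 'UK'],
                       'Medals': [40, 28, 27]})

# Stack the three frames vertically, then sum the medals per country.
# 'total' is a hypothetical name chosen for this sketch.
total = (pd.concat([gold, silver, bronze])
           .groupby('Country')['Medals']
           .sum()
           .sort_values(ascending=False))

print(total)
```

Because concat stacks rows instead of aligning them, no NaN values appear and the counts stay integers.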

To fix your code:

silver.add(gold,fill_value = 0).add(bronze,fill_value=0)
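
As for why the original attempt prints NaN: each assignment to Total overwrites the previous one, so only the last line, gold + silver + bronze, determines the result. The plain + operator aligns on the index and gives NaN for any country missing from either operand, and USA is the only country present in all three frames. A minimal sketch comparing the two approaches (the names plain and chained are made up for this illustration):

```python
import pandas as pd

gold = pd.DataFrame({'Country': ['USA', 'France', 'Russia'],
                     'Medals': [15, 13, 9]}).set_index('Country')
silver = pd.DataFrame({'Country': ['USA', 'Germany', 'Russia'],
                       'Medals': [29, 20, 16]}).set_index('Country')
bronze = pd.DataFrame({'Country': ['France', 'USA', 'UK'],
                       'Medals': [40, 28, 27]}).set_index('Country')

# Plain '+': a country missing from either operand becomes NaN,
# and that NaN propagates through the second addition.
plain = gold + silver + bronze

# Chained add(): a country missing from one operand is treated as 0.
chained = gold.add(silver, fill_value=0).add(bronze, fill_value=0)

print(plain)
print(chained)
```

Chaining matters here because fill_value only applies to the two frames involved in that particular add call.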

Answer 1: (score: 0)

# For a video solution of the code, copy-paste the following link on your browser:
# https://youtu.be/p0cnApQDotA

import pandas as pd

# Defining the three dataframes indicating the gold, silver, and bronze medal counts
# of different countries
gold = pd.DataFrame({'Country': ['USA', 'France', 'Russia'],
                         'Medals': [15, 13, 9]}
                    )
silver = pd.DataFrame({'Country': ['USA', 'Germany', 'Russia'],
                        'Medals': [29, 20, 16]}
                    )
bronze = pd.DataFrame({'Country': ['France', 'USA', 'UK'],
                        'Medals': [40, 28, 27]}
                    )

# Set the index of the dataframes to 'Country' so that you can get the countrywise
# medal count
gold.set_index('Country', inplace = True)
silver.set_index('Country', inplace = True) 
bronze.set_index('Country', inplace = True) 

# Add the three dataframes and set the fill_value argument to zero to avoid getting
# NaN values
total = gold.add(silver, fill_value = 0).add(bronze, fill_value = 0)

# Sort the resultant dataframe in a descending order
total = total.sort_values(by = 'Medals', ascending = False)

# Print the sorted dataframe
print(total)
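
If there are more than three frames, chaining .add() by hand gets tedious. One possible generalization, offered as a sketch and not as part of the answer above, folds a list of frames with functools.reduce. The list name frames and the lambda are assumptions made for this example.

```python
from functools import reduce

import pandas as pd

gold = pd.DataFrame({'Country': ['USA', 'France', 'Russia'],
                     'Medals': [15, 13, 9]}).set_index('Country')
silver = pd.DataFrame({'Country': ['USA', 'Germany', 'Russia'],
                       'Medals': [29, 20, 16]}).set_index('Country')
bronze = pd.DataFrame({'Country': ['France', 'USA', 'UK'],
                       'Medals': [40, 28, 27]}).set_index('Country')

# 'frames' is a hypothetical list; any number of frames could go here.
frames = [gold, silver, bronze]

# Fold the list pairwise, treating a country missing from one side as 0.
total = reduce(lambda acc, df: acc.add(df, fill_value=0), frames)
total = total.sort_values(by='Medals', ascending=False)

print(total)
```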