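Numeric data inserted into a DataFrame becomes NaN

Time: 2015-06-23 08:00:23

Tags: python pandas dataframe nan

Simplified situation:

I have a file containing a list of certain countries, which I load into a DataFrame df. I then have data about those countries (and many more) spread across a number of .xls files. I try to read each file into df_f, subset the data I am interested in, and then look up the countries from the original file; if any of them are present in the file, I copy their data into the DataFrame df.

The problem is that only some of the values are assigned correctly. Most of them are inserted as NaN. (See below.)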

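import os
import pandas as pd

# df (the country table) and quarters (the list of quarter column names) are assumed to be defined already
for filename in os.listdir(os.getcwd()):
    # Read the 'Data' sheet, keeping only the columns of interest
    df_f = pd.read_excel(filename, sheetname = 'Data', parse_cols = "D,F,H,J:BS", skiprows = 2, skip_footer = 2)
    df_f = df_f.fillna(0)

    df_ss = [SUBSETTING df_f here]

    countries = df_ss['Country']

    for c in countries:
        if (c in df['Country'].values):
            # Row label(s) of this country in df
            row_idx = df[df['Country'] == c].index

            # Quarter values for this country from the current file
            df_h = df_ss[quarters][df_ss.Country == c]
            df.loc[row_idx, quarters] = df_h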

The result I get is:

Country  Q1 2000  Q2 2000  Q3 2000  Q4 2000  Q1 2001  Q2 2001  Q3 2001  \
0     Albania      NaN      NaN      NaN      NaN      NaN      NaN      NaN   
1     Algeria      NaN      NaN      NaN      NaN      NaN      NaN      NaN   
2   Argentina      NaN      NaN      NaN      NaN      NaN      NaN      NaN   
3     Armenia      NaN      NaN      NaN      NaN      NaN      NaN      NaN   
4   Australia      NaN      NaN      NaN      NaN      NaN      NaN      NaN   
5     Austria  4547431  5155839  5558963  6079089  6326217  6483130  6547780   
6  Azerbaijan      NaN      NaN      NaN      NaN      NaN      NaN      NaN   
etc...

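The loading and subsetting are done correctly and the data is not corrupted: I print df_h on every iteration and it shows regular numbers. The point is that they become NaN once they are assigned to the df DataFrame...

Any ideas?

Edit: sample data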

df:

          Country Country group  Population  Development coefficient  Q1 2000  \
0     Albania      group II     2981000                       -1        0   
1     Algeria       group I    39106000                       -1        0   
2   Argentina     group III    42669000                       -1        0   
3     Armenia      group II     3013000                       -1        0   
4   Australia      group IV    23520000                       -1        0   
5     Austria      group IV     8531000                       -1        0   
6  Azerbaijan      group II     9538000                       -1        0   
7  Bangladesh       group I   158513000                       -1        0   
8     Belarus     group III     9470000                       -1        0   
9     Belgium     group III    11200000                       -1        0   

 (...)

   Q2 2013  Q3 2013  Q4 2013  Q1 2014  Q2 2014  Q3 2014  Q4 2014  Q1 2015  
0        0        0        0        0        0        0        0        0  
1        0        0        0        0        0        0        0        0  
2        0        0        0        0        0        0        0        0  
3        0        0        0        0        0        0        0        0  
4        0        0        0        0        0        0        0        0  
5        0        0        0        0        0        0        0        0  
6        0        0        0        0        0        0        0        0  
7        0        0        0        0        0        0        0        0  
8        0        0        0        0        0        0        0        0  
9        0        0        0        0        0        0        0        0

And df_ss for one of the files:

    Country  Q1 2000  Q2 2000  Q3 2000  Q4 2000  Q1 2001  \
5                       Guam    11257    17155    23063    29150    37098   
10                  Kiribati      323      342      361      380      398   
15          Marshall Islands      425      428      433      440      449   
17                Micronesia        0        0        0        0        0   
19                     Nauru        0        0        0        0        0   
22  Northern Mariana Islands     2560     3386     4499     6000     8037   
27                     Palau     1513     1672     1828     1980     2130   

(...) 

    Q3 2013  Q4 2013  Q1 2014  Q2 2014  Q3 2014  Q4 2014  Q1 2015  
5    150028   151152   152244   153283   154310   155333   156341  
10    19933    20315    20678    21010    21329    21637    21932  
15    17536    19160    20827    22508    24253    26057    27904  
17    18646    17939    17513    17232    17150    17233    17438  
19     7894     8061     8227     8388     8550     8712     8874  
22    27915    28198    28481    28753    29028    29304    29578  
27    17602    17858    18105    18337    18564    18785    19001  

2 answers:

Answer 0: (score: 0)

Try setting the values as shown below (see this post):

df.ix[quarters,...] = 10

Answer 1: (score: 0)

By @joris:

Can you try df.loc[row_idx, quarters] = df_h.values for the last line (note the extra .values at the end)?
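
A likely explanation of why the extra .values helps, sketched with made-up numbers: when the right-hand side of a .loc assignment is itself a DataFrame, pandas aligns it to the target on row and column labels, so rows whose labels do not match are written as NaN. Dropping the labels with .values makes the assignment positional. The two small frames below are hypothetical stand-ins for df and df_ss, not the asker's data.

```python
import pandas as pd

quarters = ["Q1 2000", "Q2 2000"]

# Hypothetical target frame: Austria sits at row label 1.
df = pd.DataFrame({"Country": ["Albania", "Austria"],
                   "Q1 2000": [0.0, 0.0],
                   "Q2 2000": [0.0, 0.0]})

# Hypothetical source frame: Austria sits at row label 7.
df_ss = pd.DataFrame({"Country": ["Austria"],
                      "Q1 2000": [4547431.0],
                      "Q2 2000": [5155839.0]}, index=[7])

row_idx = df[df["Country"] == "Austria"].index
df_h = df_ss.loc[df_ss["Country"] == "Austria", quarters]

# Label-aligned assignment: row label 7 is not in row_idx, so NaN is written.
aligned = df.copy()
aligned.loc[row_idx, quarters] = df_h

# Positional assignment: .values strips the labels, so the numbers land as-is.
positional = df.copy()
positional.loc[row_idx, quarters] = df_h.values
```

Under this reading, a row such as Austria in the question would have been filled correctly only because its row label happened to be the same in both frames.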

This worked, thanks :-)
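
An alternative sketch, not taken from either answer: if both frames are indexed by Country, label alignment does the matching and the inner loop over countries is no longer needed. The frames and the names target, source and common are hypothetical and only illustrate the idea.

```python
import pandas as pd

quarters = ["Q1 2000", "Q2 2000"]

# Hypothetical stand-ins for df and df_ss.
df = pd.DataFrame({"Country": ["Albania", "Austria"],
                   "Q1 2000": [0.0, 0.0],
                   "Q2 2000": [0.0, 0.0]})
df_ss = pd.DataFrame({"Country": ["Austria", "Guam"],
                      "Q1 2000": [4547431.0, 11257.0],
                      "Q2 2000": [5155839.0, 17155.0]}, index=[7, 9])

# Use the country name as the row label in both frames.
target = df.set_index("Country")
source = df_ss.set_index("Country")

# Only countries present in both frames are copied; the others keep their old values.
common = target.index.intersection(source.index)
target.loc[common, quarters] = source.loc[common, quarters]

result = target.reset_index()
```

This assumes country names are unique within each frame; duplicated names would need to be resolved first.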