简化情况:
我有一个包含某些国家/地区列表的文件,我将其加载到dataframe df。 然后,我在许多.xls文件中获得了有关这些国家(以及更多国家/地区)的数据。我尝试将每个文件读取到df_f,对我感兴趣的数据进行子集化,然后从原始文件中查找国家/地区,如果存在任何文件,则将数据复制到dataframe df。
问题是只有一些值被正确分配。其中大多数是以NaNs形式插入的。 (见下文)
for filename in os.listdir(os.getcwd()):
df_f = pd.read_excel(filename, sheetname = 'Data', parse_cols = "D,F,H,J:BS", skiprows = 2, skip_footer = 2)
df_f = df_f.fillna(0)
df_ss = [SUBSETTING df_f here]
countries = df_ss['Country']
for c in countries:
if (c in df['Country'].values):
row_idx = df[df['Country'] == c].index
df_h = df_ss[quarters][df_ss.Country == c]
df.loc[row_idx, quarters] = df_h
我得到的结果是:
Country Q1 2000 Q2 2000 Q3 2000 Q4 2000 Q1 2001 Q2 2001 Q3 2001 \
0 Albania NaN NaN NaN NaN NaN NaN NaN
1 Algeria NaN NaN NaN NaN NaN NaN NaN
2 Argentina NaN NaN NaN NaN NaN NaN NaN
3 Armenia NaN NaN NaN NaN NaN NaN NaN
4 Australia NaN NaN NaN NaN NaN NaN NaN
5 Austria 4547431 5155839 5558963 6079089 6326217 6483130 6547780
6 Azerbaijan NaN NaN NaN NaN NaN NaN NaN
etc...
加载和子集正确完成,数据没有损坏 - 我为每次迭代打印df_h并显示常规数字。关键是,在将它们分配给df数据帧后,它们将成为NaN ...
有什么想法吗?
编辑:样本数据
df:
Country Country group Population Development coefficient Q1 2000 \
0 Albania group II 2981000 -1 0
1 Algeria group I 39106000 -1 0
2 Argentina group III 42669000 -1 0
3 Armenia group II 3013000 -1 0
4 Australia group IV 23520000 -1 0
5 Austria group IV 8531000 -1 0
6 Azerbaijan group II 9538000 -1 0
7 Bangladesh group I 158513000 -1 0
8 Belarus group III 9470000 -1 0
9 Belgium group III 11200000 -1 0
(...)
Q2 2013 Q3 2013 Q4 2013 Q1 2014 Q2 2014 Q3 2014 Q4 2014 Q1 2015
0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0
5 0 0 0 0 0 0 0 0
6 0 0 0 0 0 0 0 0
7 0 0 0 0 0 0 0 0
8 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0
和其中一个文件的df_ss:
Country Q1 2000 Q2 2000 Q3 2000 Q4 2000 Q1 2001 \
5 Guam 11257 17155 23063 29150 37098
10 Kiribati 323 342 361 380 398
15 Marshall Islands 425 428 433 440 449
17 Micronesia 0 0 0 0 0
19 Nauru 0 0 0 0 0
22 Northern Mariana Islands 2560 3386 4499 6000 8037
27 Palau 1513 1672 1828 1980 2130
(...)
Q3 2013 Q4 2013 Q1 2014 Q2 2014 Q3 2014 Q4 2014 Q1 2015
5 150028 151152 152244 153283 154310 155333 156341
10 19933 20315 20678 21010 21329 21637 21932
15 17536 19160 20827 22508 24253 26057 27904
17 18646 17939 17513 17232 17150 17233 17438
19 7894 8061 8227 8388 8550 8712 8874
22 27915 28198 28481 28753 29028 29304 29578
27 17602 17858 18105 18337 18564 18785 19001
答案 0 :(得分:0)
尝试设置如下所示的值(请参阅此post):
df.ix[quaters,...] = 10
答案 1 :(得分:0)
由@joris:
你可以试试吗?df.loc[row_idx, quarters] = df_h.values
对于最后一行(注意最后的额外.values
)?
这个有效,谢谢: - )