I have a DataFrame:
                                                    Energy Supply  Energy Supply per Capita  % Renewable
Country
Afghanistan                                          3.210000e+08                        10    78.669280
Albania                                              1.020000e+08                        35   100.000000
British Virgin Islands                               2.000000e+06                        85     0.000000
...
Aruba                                                1.200000e+07                       120    14.870690 ...
Excludes the overseas territories.                            NaN                       NaN          NaN
Data exclude Hong Kong and Macao Special Admini...            NaN                       NaN          NaN
Data on kerosene-type jet fuel include aviation...            NaN                       NaN          NaN
For confidentiality reasons, data on coal and c...            NaN                       NaN          NaN
Data exclude Greenland and the Danish Faroes.                 NaN                       NaN          NaN
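I used df = pd.read_excel(filelink, skiprows=16)
to remove the unneeded information at the very beginning of the file, but how do I get rid of the "noise" rows at the end of df?
I tried passing a list to skiprows, but it messed up the result.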
Answer 0 (score: 3)
It looks like you need the skip_footer=5 parameter of read_excel (spelled skipfooter in newer pandas releases):
skip_footer : int, default 0
Rows at the end to skip (0-indexed)
Sample:
df = pd.read_excel('myfile.xlsx', skip_footer = 5)
print (df)
                  Country  Energy Supply  Energy Supply per Capita  \
0             Afghanistan    321000000.0                        10
1                 Albania    102000000.0                        35
2  British Virgin Islands      2000000.0                        85
3                   Aruba     12000000.0                       120

   % Renewable
0     78.66928
1    100.00000
2      0.00000
3     14.87069
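Since the question also trims rows at the top with skiprows, here is a minimal sketch that combines both parameters in one call. The workbook is a made-up in-memory stand-in (two junk rows on top, two footnote rows at the bottom), so the counts 2 and 2 are assumptions for this toy data; for the real file they would be 16 and 5. It uses the skipfooter spelling, which replaced skip_footer in later pandas releases, and writing the test workbook needs an Excel engine such as openpyxl.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real workbook:
# 2 junk rows on top, a header row, 2 data rows, 2 footnote rows at the bottom.
raw = pd.DataFrame([
    ["Report title", None, None, None],
    ["Generated for illustration", None, None, None],
    ["Country", "Energy Supply", "Energy Supply per Capita", "% Renewable"],
    ["Afghanistan", 3.21e8, 10, 78.66928],
    ["Albania", 1.02e8, 35, 100.0],
    ["Excludes the overseas territories.", None, None, None],
    ["Data exclude Greenland and the Danish Faroes.", None, None, None],
])

# Write the sheet to memory so the example needs no file on disk.
buffer = io.BytesIO()
raw.to_excel(buffer, header=False, index=False)
buffer.seek(0)

# skiprows trims the top of the sheet, skipfooter trims the bottom.
df = pd.read_excel(buffer, skiprows=2, skipfooter=2, index_col=0)
print(df)
```

Passing plain integers to both parameters avoids the trouble described in the question with passing a list to skiprows.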
Another solution is to use dropna to remove every row in which all of the selected columns are NaN:
df = pd.read_excel('myfile.xlsx')
cols = ['Energy Supply','Energy Supply per Capita','% Renewable']
df = df.dropna(subset=cols, how='all')
print (df)
                  Country  Energy Supply  Energy Supply per Capita  \
0             Afghanistan    321000000.0                      10.0
1                 Albania    102000000.0                      35.0
2  British Virgin Islands      2000000.0                      85.0
3                   Aruba     12000000.0                     120.0

   % Renewable
0     78.66928
1    100.00000
2      0.00000
3     14.87069
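Note that in the dropna output the per-capita column comes back as floats (10.0 rather than 10), because the NaN footnote rows were still present when the column was read. A small sketch of the same approach on a made-up in-memory frame, with an optional cast back to integers afterwards; the data and the name clean are illustrative only, not taken from the real file:

```python
import numpy as np
import pandas as pd

# Hypothetical frame shaped like the one in the question, footnote rows included.
df = pd.DataFrame({
    "Country": [
        "Afghanistan",
        "Albania",
        "Excludes the overseas territories.",
        "Data exclude Greenland and the Danish Faroes.",
    ],
    "Energy Supply": [3.21e8, 1.02e8, np.nan, np.nan],
    "Energy Supply per Capita": [10, 35, np.nan, np.nan],
    "% Renewable": [78.66928, 100.0, np.nan, np.nan],
})

cols = ["Energy Supply", "Energy Supply per Capita", "% Renewable"]

# Drop only the rows in which every one of the selected columns is NaN.
clean = df.dropna(subset=cols, how="all")

# The NaN rows forced the column to float; cast back once they are gone.
clean = clean.astype({"Energy Supply per Capita": "int64"})
print(clean)
```

One caveat with this approach: a genuine country row that happens to be missing all three values would be dropped as well, whereas trimming a fixed number of footer rows would keep it.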