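I am trying to read an Excel file with pandas. I only want the relevant data from the file, i.e. I want to drop the rows/columns that are filled with "nan" values. The problem is that the first row of the DataFrame (the header) contains "Unnamed" values. The row where my header starts is never fixed, so I am avoiding skiprows and header.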
When I use the command below, it removes almost all of the data from the DataFrame, because the header row was read as "Unnamed" columns.
df = df.loc[:, ~df.columns.str.contains('^Unnamed')]
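A minimal sketch of why that filter discards almost everything. It uses a small hypothetical frame that mimics what read_excel returned here: the title cell became the first column label, and pandas auto-named the empty header cells "Unnamed: N". The filter tests only the column labels, not the cell contents, so every auto-named column is dropped together with its data.

```python
import numpy as np
import pandas as pd

# Hypothetical frame mimicking the question's read_excel result: the title is
# the first column label and the other header cells were auto-named "Unnamed: N".
df = pd.DataFrame(
    [[np.nan] * 4, ["No.", "Property \nID", "Property Name", "Street Address"]],
    columns=["BINS 2018-RUI: Red Roof Inn Portfolio", "Unnamed: 1", "Unnamed: 2", "Unnamed: 3"],
)

# str.contains runs on the column labels, so all "Unnamed" columns go,
# even though they hold the real header text and data.
filtered = df.loc[:, ~df.columns.str.contains("^Unnamed")]
print(list(filtered.columns))  # ['BINS 2018-RUI: Red Roof Inn Portfolio']
```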
I have used the following commands to clean the data:
import pandas as pd
data = pd.read_excel("text.xlsx", sheet_name=1)  # read_excel has no index parameter, so index=False is dropped
print(data)
BINS 2018-RUI: Red Roof Inn Portfolio Unnamed: 1 Unnamed: 2 Unnamed: 3 Unnamed: 5
0 NaN NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN NaN
3 No. Property \nID Property Name Street Address
4 1.001 10228 Red Roof Plus 777 Airport Boulevard
5 1.002 10150 Red Roof Plus1 15 Meadowlands Parkway
6 1.003 10304 Red Roof Inn Boulevard Seattle
data1 = data.dropna(axis=0, thresh=3)  # keep rows with at least 3 non-NaN values; recent pandas rejects passing how together with thresh
data2 = data1.dropna(axis = 1, how = 'all')
print(data2)
BINS 2018-RUI: Red Roof Inn Portfolio Unnamed: 1 Unnamed: 2 Unnamed: 3 Unnamed: 5
3 No. Property \nID Property Name Street Address
4 1.001 10228 Red Roof Plus 777 Airport Boulevard
5 1.002 10150 Red Roof Plus1 15 Meadowlands Parkway
6 1.003 10304 Red Roof Inn Boulevard Seattle
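For reference, a small sketch of how the two dropna options used above differ, on a hypothetical 4-column frame. how="all" removes only rows in which every value is NaN, while thresh=3 keeps only rows that hold at least 3 non-NaN values. Note that neither call can touch the "Unnamed" labels, because those are column names rather than rows.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: an empty row, a row with one stray value, and a full row.
df = pd.DataFrame(
    [
        [np.nan, np.nan, np.nan, np.nan],
        ["note", np.nan, np.nan, np.nan],
        [1.001, 10228, "Red Roof Plus", "777 Airport Boulevard"],
    ]
)

# how="all": only the completely empty row (index 0) is dropped.
print(df.dropna(axis=0, how="all").index.tolist())  # [1, 2]

# thresh=3: rows need at least 3 non-NaN values, so the stray row goes too.
# Pass thresh on its own; recent pandas raises a TypeError if how is also given.
print(df.dropna(axis=0, thresh=3).index.tolist())  # [2]
```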
Expected output:
3 No. Property \nID Property Name Street Address
4 1.001 10228 Red Roof Plus 777 Airport Boulevard
5 1.002 10150 Red Roof Plus1 15 Meadowlands Parkway
6 1.003 10304 Red Roof Inn Boulevard Seattle
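I don't want the first row whose cells say "Unnamed"; the row starting with "No." should become the header. (This is a small sample; the actual data has 100 rows and 100 columns.)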
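One possible approach, sketched here as an alternative rather than as the accepted answer: read the sheet with header=None, so pandas promotes no row to the header and uses integer column labels instead, then pick the header row by content. The helper promote_header_row and its min_values parameter are hypothetical names; the threshold of 3 simply mirrors thresh=3 from the question. Since no workbook is available here, the raw frame below is a hand-built stand-in for what pd.read_excel("text.xlsx", sheet_name=1, header=None) might return.

```python
import numpy as np
import pandas as pd


def promote_header_row(raw: pd.DataFrame, min_values: int = 3) -> pd.DataFrame:
    """Hypothetical helper: use the first row with at least `min_values`
    non-NaN cells as the header and return the rows below it."""
    # Drop fully empty columns so they don't affect the per-row counts.
    trimmed = raw.dropna(axis=1, how="all")
    hits = trimmed.notna().sum(axis=1).to_numpy() >= min_values
    if not hits.any():
        raise ValueError("no row has enough non-NaN values to be the header")
    header_pos = int(hits.argmax())  # position of the first qualifying row
    body = trimmed.iloc[header_pos + 1 :].copy()
    body.columns = trimmed.iloc[header_pos].tolist()
    # Remove empty rows below the header and renumber from 0.
    return body.dropna(axis=0, how="all").reset_index(drop=True)


# Stand-in for the sheet read with header=None (title row, blank row, header, data).
raw = pd.DataFrame(
    [
        ["BINS 2018-RUI: Red Roof Inn Portfolio", np.nan, np.nan, np.nan, np.nan],
        [np.nan] * 5,
        ["No.", "Property \nID", "Property Name", "Street Address", np.nan],
        [1.001, 10228, "Red Roof Plus", "777 Airport Boulevard", np.nan],
        [1.002, 10150, "Red Roof Plus1", "15 Meadowlands Parkway", np.nan],
    ]
)
clean = promote_header_row(raw)
print(clean)
```

Choosing the header by a non-NaN count means a title row with a single filled cell is skipped automatically; on real sheets the threshold may need tuning.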
Answer 0 (score: 0)
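Since you don't know how many rows to skip, you can drop all of the NA rows and columns as you already do with dropna.
The missing step is to set the first remaining (non-NA) row as the header: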
data.columns = data.iloc[0]
and then drop that row from the dataset:
data = data.iloc[1:]  # reindex() with no arguments changed nothing; add .reset_index(drop=True) to renumber from 0
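Putting the answer's two steps together, a minimal sketch applied to a hypothetical stand-in for data2 from the question (header text still in the body, "Unnamed" column labels, original index labels 3 to 5). Clearing columns.name and calling reset_index are optional extras, not part of the answer as written.

```python
import pandas as pd

# Hypothetical stand-in for data2: the real header is the first body row.
data = pd.DataFrame(
    [
        ["No.", "Property \nID", "Property Name", "Street Address"],
        [1.001, 10228, "Red Roof Plus", "777 Airport Boulevard"],
        [1.002, 10150, "Red Roof Plus1", "15 Meadowlands Parkway"],
    ],
    index=[3, 4, 5],
    columns=["BINS 2018-RUI: Red Roof Inn Portfolio", "Unnamed: 1", "Unnamed: 2", "Unnamed: 3"],
)

# iloc is positional, so this takes the first remaining row whatever its label.
data.columns = data.iloc[0]
data.columns.name = None  # the promoted row's label (3) would otherwise name the column axis
# Drop the promoted row; reset_index is optional and renumbers from 0.
data = data.iloc[1:].reset_index(drop=True)
print(data.columns.tolist())
```

Skip reset_index if you want to keep the original row labels (4, 5, ...) as in the expected output.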