How to delete rows in Python that contain "Unnamed" in a cell?

Posted: 2019-11-10 10:33:05

Tags: python pandas dataframe

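I am trying to read an Excel file with pandas. I only want the relevant data from the file, i.e. I want to drop the rows/columns that contain only NaN values. The problem is that the first row of the DataFrame contains "Unnamed" values. The row where my header starts is never fixed, so I avoid using the skiprows and header parameters.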
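One way to handle a header row whose position varies is to read the sheet with `header=None` (so pandas assigns integer column labels and never invents "Unnamed" names) and then search for the header row yourself. The sketch below is only an illustration under a loud assumption: that the real header row always starts with a known marker such as `"No."`. The helper `find_header_row` and its `marker` parameter are hypothetical, and the frame is built in memory to stand in for the `read_excel(..., header=None)` result.

```python
import numpy as np
import pandas as pd


def find_header_row(raw: pd.DataFrame, marker: str = "No.") -> int:
    """Return the position of the first row whose first cell equals `marker`.

    Hypothetical helper: assumes the header row always begins with `marker`.
    """
    first_col = raw.iloc[:, 0].astype(str).str.strip()
    matches = np.flatnonzero(first_col == marker)
    if len(matches) == 0:
        raise ValueError(f"header marker {marker!r} not found")
    return int(matches[0])


# Stand-in for pd.read_excel(..., header=None): integer column labels, no "Unnamed".
raw = pd.DataFrame(
    [
        ["BINS 2018-RUI: Red Roof Inn Portfolio", np.nan, np.nan],
        [np.nan, np.nan, np.nan],
        ["No.", "Property \nID", "Property Name"],
        [1.001, 10228, "Red Roof Plus"],
    ]
)

start = find_header_row(raw)
# Everything below the header row is data; the header row supplies the labels.
table = raw.iloc[start + 1 :].reset_index(drop=True)
table.columns = raw.iloc[start].tolist()
print(start)                   # 2
print(table.columns.tolist())  # ['No.', 'Property \nID', 'Property Name']
```

If no stable marker exists in the real sheets, the "first row that is not mostly NaN" approach from the answer is the more general option.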

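When I use the command below, it removes almost all of the data from the DataFrame, because pandas used the first row (which contains the "Unnamed" values) as the header: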

df = df.loc[:, ~df.columns.str.contains('^Unnamed')]
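To see why this filter throws away nearly everything, here is a small in-memory frame that is assumed to mimic what `read_excel` returns when a title cell sits in the first row: pandas labels every blank header cell `Unnamed: N`, so a filter on the column labels removes those whole columns, data included. The values are copied from the printout in this question; the frame itself is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame mimicking read_excel's output: the title ended up as the
# first column label and the blank header cells became "Unnamed: N".
df = pd.DataFrame(
    {
        "BINS 2018-RUI: Red Roof Inn Portfolio": [np.nan, "No.", 1.001],
        "Unnamed: 1": [np.nan, "Property \nID", 10228],
        "Unnamed: 2": [np.nan, "Property Name", "Red Roof Plus"],
    }
)

# Keep only the columns whose label does NOT start with "Unnamed".
filtered = df.loc[:, ~df.columns.str.contains("^Unnamed")]
print(filtered.columns.tolist())  # ['BINS 2018-RUI: Red Roof Inn Portfolio']
```

The filter works on column labels, not cell values, so it can only help after the real header row has been promoted to the column labels.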

I used the following commands to clean the data:

import pandas as pd

data = pd.read_excel("text.xlsx", sheet_name=1)
print(data)
BINS 2018-RUI: Red Roof Inn Portfolio   Unnamed: 1  Unnamed: 2  Unnamed: 3  Unnamed: 5
0   NaN       NaN       NaN     NaN     NaN     NaN
1   NaN       NaN       NaN     NaN     NaN     NaN
2   NaN       NaN       NaN     NaN     NaN     NaN
3   No. Property \nID   Property Name       Street Address  
4   1.001   10228        Red Roof Plus      777 Airport Boulevard       
5   1.002   10150        Red Roof Plus1     15 Meadowlands Parkway      
6   1.003   10304        Red Roof Inn       Boulevard Seattle
data1 = data.dropna(axis=0, thresh=3)    # keep rows with at least 3 non-NA values
data2 = data1.dropna(axis=1, how='all')  # drop columns that are entirely NA
print(data2)
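The two `dropna` calls can be tried on a small stand-in for the sheet. This sketch assumes a hypothetical frame with blank rows above the header row and one empty trailing column. Note that it passes `thresh` on its own: older pandas let `thresh` silently override `how`, while recent pandas versions raise a `TypeError` when both are given.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the sheet: blank rows above the header row
# and an empty trailing column, labelled the way read_excel would label them.
data = pd.DataFrame(
    [
        [np.nan, np.nan, np.nan, np.nan],
        [np.nan, np.nan, np.nan, np.nan],
        ["No.", "Property \nID", "Property Name", np.nan],
        [1.001, 10228, "Red Roof Plus", np.nan],
        [1.002, 10150, "Red Roof Plus1", np.nan],
    ],
    columns=[
        "BINS 2018-RUI: Red Roof Inn Portfolio",
        "Unnamed: 1",
        "Unnamed: 2",
        "Unnamed: 3",
    ],
)

# Keep rows with at least 3 non-NA values (thresh alone, without how=).
data1 = data.dropna(axis=0, thresh=3)
# Drop columns that are entirely NA in the remaining rows.
data2 = data1.dropna(axis=1, how="all")
print(data2.shape)  # (3, 3)
```

The "Unnamed" labels survive this step, which is exactly the leftover problem the question describes.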

BINS 2018-RUI: Red Roof Inn Portfolio   Unnamed: 1  Unnamed: 2  Unnamed: 3  Unnamed: 5
3   No. Property \nID   Property Name       Street Address  
4   1.001   10228        Red Roof Plus      777 Airport Boulevard       
5   1.002   10150        Red Roof Plus1     15 Meadowlands Parkway      
6   1.003   10304        Red Roof Inn       Boulevard Seattle   

Expected output:

3   No. Property \nID   Property Name       Street Address  
4   1.001   10228        Red Roof Plus      777 Airport Boulevard       
5   1.002   10150        Red Roof Plus1     15 Meadowlands Parkway      
6   1.003   10304        Red Roof Inn       Boulevard Seattle

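I don't want the first row, the one with "Unnamed" written in its cells. (This is only a small part of the data; the real data has 100 rows and 100 columns.)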

1 Answer:

Answer 0 (score: 0)

Since you don't know how many rows to skip, dropping the all-NA rows and columns as you are already doing is fine.

The missing step is to make the first remaining (non-NA) row the header:

data.columns = data.iloc[0]

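and then drop that row from the dataset (`reindex()` with no arguments does nothing here; `reset_index(drop=True)` renumbers the rows from 0):

data = data.iloc[1:].reset_index(drop=True)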
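Put together, the answer's two steps look roughly like this. The sketch assumes a hypothetical frame shaped like the cleaned `data2` from the question, where the first remaining row holds the real column names. It also clears the column-index name, because assigning `data.iloc[0]` as the columns carries over that row's index label (3) as the name of the new header.

```python
import pandas as pd

# Hypothetical cleaned frame shaped like data2 in the question:
# the first remaining row holds the real column names.
data = pd.DataFrame(
    [
        ["No.", "Property \nID", "Property Name"],
        [1.001, 10228, "Red Roof Plus"],
        [1.002, 10150, "Red Roof Plus1"],
        [1.003, 10304, "Red Roof Inn"],
    ],
    index=[3, 4, 5, 6],
    columns=["BINS 2018-RUI: Red Roof Inn Portfolio", "Unnamed: 1", "Unnamed: 2"],
)

# Promote the first remaining row to the header...
data.columns = data.iloc[0]
data.columns.name = None  # drop the leftover row label (3) from the header
# ...then remove that row and renumber the rows from 0.
data = data.iloc[1:].reset_index(drop=True)

print(data.columns.tolist())  # ['No.', 'Property \nID', 'Property Name']
print(len(data))              # 3
```

If the original row numbers (4, 5, 6) matter, leave out `reset_index(drop=True)` and keep just `data.iloc[1:]`.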