Dynamically skip top blank rows of Excel in python pandas

Time: 2017-10-31 15:42:07

Tags: python excel pandas

I am reading multiple sheets of an Excel file using pandas in Python. I have three cases:

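  1. Some sheets have data starting at row 1.
  2. Some have a varying number of blank rows before the table. Delimited table example:

    ;;
    ;;
    ;;
    Country;Company;Product
    US;ABC;XYZ
    US;ABD;XYY

  3. Some have a summary above the table. Delimited table example:

    Product summary table for East region;;
    Date: 1st Sep, 2016;;
    ;;
    Country;Company;Product
    US;ABC;XYZ
    US;ABD;XYY

I know that with skip_blank I can get rid of the blank rows at the top, but the number of top blank rows is not fixed; it could be 3, 4, or 8.

I am trying to read all of these tables but am not sure how. Is there a way to figure out that the summary ends at row 3, that row 4 is my table header, and that the first column header is 'Country'?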

1 answer:

Answer 0 (score: 1)

I would suggest the following algorithm:

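  1. Read the whole table.
  2. Treat the first row that contains no missing values as the header.
  3. Drop all the rows above the header.

This code works fine for me: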

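    import pandas as pd

    for sheet in range(3):
        # header=None keeps every row as data so the real header can be located
        # (the keyword is sheet_name in current pandas; older releases used sheetname)
        raw_data = pd.read_excel('blank_rows.xlsx', sheet_name=sheet, header=None)
        print(raw_data)
        # look for the header row: the first row with no missing values
        for i, row in raw_data.iterrows():
            if row.notnull().all():
                data = raw_data.iloc[(i+1):].reset_index(drop=True)
                data.columns = list(raw_data.iloc[i])
                break
        else:
            continue  # no complete row in this sheet, so nothing to print
        # convert columns to numeric where possible
        for c in data.columns:
            try:
                data[c] = pd.to_numeric(data[c])
            except (ValueError, TypeError):
                pass  # leave non-numeric columns unchanged
        print(data)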
    

I used this toy data sample, based on your examples. From the raw data frames

             0        1        2
    0  Country  Company  Product
    1       US      ABC      XYZ
    2       US      ABD      XYY
    
             0        1        2
    0      NaN      NaN      NaN
    1      NaN      NaN      NaN
    2      NaN      NaN      NaN
    3  Country  Company  Product
    4       US      ABC      XYZ
    5       US      ABD      XYY
    
                                           0        1        2
    0  Product summary table for East region      NaN      NaN
    1                    Date: 1st Sep, 2016      NaN      NaN
    2                                    NaN      NaN      NaN
    3                                Country  Company  Product
    4                                     US      ABC      XYZ
    5                                     US      ABD      XYY
    

the script produces the same table for every sheet:

      Country Company Product
    0      US     ABC     XYZ
    1      US     ABD     XYY
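
Since the question also asks about locating the header by its first column label ('Country'), here is a minimal sketch of that variant. The helper name `promote_header_row` and its `first_label` parameter are made up for illustration, and the frame is built in memory as a stand-in for a sheet read with `header=None`, so this shows the idea rather than a definitive implementation.

```python
import pandas as pd


def promote_header_row(raw, first_label=None):
    """Return the table found below the detected header row.

    raw is a frame read with header=None. If first_label is given, the
    header is the first row whose first cell equals it; otherwise it is
    the first row with no missing values.
    """
    if first_label is not None:
        mask = raw.iloc[:, 0] == first_label
    else:
        mask = raw.notnull().all(axis=1)
    if not mask.any():
        raise ValueError("no header row found")
    pos = int(mask.to_numpy().argmax())  # position of the first True
    data = raw.iloc[pos + 1:].reset_index(drop=True)
    data.columns = list(raw.iloc[pos])
    return data


# In-memory stand-in for the third sheet (summary lines above the table)
raw = pd.DataFrame([
    ["Product summary table for East region", None, None],
    ["Date: 1st Sep, 2016", None, None],
    [None, None, None],
    ["Country", "Company", "Product"],
    ["US", "ABC", "XYZ"],
    ["US", "ABD", "XYY"],
])
table = promote_header_row(raw, first_label="Country")
print(table)
```

Matching on a known label is stricter than the "first complete row" rule: it would still work if a summary line happened to fill every column, but it assumes the label is spelled identically in every sheet.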