How to delete rows with empty data

Date: 2019-08-17 14:55:07

Tags: python pandas

I have data in a format like this:

Date,Open,High,Low,Close,Adj Close,Volume
2019-07-31,0.44,0.4401,0.44,0.44,0.44,32900
2019-08-01,0.45,0.45,0.45,0.45,0.45,200
2019-08-02,0.44,0.44,0.43,0.44,0.44,13800
2019-08-08,0.45,0.4501,0.45,0.4501,0.4501,400
2019-08-15,0.43,0.43,0.43,0.43,0.43,300
2019-08-15,0.0,0.0,0.0,0.43,0.43,0

Note that the last row contains empty data (Open, High, Low and Volume are all zero).
How can I filter out or delete this row?

df = None
for ticker in tickers:
    try:
        df = pd.read_csv('stock_data/daily/{}.csv'.format(ticker), parse_dates=True, index_col=0).dropna()
    except FileNotFoundError as e:
        continue    # continue with next ticker
    df_closes = df['Close']
    if len(df_closes) < 4:
        continue    # continue with next ticker
    df_closes = df_closes[pd.notnull(df['Close'])]   # delete rows with empty data
    df_closes = df_closes.reindex(index=df_closes.index[::-1]) # reversing

2 answers:

Answer 0: (score: 0)

Assuming you consider a record empty when its Volume column is zero, we can filter such records out with:

df = df[df['Volume'] > 0]

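Or we can check whether at least one of Open, High and Low is different from zero, and filter out the rows where these columns contain only zeros: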

df = df[df[['Open', 'High', 'Low']].any(axis=1)]

You can remove rows with a duplicated index with:

df = df[~df.index.duplicated()]

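We can pass the keep parameter to duplicated(..) to specify which item to keep. Possible values are 'first', 'last' and False (which means all items with a duplicated index are removed). The default is 'first'.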
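As a small illustration of the three keep values, here is a sketch on a made-up index (the labels 'a', 'b', 'c' are placeholders, not taken from the question's data). duplicated(..) returns a boolean mask that marks the entries to be dropped:

```python
import pandas as pd

# Hypothetical index in which the label 'b' occurs twice.
idx = pd.Index(['a', 'b', 'b', 'c'])

# keep='first' (the default): every occurrence after the first is marked.
first_mask = idx.duplicated(keep='first')

# keep='last': every occurrence before the last is marked.
last_mask = idx.duplicated(keep='last')

# keep=False: all occurrences of a repeated label are marked.
all_mask = idx.duplicated(keep=False)

print(first_mask)
print(last_mask)
print(all_mask)
```

Negating the mask with ~, as in the line above, keeps only the unmarked rows.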

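It is better to first remove the records with no values, and only then remove the items with a duplicated index. I would not do this in the opposite order, since you might then remove a record that has data and keep the one without.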
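The effect of the ordering can be sketched on the sample data from the question. This is only an illustration: it loads the rows from an in-memory string instead of a file, and the reversed variant uses keep='last' on purpose, because with the default keep='first' this particular sample happens to give the same result in either order.

```python
import io
import pandas as pd

# The sample rows from the question, loaded from a string for the demo.
CSV_TEXT = """Date,Open,High,Low,Close,Adj Close,Volume
2019-07-31,0.44,0.4401,0.44,0.44,0.44,32900
2019-08-01,0.45,0.45,0.45,0.45,0.45,200
2019-08-02,0.44,0.44,0.43,0.44,0.44,13800
2019-08-08,0.45,0.4501,0.45,0.4501,0.4501,400
2019-08-15,0.43,0.43,0.43,0.43,0.43,300
2019-08-15,0.0,0.0,0.0,0.43,0.43,0
"""

df = pd.read_csv(io.StringIO(CSV_TEXT), parse_dates=True, index_col=0)

# Recommended order: drop the empty rows first (here: Volume is zero) ...
cleaned = df[df['Volume'] > 0]
# ... then drop whatever duplicated index entries remain.
cleaned = cleaned[~cleaned.index.duplicated(keep='first')]

# Reversed order with keep='last': the empty 2019-08-15 row survives the
# de-duplication, the row with data is dropped, and the Volume filter then
# removes the survivor, so the date disappears entirely.
reversed_order = df[~df.index.duplicated(keep='last')]
reversed_order = reversed_order[reversed_order['Volume'] > 0]

print(len(cleaned), len(reversed_order))
```

In the recommended order the 2019-08-15 row with Volume 300 is kept; in the reversed order that date is lost.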

Answer 1: (score: 0)

Use the following code to delete the rows:

# Drop the whole row if a specific column is zero (further columns can be added to the condition)
df[df['High'] != 0]
df[df['High'].ne(0)]

# Drop the row if the value in any of its columns is zero
df[(df != 0).all(axis=1)]
df[~(df == 0).any(axis=1)]
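The two groups of filters are not interchangeable, which a small sketch may make clearer. The frame below is made up for illustration (the row labels r1 to r3 and the values are assumptions, not the question's data): r2 has a zero Volume but a non-zero High.

```python
import pandas as pd

# Hypothetical frame: r2 has prices but no volume, r3 is an "empty" row.
df = pd.DataFrame(
    {
        'High': [0.44, 0.45, 0.0],
        'Close': [0.44, 0.45, 0.43],
        'Volume': [32900, 0, 0],
    },
    index=['r1', 'r2', 'r3'],
)

# Single-column condition: only rows whose High is zero are dropped.
by_high = df[df['High'].ne(0)]

# All-columns condition: a zero in any column drops the row.
no_zero_anywhere = df[(df != 0).all(axis=1)]

print(list(by_high.index))
print(list(no_zero_anywhere.index))
```

The all-columns variant is the stricter one: it also drops r2, so it is only appropriate when a zero in any column really means the row is unusable.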