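NaN in pd.DataFrame (simulating a matrix)

Time: 2019-03-25 16:42:24

Tags: python pandas

I have a DataFrame like this. I want to drop the NaNs and shift the cells up, then add a date column and set it as the index.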

The output should be:

                ciao      google    microsoft
Search Volume   368000    NaN       NaN
Search Volume   368000    NaN       NaN
Search Volume   450000    NaN       NaN
Search Volume   450000    NaN       NaN
Search Volume   450000    NaN       NaN
Search Volume   450000    NaN       NaN
Search Volume   NaN       37200000  NaN
Search Volume   NaN       37200000  NaN
Search Volume   NaN       37200000  NaN
Search Volume   NaN       37200000  NaN
Search Volume   NaN       37200000  NaN
Search Volume   NaN       37200000  NaN
Search Volume   NaN       NaN       135000
Search Volume   NaN       NaN       135000
Search Volume   NaN       NaN       110000
Search Volume   NaN       NaN       110000
Search Volume   NaN       NaN       110000
Search Volume   NaN       NaN       110000

It looks simple, but I can't figure out how to do it. Thanks.

5 answers:

Answer 0: (score: 0)

This should work:

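# Keep only the non-null values of each column; .values drops the duplicated index labels
denulled = {col: df.loc[df[col].notnull(), col].values for col in df.columns}

# `date` is assumed to hold the six date labels used as the new index
df_out = pd.DataFrame(denulled, index=date)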

Answer 1: (score: 0)

You can also call dropna on each column as a Series:

df1=pd.DataFrame(data=[df[i].dropna().values for i in df.columns]).T
df1.index=dates
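
One thing to watch with this variant: building the frame from a list of arrays yields integer column labels, so the original column names have to be put back. A minimal sketch, assuming a small stand-in frame (the values and the `dates` list are hypothetical):

```python
import numpy as np
import pandas as pd

# Small stand-in frame (hypothetical values): two columns, each with its
# non-NaN values in a different block of rows, under a duplicated index label.
df = pd.DataFrame(
    {"a": [1.0, 2.0, np.nan, np.nan], "b": [np.nan, np.nan, 3.0, 4.0]},
    index=["x"] * 4,
)
dates = ["20140115", "20140215"]  # assumed date labels

# One row per original column, then transpose back to column orientation.
df1 = pd.DataFrame(data=[df[c].dropna().values for c in df.columns]).T
df1.columns = df.columns  # restore the names lost by building from a list
df1.index = dates
print(df1)
```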

Answer 2: (score: 0)

My proposal is:

pd.DataFrame(data={ colName: df[colName].dropna().values for colName in df.columns },
    index=['20140115', '20140215', '20140315', '20140415', '20140515', '20140615'])

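The key point is the dictionary comprehension, which is evaluated once per column.

dropna removes the NaN entries, while values strips the index labels, so the resulting arrays are no longer tied to the original (duplicated) index.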
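
For anyone who wants to try this end to end, here is a minimal self-contained sketch. The sample frame is rebuilt from the table in the question, and the names `values`, `dates`, and `df_out` are assumptions made for illustration. It relies on every column having the same number of non-NaN values (six here); otherwise the DataFrame constructor raises an error.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame (assumed layout): each column holds six values
# in its own block of rows and NaN everywhere else.
values = {
    "ciao": [368000, 368000, 450000, 450000, 450000, 450000],
    "google": [37200000] * 6,
    "microsoft": [135000, 135000, 110000, 110000, 110000, 110000],
}
n = 6
data = {}
for i, (name, vals) in enumerate(values.items()):
    col = np.full(n * len(values), np.nan)
    col[i * n:(i + 1) * n] = vals
    data[name] = col
df = pd.DataFrame(data, index=["Search Volume"] * (n * len(values)))

dates = ["20140115", "20140215", "20140315", "20140415", "20140515", "20140615"]

# Drop the NaNs per column; .values discards the duplicated index labels so
# the constructor does not try to align on them.
df_out = pd.DataFrame(
    {col: df[col].dropna().values for col in df.columns},
    index=dates,
)
print(df_out)
```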

Answer 3: (score: 0)

A tricky solution, since your index contains duplicates:

pd.concat([df[x].dropna() for x in df.columns], axis=1)
Out[24]: 
                  ciao      google  microsoft
SearchVolume  368000.0  37200000.0   135000.0
SearchVolume  368000.0  37200000.0   135000.0
SearchVolume  450000.0  37200000.0   110000.0
SearchVolume  450000.0  37200000.0   110000.0
SearchVolume  450000.0  37200000.0   110000.0
SearchVolume  450000.0  37200000.0   110000.0

Answer 4: (score: 0)

You can use apply together with dropna:

df = df.apply(lambda x: pd.Series(x.dropna().values)).fillna('')
df['date'] = date
print(df)

Output:

     ciao      google   microsoft  date     
 368000.0  37200000.0   135000.0   20140115 
 368000.0  37200000.0   135000.0   20140215 
 450000.0  37200000.0   110000.0   20140315 
 450000.0  37200000.0   110000.0   20140415 
 450000.0  37200000.0   110000.0   20140515 
 450000.0  37200000.0   110000.0   20140615
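
The question also asks for the date column to become the index, which the snippet above stops short of. A hedged sketch of that last step follows, using a stand-in frame whose values are copied from the output above; the names `shifted` and `result` are illustrative, and parsing the strings as `YYYYMMDD` dates is an assumption about what the labels mean.

```python
import pandas as pd

# Stand-in for the frame printed above (values copied from that output).
shifted = pd.DataFrame({
    "ciao": [368000.0, 368000.0, 450000.0, 450000.0, 450000.0, 450000.0],
    "google": [37200000.0] * 6,
    "microsoft": [135000.0, 135000.0, 110000.0, 110000.0, 110000.0, 110000.0],
    "date": ["20140115", "20140215", "20140315", "20140415", "20140515", "20140615"],
})

# Parse the YYYYMMDD strings into timestamps, then promote the column to the index.
shifted["date"] = pd.to_datetime(shifted["date"], format="%Y%m%d")
result = shifted.set_index("date")
print(result)
```

If plain string labels are preferred, the `pd.to_datetime` line can be dropped and `set_index("date")` called directly.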