ValueError: No axis named inp for object type <class 'pandas.core.frame.DataFrame'>

Asked: 2018-01-09 13:24:32

Tags: pandas pandas-groupby

I have a DataFrame, df:

        Date        inp   name   
    0  2017-08-07  2.3.6  ABC 
    1  2017-08-07  2.3.6  ABC      
    2  2017-08-08  2.3.6  TAC         
    3  2017-08-22  2.5.9  TTT         
    4  2017-09-23  0.8.0  TAC         
    5  2017-10-09  2.3.6  ABC         
    6  2017-10-09  2.3.6  TAC
    7  2017-10-09  2.3.6  TAC                  
    8  2017-10-23  0.8.0  TAC         
    9  2017-11-08  6.2.6  ABC         

I want to count the occurrences of each ('inp', 'name') combination, grouped by month. The resulting DataFrame, df2, should look like this:

        Date       inp   name      count
      2017-08     2.3.6  ABC         2
      2017-08     2.3.6  TAC         1
      2017-08     2.5.9  TTT         1
      2017-09     0.8.0  TAC         1
      2017-10     2.3.6  ABC         1
      2017-10     2.3.6  TAC         2
      2017-10     0.8.0  TAC         1
      2017-11     6.2.6  ABC         1

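Then I want a new DataFrame, df3, as shown below. It is built by counting the occurrences of each (inp, name) pair per month, replacing the date index with the month name, and then pivoting: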

   Index      2.3.6ABC  2.3.6TAC  2.5.9TTT  0.8.0TAC  6.2.6ABC         
   August      2          1        1         0           0
  September    0          0        0         1           0
  October      1          2        0         1           0
  November     0          0        0         0           1

This is the code I have so far:

df=pd.DataFrame(df, columns= ['Date','inp','name'])
df['Date']= pd.to_datetime(df['Date'], format= '"%m/%d/%Y %H:%M:%S 0"')
df = df.set_index(['Date'])
print(df)
df = df.loc['2017-08-01':'2017-11-30']

df2 = (df.groupby(df.index.date,'inp')['name']
     .value_counts()
     .rename_axis(('Date','inp','name'))
     .reset_index(name='count'))
print (df2)
#Sum the total number of  unique (name,inp) associated per month     
df2.Date= pd.to_datetime(df2.Date)
df3 = df2.groupby( [pd.Grouper(key='Date', freq='1M'),'inp','name']) ["count"].sum().unstack().fillna(0)
df3.index = df3.index.strftime('%B')
print(df3)

But I keep getting:

ValueError: No axis named inp for object type <class 'pandas.core.frame.DataFrame'>

In addition, I want to drop the columns that contain more than 2 zeros, giving a new DataFrame like the one below. How can I do that?

    Index      2.3.6ABC  2.3.6TAC       0.8.0TAC           
   August      2          1                 0           
  September    0          0                 1           
  October      1          2                 1           
  November     0          0                 0           

1 answer:

Answer 0 (score: 1)

I think you need to pass the grouping keys to groupby as a list ([]). For a faster solution, you can also use floor instead of df['Date'].dt.date:

df2 = (df.groupby([df['Date'].dt.floor('D'),'inp'])['name']
     .value_counts()
     .rename_axis(('Date','inp','name'))
     .reset_index(name='count'))
print (df2)
        Date    inp name  count
0 2017-08-07  2.3.6  ABC      2
1 2017-08-08  2.3.6  TAC      1
2 2017-08-22  2.5.9  TTT      1
3 2017-09-23  0.8.0  TAC      1
4 2017-10-09  2.3.6  TAC      2
5 2017-10-09  2.3.6  ABC      1
6 2017-10-23  0.8.0  TAC      1
7 2017-11-08  6.2.6  ABC      1
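The error in the question most likely comes from how the arguments were passed: in `df.groupby(df.index.date, 'inp')`, the string `'inp'` lands in the second positional parameter of `groupby` (an axis on the pandas versions current when this was asked) instead of being treated as a grouping key. Below is a minimal, self-contained sketch of the list form. The sample frame is retyped from the question, and `size()` is swapped in for `value_counts()` purely for brevity; it is not the answer's exact code.

```python
import pandas as pd

# Sample data retyped from the question.
df = pd.DataFrame({
    "Date": ["2017-08-07", "2017-08-07", "2017-08-08", "2017-08-22", "2017-09-23",
             "2017-10-09", "2017-10-09", "2017-10-09", "2017-10-23", "2017-11-08"],
    "inp": ["2.3.6", "2.3.6", "2.3.6", "2.5.9", "0.8.0",
            "2.3.6", "2.3.6", "2.3.6", "0.8.0", "6.2.6"],
    "name": ["ABC", "ABC", "TAC", "TTT", "TAC",
             "ABC", "TAC", "TAC", "TAC", "ABC"],
})
df["Date"] = pd.to_datetime(df["Date"])

# All grouping keys go inside ONE list, so none of them is mistaken
# for another positional parameter of groupby.
daily = (df.groupby([df["Date"].dt.floor("D"), "inp", "name"])
           .size()                        # rows per (day, inp, name) group
           .reset_index(name="count"))
print(daily)
```

Note that `df['Date'].dt.floor('D')` assumes `Date` is still a regular column; if it has already been moved into the index with `set_index`, `df.index.floor('D')` would be the equivalent key.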

Then unstack by the second and third levels, and replace NaN with 0 by using .unstack(level=[1,2], fill_value=0):

df3 = (df2.groupby([pd.Grouper(key='Date', freq='1M'),'inp','name'])["count"]
          .sum()
          .unstack(level=[1,2], fill_value=0))
df3.columns = df3.columns.map(''.join)
df3.index = df3.index.strftime('%B')
print (df3)
           2.3.6ABC  2.3.6TAC  2.5.9TTT  0.8.0TAC  6.2.6ABC
August            2         1         1         0         0
September         0         0         0         1         0
October           1         2         0         1         0
November          0         0         0         0         1
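As a hedged variation on the monthly step, the sketch below groups on a monthly period built with `dt.to_period('M')` rather than `pd.Grouper(freq='1M')`, since newer pandas releases warn about the `'M'` frequency alias. The `df2` frame is retyped from the daily counts printed above; the month names produced by `'%B'` depend on the active locale.

```python
import pandas as pd

# Daily counts retyped from the df2 output above.
df2 = pd.DataFrame({
    "Date": pd.to_datetime(["2017-08-07", "2017-08-08", "2017-08-22", "2017-09-23",
                            "2017-10-09", "2017-10-09", "2017-10-23", "2017-11-08"]),
    "inp": ["2.3.6", "2.3.6", "2.5.9", "0.8.0", "2.3.6", "2.3.6", "0.8.0", "6.2.6"],
    "name": ["ABC", "TAC", "TTT", "TAC", "TAC", "ABC", "TAC", "ABC"],
    "count": [2, 1, 1, 1, 2, 1, 1, 1],
})

# Group by calendar month (as a Period), then by inp and name.
month = df2["Date"].dt.to_period("M")
df3 = (df2.groupby([month, "inp", "name"])["count"]
          .sum()
          .unstack(level=[1, 2], fill_value=0))  # inp/name become columns

df3.columns = df3.columns.map("".join)   # ('2.3.6', 'ABC') -> '2.3.6ABC'
df3.index = df3.index.strftime("%B")     # 2017-08 -> 'August'
print(df3)
```

The column order may differ from the table above, so it is safer to select columns by label than by position.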

Finally, use boolean indexing with loc to drop the columns:

df4 = df3.loc[:, df3.eq(0).sum() <= 2]
#same as
#df4 = df3.loc[:, (df3 == 0).sum() <= 2]
print (df4)
           2.3.6ABC  2.3.6TAC  0.8.0TAC
August            2         1         0
September         0         0         1
October           1         2         1
November          0         0         0
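If the threshold needs to vary, the same filter can be wrapped in a small helper. This is only a sketch: `drop_sparse_columns` and its `max_zeros` parameter are hypothetical names introduced here, not part of pandas, and the `df3` frame is retyped from the output above.

```python
import pandas as pd

def drop_sparse_columns(frame, max_zeros=2):
    """Keep only the columns that contain at most `max_zeros` zeros."""
    zero_counts = frame.eq(0).sum()              # number of zeros per column
    return frame.loc[:, zero_counts <= max_zeros]

# Monthly pivot retyped from the df3 output above.
df3 = pd.DataFrame(
    {"2.3.6ABC": [2, 0, 1, 0],
     "2.3.6TAC": [1, 0, 2, 0],
     "2.5.9TTT": [1, 0, 0, 0],
     "0.8.0TAC": [0, 1, 1, 0],
     "6.2.6ABC": [0, 0, 0, 1]},
    index=["August", "September", "October", "November"],
)

df4 = drop_sparse_columns(df3, max_zeros=2)
print(df4)
```

Here the `2.5.9TTT` and `6.2.6ABC` columns each hold three zeros, so they are the ones removed.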