pandas pivot table for a heatmap

Time: 2017-04-06 12:24:39

Tags: python pandas seaborn

I'm trying to generate a heatmap with seaborn, but I have a small problem with the format of my data.

Currently, my data is in this format:

Name     Diag   Date
A        1       2006-12-01
A        1       1994-02-12
A        2       2001-07-23
B        2       1999-09-12
B        1       2016-10-12
C        3       2010-01-20
C        2       1998-08-20

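I want to create a heatmap (preferably in Python) with Name on one axis and Diag on the other, showing whether each combination occurs. I tried to reshape the table with pd.pivot, but I get the error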

ValueError: Index contains duplicate entries, cannot reshape

This comes from:

piv = df.pivot(index='Name', columns='Diag')
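A minimal reproduction of the error, built from the sample data above (the `df` construction and the `error` variable are illustrative, not taken from the question). `pivot` needs every (Name, Diag) pair to be unique, and the pair (A, 1) occurs twice:

```python
import pandas as pd

# Sample data from the question; Date is left as plain strings here.
df = pd.DataFrame({
    'Name': ['A', 'A', 'A', 'B', 'B', 'C', 'C'],
    'Diag': [1, 1, 2, 2, 1, 3, 2],
    'Date': ['2006-12-01', '1994-02-12', '2001-07-23', '1999-09-12',
             '2016-10-12', '2010-01-20', '1998-08-20'],
})

# pivot cannot place two values in the single (A, 1) cell, so it raises.
try:
    df.pivot(index='Name', columns='Diag', values='Date')
    error = None
except ValueError as exc:
    error = str(exc)

print(error)
```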

The time doesn't matter, but I would like to show which Names have which Diag, and which Diag combinations occur together for the same Name. Do I need to create a new table for this, or is it possible with what I have? In some cases, a Diag is not associated with all Names.
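Since the dates don't matter for the heatmap itself, one option (a different technique from the answer below, offered only as a sketch) is to count the (Name, Diag) pairs with `pd.crosstab`. Pairs that never occur become 0, and `> 0` turns the counts into a presence matrix. The names `counts` and `presence` are illustrative:

```python
import pandas as pd

# Name/Diag columns from the question's sample data (Date is not needed here).
df = pd.DataFrame({
    'Name': ['A', 'A', 'A', 'B', 'B', 'C', 'C'],
    'Diag': [1, 1, 2, 2, 1, 3, 2],
})

# Rows = Name, columns = Diag, cells = number of occurrences (0 if none).
counts = pd.crosstab(df['Name'], df['Diag'])

# 1 if the Name ever had that Diag, else 0.
presence = (counts > 0).astype(int)
print(presence)
```

Either frame can then be passed to seaborn, e.g. `sns.heatmap(presence)`; the plotting step is not shown here.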

Edit: Since then I have tried: piv = df.pivot_table(index ='Name',columns ='Diag',values ='Time',aggfunc ='mean')

However, because the time is in datetime format, I end up with:
pandas.core.base.DataError: No numeric types to aggregate

1 answer:

Answer 0 (score: 4)

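You need pivot_table with some aggregation function, because there are multiple values for the same index and column, and pivot requires unique values only: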

print (df)
  Name  Diag  Time
0    A     1    12 <- duplicate (A, 1) pair, different value
1    A     1    13 <- duplicate (A, 1) pair, different value
2    A     2    14
3    B     2    18
4    B     1     1
5    C     3     9
6    C     2     8

df = df.pivot_table(index='Name',columns='Diag', values='Time', aggfunc='mean')
print (df)
Diag     1     2    3
Name                 
A     12.5  14.0  NaN
B      1.0  18.0  NaN
C      NaN   8.0  9.0

Alternative solution:

df = df.groupby(['Name','Diag'])['Time'].mean().unstack()
print (df)
Diag     1     2    3
Name                 
A     12.5  14.0  NaN
B      1.0  18.0  NaN
C      NaN   8.0  9.0
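A self-contained check, using the numeric `Time` sample from this answer, that `pivot_table(..., aggfunc='mean')` and `groupby(...).mean().unstack()` build the same table (the names `sample`, `by_pivot` and `by_groupby` are illustrative):

```python
import pandas as pd

# Numeric sample from the answer: (A, 1) appears twice, with Time 12 and 13.
sample = pd.DataFrame({
    'Name': ['A', 'A', 'A', 'B', 'B', 'C', 'C'],
    'Diag': [1, 1, 2, 2, 1, 3, 2],
    'Time': [12, 13, 14, 18, 1, 9, 8],
})

# pivot_table averages the duplicated (A, 1) cell: (12 + 13) / 2 = 12.5.
by_pivot = sample.pivot_table(index='Name', columns='Diag',
                              values='Time', aggfunc='mean')

# groupby computes the same means; unstack moves Diag into the columns.
by_groupby = sample.groupby(['Name', 'Diag'])['Time'].mean().unstack()

print(by_pivot.equals(by_groupby))
```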

Edit:

You can also check all duplicates with duplicated:

df = df.loc[df.duplicated(['Name','Diag'], keep=False), ['Name','Diag']]
print (df)
  Name  Diag
0    A     1
1    A     1
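To illustrate the `keep` parameter, a small sketch on the question's Name/Diag columns: `keep=False` flags every row of a duplicated pair, while the default `keep='first'` flags only the repeats after the first occurrence. `groupby(...).size()` lists the pairs that occur more than once, i.e. the ones that make `pivot` fail (variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['A', 'A', 'A', 'B', 'B', 'C', 'C'],
    'Diag': [1, 1, 2, 2, 1, 3, 2],
})

# True for every row whose (Name, Diag) pair occurs more than once.
all_dups = df.duplicated(['Name', 'Diag'], keep=False)

# True only for repeats after the first occurrence of each pair.
later_dups = df.duplicated(['Name', 'Diag'])

# Occurrence count per pair, filtered to the pairs that break pivot.
sizes = df.groupby(['Name', 'Diag']).size()
repeated_pairs = sizes[sizes > 1]

print(df.index[all_dups].tolist(), df.index[later_dups].tolist())
print(repeated_pairs)
```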

Edit:

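Taking the mean of datetimes is not easy: you need to convert the dates to nanoseconds, take the mean, and finally convert back to datetime. There is another problem: NaN has to be replaced with some scalar, e.g. 0, which is converted to the 0 datetime, 1970-01-01.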

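import numpy as np
import pandas as pd

# Parse the date strings, then store each date as integer nanoseconds
df.Date = pd.to_datetime(df.Date)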
df['dates_in_ns'] = pd.Series(df.Date.values.astype(np.int64), index=df.index)
df = df.pivot_table(index='Name',
                    columns='Diag', 
                    values='dates_in_ns', 
                    aggfunc='mean', 
                    fill_value=0)
df = df.apply(pd.to_datetime)
print (df)
Diag                   1          2          3
Name                                          
A    2000-07-07 12:00:00 2001-07-23 1970-01-01
B    2016-10-12 00:00:00 1999-09-12 1970-01-01
C    1970-01-01 00:00:00 1998-08-20 2010-01-20
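A variant of the same nanosecond approach that leaves missing (Name, Diag) pairs as `NaT` instead of filling them with 0 (1970-01-01), sketched with `groupby`/`unstack` rather than `pivot_table`. It assumes the question's sample data, and casts `Date` to `datetime64[ns]` explicitly so the integers really are nanoseconds whatever resolution `pd.to_datetime` picks (the names `mean_ns` and `means` are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['A', 'A', 'A', 'B', 'B', 'C', 'C'],
    'Diag': [1, 1, 2, 2, 1, 3, 2],
    'Date': ['2006-12-01', '1994-02-12', '2001-07-23', '1999-09-12',
             '2016-10-12', '2010-01-20', '1998-08-20'],
})

# Parse the strings and pin the resolution to nanoseconds.
df['Date'] = pd.to_datetime(df['Date']).astype('datetime64[ns]')

# Integer nanoseconds since 1970-01-01, which groupby can average.
df['dates_in_ns'] = df['Date'].to_numpy().astype(np.int64)

# Mean per (Name, Diag); pairs that never occur become NaN after unstack.
mean_ns = df.groupby(['Name', 'Diag'])['dates_in_ns'].mean().unstack()

# Convert each column back to datetimes; NaN becomes NaT, not 1970-01-01.
means = mean_ns.apply(lambda col: pd.to_datetime(col, unit='ns'))
print(means)
```

The averaging happens in float64, so results can be off by well under a microsecond; for date-level data this does not matter.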