Python pandas groupby + join

Date: 2019-05-09 15:16:47

Tags: python pandas

import datetime

import pandas as pd

a = pd.DataFrame({'Entreprise': {0: 110, 1: 110, 2: 110, 3: 110, 4: 110},
 'Etablissement': {0: 'SVR RUN',
  1: 'SVR RUN',
  2: 'SVR RUN',
  3: 'SVR RUN',
  4: 'SVR RUN'},
 'Date_achat_as_date': {0: datetime.datetime(1996, 12, 15, 0, 0),
  1: datetime.datetime(1996, 12, 15, 0, 0),
  2: datetime.datetime(2001, 1, 17, 0, 0),
  3: datetime.datetime(2001, 1, 17, 0, 0),
  4: datetime.datetime(2011, 7, 1, 0, 0)},
 'Valeur_Brute': {0: 2820397.61,
  1: 1188910.0,
  2: 245029.17,
  3: 124118.68,
  4: 113382.0}})

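gp_by = ["Entreprise", "Etablissement", "Date_achat_as_date"]

# Total Valeur_Brute per (Entreprise, Etablissement, Date_achat_as_date) group
gp = a.groupby(gp_by)["Valeur_Brute"].sum()

# Attach the group total to every row as Valeur_Brute_sum_day
a.set_index(gp_by).join(gp, rsuffix="_sum_day")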

But when I run the same code on my real data (the `database` DataFrame), I get an error:

gp = database.groupby(gp_by)["Valeur_Brute"].sum()
database.set_index(gp_by).join(gp, rsuffix="_sum_day")

TypeError: '<' not supported between instances of 'str' and 'int'
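
This is the message Python itself raises when asked to order a `str` against an `int`. That suggests (an assumption, since `database` is not shown) that at least one of the key columns holds both kinds of value. A minimal sketch for checking this, using a hypothetical stand-in frame and a helper name made up for illustration:

```python
import pandas as pd

# Hypothetical stand-in for `database`: "Entreprise" mixes int and str values.
sample = pd.DataFrame({
    "Entreprise": pd.Series([110, "110", 110], dtype=object),
    "Etablissement": pd.Series(["SVR RUN", "SVR RUN", "SVR RUN"], dtype=object),
})


def python_types_per_column(df, columns):
    """Return, for each column, the set of Python type names found in it."""
    return {col: set(df[col].map(lambda v: type(v).__name__)) for col in columns}


types_found = python_types_per_column(sample, ["Entreprise", "Etablissement"])
print(types_found)

# Ordering an int against a str in plain Python raises the same kind of TypeError.
try:
    sorted([110, "SVR RUN"])
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

A column that reports more than one type name is a candidate for the source of the error.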

Then I thought I might have some null values somewhere, so I did:

database[["Entreprise"]] = database[["Entreprise"]].fillna(1)
database[["Etablissement"]] = database[["Etablissement"]].fillna(1)
database = database[~database.Date_achat_as_date.isnull()]
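
One thing worth noting about this step: filling a text column such as `Etablissement` with the integer `1` places an `int` next to the existing strings, so it can itself introduce the kind of mix the error complains about. A small sketch on a made-up column (not the real data):

```python
import pandas as pd

# Made-up text column with one missing value, stored with object dtype.
etablissement = pd.Series(["SVR RUN", None, "SVR RUN"], dtype=object)

filled_with_int = etablissement.fillna(1)    # the gap becomes the int 1
filled_with_str = etablissement.fillna("1")  # the gap becomes the str "1"

print([type(v).__name__ for v in filled_with_int])
print([type(v).__name__ for v in filled_with_str])
```

Filling with a value of the same type as the rest of the column keeps the keys comparable with each other.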

Then:

gp = database.groupby(gp_by)["Valeur_Brute"].sum()
database.set_index(gp_by).join(gp, rsuffix="_sum_day")

But I still get the error message:

TypeError: '<' not supported between instances of 'str' and 'int'

Any idea what I'm missing?

Edit: I solved the problem with:

database[gp_by] = database[gp_by].applymap(str)

But I still don't understand why and how :-/
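
A likely explanation, assuming the key columns did hold a mix of `str` and `int` values: once every key is a string, all values can be compared with each other, so any ordering step pandas performs on the keys no longer fails. The sketch below shows the same idea with `astype(str)` on hypothetical data (`applymap` was deprecated in pandas 2.1 in favour of `DataFrame.map`). Note that this also turns dates and missing values into text, so it is a workaround rather than a cleanup of the underlying data.

```python
import pandas as pd

# Hypothetical data: the key column mixes the int 110 and the str "110".
database = pd.DataFrame({
    "Entreprise": pd.Series([110, "110", 110], dtype=object),
    "Etablissement": ["SVR RUN", "SVR RUN", "SVR RUN"],
    "Valeur_Brute": [1.0, 2.0, 3.0],
})
gp_by = ["Entreprise", "Etablissement"]

# Make every key a string so that all key values are mutually comparable.
database[gp_by] = database[gp_by].astype(str)

# Same groupby + join pattern as in the question.
gp = database.groupby(gp_by)["Valeur_Brute"].sum()
result = database.set_index(gp_by).join(gp, rsuffix="_sum_day")
print(result)
```

After the conversion, `110` and `"110"` fall into the same group, which may or may not be what is wanted for the real data.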
