Pandas: duplicate rows based on a condition and unstack

Time: 2018-08-07 09:32:00

Tags: python pandas dataframe duplicates row

I have a pandas DataFrame in the following format:

  Name    Age    BMoney    BTime    BEffort

  John    22       1         0        0
  Pete    54       0         1        0
  Lisa    26       0         1        1

I want to convert it to:

  Name    Age    B

  John    22     Money
  Pete    54     Time
  Lisa    26     Effort
  Lisa    26     Time

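That is, based on the values in the "B<reason>" columns, I want to create a new column "B" that holds the reason. If a person has more than one reason (i.e., a row contains more than one 1), I want a separate row for that person in the new DataFrame, one for each of their reasons.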

1 answer:

Answer 0: (score: 3)

Using a MultiIndex and stack():

import pandas as pd

# Create the dataframe (same values as in the question)
df = [["John", 22, 1, 0, 0],
      ["Pete", 54, 0, 1, 0],
      ["Lisa", 26, 0, 1, 1]]
df = pd.DataFrame(df, columns=["Name", "Age", "BMoney", "BTime", "BEffort"])

# Set Multi Indexing
df.set_index(["Name", "Age"], inplace=True)


# Use the fact that columns and Series can carry names and use stack to do the transformation
df.columns.name = "B"
df = df.stack()
df.name = "value"
df = df.reset_index()

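To inspect the intermediate long format that stack() produces, here is a minimal self-contained sketch. It assumes the input values from the question and uses illustrative names (`example`, `stacked`) that are not part of the answer, so it does not overwrite `df`.

```python
import pandas as pd

# Illustrative copy of the question's data (the name `example` is an assumption)
example = pd.DataFrame(
    [["John", 22, 1, 0, 0],
     ["Pete", 54, 0, 1, 0],
     ["Lisa", 26, 0, 1, 1]],
    columns=["Name", "Age", "BMoney", "BTime", "BEffort"],
)

# Same steps as above, chained: index by person, name the column axis "B",
# stack the B* columns into rows, name the resulting Series "value"
stacked = (
    example.set_index(["Name", "Age"])
    .rename_axis(columns="B")
    .stack()
    .rename("value")
    .reset_index()
)

# One row per (person, B* column) pair, with the 0/1 flag in "value"
print(stacked)
```

Each person appears three times at this point, once per B* column; the filtering step that follows keeps only the rows where the flag is 1.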

# Keep only the rows flagged with 1, drop the helper column, and strip the leading "B" from the labels
df = df[df.value == 1].copy()  # copy so the edits below do not act on a view of the filtered frame
df.drop("value", axis=1, inplace=True)
df["B"] = df.B.str[1:]

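As an alternative to the set_index/stack approach, the same reshaping can probably be done with DataFrame.melt. This is a sketch rather than the answer's method: it assumes the input from the question, assumes every reason column starts with a single "B", and uses illustrative names (`long_df`, `result`).

```python
import pandas as pd

df = pd.DataFrame(
    [["John", 22, 1, 0, 0],
     ["Pete", 54, 0, 1, 0],
     ["Lisa", 26, 0, 1, 1]],
    columns=["Name", "Age", "BMoney", "BTime", "BEffort"],
)

# Turn the B* columns into (B, value) pairs, one row per person and column
long_df = df.melt(id_vars=["Name", "Age"], var_name="B", value_name="value")

result = (
    long_df[long_df["value"] == 1]         # keep only the reasons that apply
    .drop(columns="value")                 # the flag column is no longer needed
    .assign(B=lambda d: d["B"].str[1:])    # "BMoney" -> "Money" (assumes a one-letter prefix)
    .sort_values(["Name", "B"])            # deterministic row order
    .reset_index(drop=True)
)

print(result)
```

The sort is only there to give a stable row order; drop it if the original order of the reason columns should be preserved instead.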