Pandas groupby: calculating the change for a DataFrame that also has string-type columns

Time: 2020-07-24 16:23:15

Tags: python pandas dataframe group-by pandas-groupby

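I came across this question, which is very similar to what I am trying to do: python pandas groupby calculate change

The only difference is that my DataFrame is considerably more complex: it has several more value columns whose differences I also want to calculate, plus some string-type columns that I need to keep but obviously cannot compute a numeric difference for.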

Group |   Date      | Value | Leader |  Quantity
  A     01-02-2016     16.0      John        1
  A     01-03-2016     15.0      John        1
  B     01-02-2016     16.0     Phillip      1
  B     01-03-2016     13.0     Phillip      1 
  C     01-02-2016     16.0       Bob        1
  C     01-03-2016     16.0       Bob        1 

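Is there a way to change the code so that the difference is applied only to the float-type values, without having to specify which columns are floats via loc / iloc? I would then get something like this: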

    Date    Group      Change in Value    Leader    Change in Quantity
2016-01-02    A             NaN            John            NaN
2016-01-03    A       -0.062500            John             0
2016-01-02    B             NaN           Phillip          NaN
2016-01-03    B       -0.187500           Phillip           0 
2016-01-02    C             NaN             Bob            NaN
2016-01-03    C        0.000000             Bob             0

Also, is it possible to change pct_change to diff? Ideally, I would get something like this:

    Date    Group   Leader    Change in Value    Change in Quantity
2016-01-02    A      John          NaN                    NaN
2016-01-03    A      John         -1.0                    0.0
2016-01-02    B    Phillip         NaN                    NaN
2016-01-03    B    Phillip        -3.0                    0.0
2016-01-02    C      Bob           NaN                    NaN
2016-01-03    C      Bob           0.0                    0.0

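Some more details about my actual dataset:

  • Each group has exactly two rows (only two dates are considered)
  • Ideally, I would like to be able to slice the rows so that every row containing NaN values is dropped
  • For consistency, I need all numeric values to be displayed as floats

Thanks!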

2 answers:

Answer 0: (score: 2)

Use select_dtypes and join:

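# Pick the numeric columns by dtype instead of naming them one by one
df1 = df.select_dtypes('number')
# Drop them from the original frame, then join back their per-group percentage change
df_final = df.drop(columns=df1.columns).join(df1.groupby(df['Group'])
                                             .pct_change().add_prefix('Change_in_'))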

Out[10]:
  Group        Date   Leader  Change_in_Value  Change_in_Quantity
0     A  01-02-2016     John              NaN                 NaN
1     A  01-03-2016     John          -0.0625                 0.0
2     B  01-02-2016  Phillip              NaN                 NaN
3     B  01-03-2016  Phillip          -0.1875                 0.0
4     C  01-02-2016      Bob              NaN                 NaN
5     C  01-03-2016      Bob           0.0000                 0.0

To use diff, just replace pct_change with diff:

# Same pattern as above, but with the absolute per-group difference
df1 = df.select_dtypes('number')
df_final = df.drop(columns=df1.columns).join(df1.groupby(df['Group'])
                                             .diff().add_prefix('Change_in_'))
    
Out[15]:
  Group        Date   Leader  Change_in_Value  Change_in_Quantity
0     A  01-02-2016     John             NaN                NaN
1     A  01-03-2016     John            -1.0                0.0
2     B  01-02-2016  Phillip             NaN                NaN
3     B  01-03-2016  Phillip            -3.0                0.0
4     C  01-02-2016      Bob             NaN                NaN
5     C  01-03-2016      Bob             0.0                0.0
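
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same select_dtypes / groupby / join pattern, built on the sample data from the question. It also covers the two follow-up wishes (dropping the NaN rows and showing the changes as floats). The names `num`, `change` and `result` are my own and not part of the answer above, and the dropna / astype step is just one possible way to do it.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Group": ["A", "A", "B", "B", "C", "C"],
    "Date": ["01-02-2016", "01-03-2016"] * 3,
    "Value": [16.0, 15.0, 16.0, 13.0, 16.0, 16.0],
    "Leader": ["John", "John", "Phillip", "Phillip", "Bob", "Bob"],
    "Quantity": [1, 1, 1, 1, 1, 1],
})

# Numeric columns are detected by dtype, so none has to be named explicitly
num = df.select_dtypes("number")

# Per-group difference; the first row of every group becomes NaN
change = num.groupby(df["Group"]).diff().add_prefix("Change_in_")

# Re-attach the non-numeric columns via the shared index
result = df.drop(columns=num.columns).join(change)

# Drop the rows where the change is NaN and make sure the changes are floats
result = result.dropna(subset=list(change.columns))
result = result.astype({col: float for col in change.columns})

print(result)
```

Because each group has exactly two rows, the dropna step leaves one row per group, namely the one for the later date.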

Answer 1: (score: 0)

You can do it like this:

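# Collect every column whose name starts with 'Value'
cols = []
for col in df.columns:
    if str(col).startswith('Value'):
        cols.append(col)

# Relative change between each row and the next one, column by column.
# Note: this does not group by 'Group', so it also compares rows across groups.
for col in cols:
    df["Change " + col] = (df[col] - df[col].shift(-1)) / df[col]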
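
If the goal is the per-group percentage change from the question, the loop idea can be combined with groupby so that rows from different groups are never compared. The sketch below is only an illustration: the column names "Value 0" and "Value 1" and the numbers in them are made up, and it uses pct_change (current minus previous, divided by previous) rather than the formula in the loop above.

```python
import pandas as pd

# Hypothetical frame with several 'Value ...' columns (names and data invented)
df = pd.DataFrame({
    "Group": ["A", "A", "B", "B"],
    "Value 0": [16.0, 15.0, 16.0, 13.0],
    "Value 1": [2.0, 4.0, 10.0, 5.0],
})

# Pick the value columns by name prefix
value_cols = [col for col in df.columns if str(col).startswith("Value")]

for col in value_cols:
    # Percentage change computed separately inside each group
    df["Change " + col] = df.groupby("Group")[col].pct_change()

print(df)
```

The first row of each group comes out as NaN, since there is no earlier row in that group to compare against.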