Pandas apply on DataFrame columns to return multiple columns with a suffix

Posted: 2019-12-05 12:08:18

Tags: python pandas dataframe

I need to apply a sin and cos transformation to each column, so that every column produces 2 columns: col_sin and col_cos.

import numpy as np
import pandas as pd

def transform(data, var):
    sin_ = np.sin(data - var)
    cos_ = np.cos(data - var)
    return pd.Series([sin_, cos_], index=['sin', 'cos'])

d = {'col1': [0, 15, 30, 45, 60], 'col2': [0, 60, 180, 240, 300]}
df = pd.DataFrame(data=d)
df = df.apply(transform, axis=0, var=0)

This returns the following (the numbers don't match because the columns differ from the ones I actually pass):

+-----+----------------------------+----------------------------+
|     | col1                       | col2                       |
|-----+----------------------------+----------------------------|
| sin | 0    0.000000e+00          | 0    0.000000e+00          |
|     | 1    1.000000e+00          | 1   -1.133108e-15          |
|     | 2    5.665539e-16          | 2   -7.347881e-16          |
|     | 3   -1.000000e+00          | 3   -4.532431e-15          |
|     | 4   -1.133108e-15          | 4   -1.224647e-15          |
|     | Name: col1, dtype: float64 | Name: col2, dtype: float64 |
| cos | 0    1.000000e+00          | 0    1.0                   |
|     | 1    2.832769e-16          | 1    1.0                   |
|     | 2   -1.000000e+00          | 2    1.0                   |
|     | 3   -1.836970e-16          | 3    1.0                   |
|     | 4    1.000000e+00          | 4    1.0                   |
|     | Name: col1, dtype: float64 | Name: col2, dtype: float64 |
+-----+----------------------------+----------------------------+
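The nested cells in the table above come from how apply works with axis=0. Each call receives one whole column as a Series, and returning pd.Series([sin_, cos_]) stores those two Series as single objects inside a 2x2 result. The small check below reuses the question's data. It is only a demonstration, not a fix:

```python
import numpy as np
import pandas as pd

def transform(data, var):
    # With axis=0, data is an entire column (a Series of 5 values)
    sin_ = np.sin(data - var)
    cos_ = np.cos(data - var)
    # A Series whose two elements are themselves Series, hence the nested cells
    return pd.Series([sin_, cos_], index=['sin', 'cos'])

d = {'col1': [0, 15, 30, 45, 60], 'col2': [0, 60, 180, 240, 300]}
df = pd.DataFrame(data=d)
nested = df.apply(transform, axis=0, var=0)

print(nested.shape)                        # 2 rows (sin, cos) x 2 columns
print(type(nested.loc['sin', 'col1']))     # each cell holds a whole Series
```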

The expected output should contain 4 columns: col1_sin, col1_cos, col2_sin and col2_cos.

How can I achieve this?

Also, is there a way to pass var as a list/tuple, where var[0] is used for col1 and var[1] for col2? Something like this:

df = df.apply(transform, axis=0, var=[0, 60])
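For reference, plain DataFrame arithmetic already matches a list to the columns by position when the list has one entry per column. This is what makes a per-column var possible without apply. The sketch below assumes the question's example data:

```python
import pandas as pd

d = {'col1': [0, 15, 30, 45, 60], 'col2': [0, 60, 180, 240, 300]}
df = pd.DataFrame(data=d)

# One offset per column, matched by position: 0 is subtracted from col1, 60 from col2
shifted = df - [0, 60]
print(shifted)
```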

Is there a way to speed this up with raw=True? Something like this does not work:

def transform(data, var):
    sin_ = np.sin(data - var)
    cos_ = np.cos(data - var)
    return np.column_stack((sin_, cos_))
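As far as I understand, raw=True only changes what each call receives (a NumPy array instead of a Series). The function is still called once per column, and each call has to return something that fits into a single output column. One way around this is to skip apply and work on the whole array at once.

The sketch below assumes all columns are numeric, and transform_np is a hypothetical helper name. It also interleaves the results to give the col1_sin, col1_cos, col2_sin, col2_cos order:

```python
import numpy as np
import pandas as pd

def transform_np(df, var):
    # Hypothetical helper: one vectorized pass over the whole 2-D array
    arr = df.to_numpy(dtype=float) - np.asarray(var, dtype=float)  # scalar or one value per column
    sin_, cos_ = np.sin(arr), np.cos(arr)
    # Interleave: even positions get sin, odd positions get cos
    out = np.empty((arr.shape[0], arr.shape[1] * 2))
    out[:, 0::2] = sin_
    out[:, 1::2] = cos_
    cols = [f'{c}_{f}' for c in df.columns for f in ('sin', 'cos')]
    return pd.DataFrame(out, index=df.index, columns=cols)

d = {'col1': [0, 15, 30, 45, 60], 'col2': [0, 60, 180, 240, 300]}
df = pd.DataFrame(data=d)
out = transform_np(df, var=[0, 60])
print(out)
```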

Thanks!

3 Answers:

Answer 0 (score: 2)

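There's no need for apply here; pass the entire DataFrame instead. We can use concat and add_suffix to get the correct names. With np.broadcast_to, we can handle either a single offset or a list/array of the correct shape: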

import pandas as pd
import numpy as np

def transform(data, var, degrees=True):
    """
    data : pd.DataFrame
    var : numeric, or list/array of numerics. Should be 
          broadcastable to data.shape
    """
    data = data - np.broadcast_to(var, data.shape)
    # data = data - var # also works for compatible shapes         

    if degrees:
        data = np.radians(data)

    return pd.concat([np.sin(data).add_suffix('_sin'),
                      np.cos(data).add_suffix('_cos')],
                     axis=1)

transform(df, var=[45, 0], degrees=True)
   col1_sin      col2_sin  col1_cos  col2_cos
0 -0.707107  0.000000e+00  0.707107       1.0
1 -0.500000  8.660254e-01  0.866025       0.5
2 -0.258819  1.224647e-16  0.965926      -1.0
3  0.000000 -8.660254e-01  1.000000      -0.5
4  0.258819 -8.660254e-01  0.965926       0.5
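To see what np.broadcast_to does with var here, the sketch below uses the example frame's shape of 5 rows x 2 columns. A per-column list is repeated down every row, a single number fills every cell, and a list of the wrong length raises ValueError:

```python
import numpy as np

shape = (5, 2)  # same shape as the example df

# One offset per column, repeated on every row
per_column = np.broadcast_to([45, 0], shape)
# One offset for every cell
single = np.broadcast_to(30, shape)

print(per_column)
print(single)

# Three offsets cannot be broadcast onto two columns
try:
    np.broadcast_to([1, 2, 3], shape)
except ValueError as err:
    print('cannot broadcast:', err)
```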

Answer 1 (score: 2)

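Use DataFrame.pipe to pass the whole DataFrame. If var is a list with one value per column, each value is subtracted from its column. Then stack the sin and cos results side by side and return a DataFrame with the new column names: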

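import numpy as np
import pandas as pd

def transform(data, var):
    sin_ = np.sin(data - var)
    cos_ = np.cos(data - var)
    # Stack the sin block and the cos block side by side: shape (rows, 2 * columns)
    arr = np.column_stack((sin_, cos_))
    c = (data.columns + '_sin').tolist() + (data.columns + '_cos').tolist()
    return pd.DataFrame(arr, index=data.index, columns=c)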

d = {'col1': [0, 15, 30, 45, 60], 'col2': [0, 60, 180, 240, 300]}
df = pd.DataFrame(data=d)

df = df.pipe(transform, var=[0, 60])
print (df)
   col1_sin  col2_sin  col1_cos  col2_cos
0  0.000000  0.304811  1.000000 -0.952413
1  0.650288  0.000000 -0.759688  1.000000
2 -0.988032  0.580611  0.154251  0.814181
3  0.850904 -0.801153  0.525322 -0.598460
4 -0.304811  0.945445 -0.952413  0.325781
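The reason pipe fits here is that df.pipe(f, **kwargs) simply calls f(df, **kwargs) once with the whole frame. By contrast, apply calls the function once per column or row. A quick check with a hypothetical helper named shift:

```python
import pandas as pd

def shift(data, var):
    # Hypothetical helper: receives the whole DataFrame, not a single column
    return data - var

d = {'col1': [0, 15, 30, 45, 60], 'col2': [0, 60, 180, 240, 300]}
df = pd.DataFrame(data=d)

via_pipe = df.pipe(shift, var=[0, 60])
direct = shift(df, var=[0, 60])
print(via_pipe.equals(direct))
```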

Answer 2 (score: 1)

Simple loop

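You can get the result with a simple for loop over the column names that adds the sin/cos columns. I tested it on a DataFrame with a million rows, and it finished in under a second.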

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.uniform(low=0, high=3.14, size=(1000000, 2)), columns=['column1', 'column2'])
var = [0, .5]
# var[idx] is the offset for the idx-th column
for idx, column in enumerate(df.columns):
    df[column + '_sin'] = np.sin(df[column] - var[idx])
    df[column + '_cos'] = np.cos(df[column] - var[idx])
df.head()

This gives you output like the following:

    column1     column2     column1_sin     column1_cos     column2_sin     column2_cos
0   1.977094    0.705613    0.918590    -0.395211   0.648500    0.761214
1   2.138289    2.246560    0.843252    -0.537519   0.780229    -0.625493
2   2.947415    1.716964    0.192960    -0.981207   0.989336    -0.145648
3   1.738969    0.748142    0.985892    -0.167381   0.680278    0.732954
4   1.136741    1.190389    0.907268    0.420554    0.928513    0.371299
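If the same loop ran over many columns, inserting them one at a time can get slow. Recent pandas versions may also emit a PerformanceWarning about a highly fragmented DataFrame. The variant below is a sketch of one alternative: it collects the new columns in a dict and attaches them with a single pd.concat. The smaller size and the seeded generator are only there to keep the example quick and repeatable:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.uniform(low=0, high=3.14, size=(1000, 2)),
                  columns=['column1', 'column2'])
var = [0, .5]

# Collect the new columns first, pairing each column with its own offset
new_cols = {}
for offset, column in zip(var, df.columns):
    new_cols[column + '_sin'] = np.sin(df[column] - offset)
    new_cols[column + '_cos'] = np.cos(df[column] - offset)

# Attach them all at once instead of one insertion per column
result = pd.concat([df, pd.DataFrame(new_cols, index=df.index)], axis=1)
print(result.head())
```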

Another option

Use axis=1 and return a pd.Series. Example code:

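import numpy as np
import pandas as pd

d = {'col1': [0, 15, 30, 45, 60], 'col2': [0, 60, 180, 240, 300]}
df = pd.DataFrame(data=d)
def transform(data, var):
    shifted = data - var  # with axis=1, data is one row indexed by column name
    # Series.append was removed in pandas 2.0, so join the two Series with pd.concat
    return pd.concat([np.sin(shifted).add_suffix('_sin'), np.cos(shifted).add_suffix('_cos')])

df.apply(transform, axis=1, var=[10, 20])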

This gives you the output:

    col1_sin    col2_sin    col1_cos    col2_cos
0   0.544021    -0.912945   -0.839072   0.408082
1   -0.958924   0.745113    0.283662    -0.666938
2   0.912945    0.219425    0.408082    -0.975629
3   -0.428183   0.088399    -0.903692   0.996085
4   -0.262375   -0.387809   0.964966    -0.921740
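This option works because apply with axis=1 hands the function one row at a time, as a Series whose index is the column names. That means add_suffix renames the labels to col1_sin and so on, and the returned labels become the new columns. The sketch below performs one such call by hand on row 1 of the example data:

```python
import numpy as np
import pandas as pd

d = {'col1': [0, 15, 30, 45, 60], 'col2': [0, 60, 180, 240, 300]}
df = pd.DataFrame(data=d)

row = df.iloc[1]            # a Series with index ['col1', 'col2'], like apply(axis=1) passes
shifted = row - [10, 20]    # list subtracted element by element: 15-10 and 60-20
parts = pd.concat([np.sin(shifted).add_suffix('_sin'),
                   np.cos(shifted).add_suffix('_cos')])
print(parts)
```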