I need to apply sin and cos transformations to each column, so that every column yields two columns, col_sin and col_cos.
def transform(data, var):
    sin_ = np.sin(data - var)
    cos_ = np.cos(data - var)
    return pd.Series([sin_, cos_], index=['sin', 'cos'])
d = {'col1': [0, 15, 30, 45, 60], 'col2': [0, 60, 180, 240, 300]}
df = pd.DataFrame(data=d)
df = df.apply(transform, axis=0, var=0)
This returns the following (the numbers don't match, since the columns shown differ from the ones actually passed):
+-----+----------------------------+----------------------------+
| | col1 | col2 |
|-----+----------------------------+----------------------------|
| sin | 0 0.000000e+00 | 0 0.000000e+00 |
| | 1 1.000000e+00 | 1 -1.133108e-15 |
| | 2 5.665539e-16 | 2 -7.347881e-16 |
| | 3 -1.000000e+00 | 3 -4.532431e-15 |
| | 4 -1.133108e-15 | 4 -1.224647e-15 |
| | Name: col1, dtype: float64 | Name: col2, dtype: float64 |
| cos | 0 1.000000e+00 | 0 1.0 |
| | 1 2.832769e-16 | 1 1.0 |
| | 2 -1.000000e+00 | 2 1.0 |
| | 3 -1.836970e-16 | 3 1.0 |
| | 4 1.000000e+00 | 4 1.0 |
| | Name: col1, dtype: float64 | Name: col2, dtype: float64 |
+-----+----------------------------+----------------------------+
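The expected output should contain 4 columns: col1_sin, col1_cos, col2_sin and col2_cos.
How can I achieve this?
Also, is there a way to pass var as a list/tuple, where var[0] is used for col1 and var[1] for col2? Something like: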
df = df.apply(transform, axis=0, var=[0, 60])
Is there any way to speed this up with raw=True? Something like this doesn't work:
def transform(data, var):
    sin_ = np.sin(data - var)
    cos_ = np.cos(data - var)
    return np.column_stack((sin_, cos_))
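Thanks!
Answer 0: (score: 2)
There is no need for apply here. You should pass the entire DataFrame. We can use concat and add_suffix to get the right names. With np.broadcast_to, we can handle either a single offset or a correctly shaped list/array: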
import pandas as pd
import numpy as np

def transform(data, var, degrees=True):
    """
    data : pd.DataFrame
    var : numeric, or list/array of numerics. Should be
          broadcastable to data.shape
    """
    data = data - np.broadcast_to(var, data.shape)
    # data = data - var  # also works for compatible shapes
    if degrees:
        data = np.radians(data)
    return pd.concat([np.sin(data).add_suffix('_sin'),
                      np.cos(data).add_suffix('_cos')],
                     axis=1)
transform(df, var=[45, 0], degrees=True)
col1_sin col2_sin col1_cos col2_cos
0 -0.707107 0.000000e+00 0.707107 1.0
1 -0.500000 8.660254e-01 0.866025 0.5
2 -0.258819 1.224647e-16 0.965926 -1.0
3 0.000000 -8.660254e-01 1.000000 -0.5
4 0.258819 -8.660254e-01 0.965926 0.5
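To see what np.broadcast_to does with the two kinds of offset, here is a small sketch. The shape (5, 2) is taken from the example df; the variable names are made up for illustration.

```python
import numpy as np

shape = (5, 2)  # same shape as the example df: 5 rows, 2 columns

# A scalar offset is repeated for every cell.
scalar_offsets = np.broadcast_to(45, shape)

# A list with one entry per column is repeated down the rows.
per_column_offsets = np.broadcast_to([45, 0], shape)

print(scalar_offsets.shape, per_column_offsets.shape)

# A list whose length does not match the number of columns
# cannot be broadcast, so the mistake surfaces immediately.
try:
    np.broadcast_to([45, 0, 10], shape)
    mismatch_raises = False
except ValueError:
    mismatch_raises = True
print(mismatch_raises)
```

The result of np.broadcast_to is a read-only view, which is fine here because it is only used on the right-hand side of the subtraction.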
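Answer 1: (score: 2)
Use DataFrame.pipe to pass the whole DataFrame. If var is a list of the same size (i.e. the number of columns), it is subtracted column by column. Then stack the sin and cos results together and return a DataFrame with the new column names: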
def transform(data, var):
    sin_ = np.sin(data - var)
    cos_ = np.cos(data - var)
    arr = np.column_stack((sin_, cos_))
    c = (data.columns + '_sin').tolist() + (data.columns + '_cos').tolist()
    # use the index of the frame that was passed in, not the global df
    return pd.DataFrame(arr, index=data.index, columns=c)
d = {'col1': [0, 15, 30, 45, 60], 'col2': [0, 60, 180, 240, 300]}
df = pd.DataFrame(data=d)
df = df.pipe(transform, var=[0, 60])
print (df)
col1_sin col2_sin col1_cos col2_cos
0 0.000000 0.304811 1.000000 -0.952413
1 0.650288 0.000000 -0.759688 1.000000
2 -0.988032 0.580611 0.154251 0.814181
3 0.850904 -0.801153 0.525322 -0.598460
4 -0.304811 0.945445 -0.952413 0.325781
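The question asks for the order col1_sin, col1_cos, col2_sin, col2_cos, while the output above groups all the sin columns first. One way to get the interleaved order is sketched below. It works on the underlying NumPy array, which may also help with the speed concern behind raw=True. The name transform_interleaved is hypothetical, and as in the answer above the values are treated as radians.

```python
import numpy as np
import pandas as pd


def transform_interleaved(data, var):
    """Return sin/cos columns ordered col1_sin, col1_cos, col2_sin, ...

    data : pd.DataFrame of numeric columns (values treated as radians)
    var  : scalar, or one offset per column
    """
    shifted = data.to_numpy(dtype=float) - np.asarray(var, dtype=float)
    n_rows, n_cols = shifted.shape
    # Stack to shape (rows, cols, 2) so that sin and cos of the same
    # column sit next to each other, then flatten the last two axes.
    stacked = np.stack((np.sin(shifted), np.cos(shifted)), axis=2)
    values = stacked.reshape(n_rows, 2 * n_cols)
    names = [f"{col}_{fn}" for col in data.columns for fn in ("sin", "cos")]
    return pd.DataFrame(values, index=data.index, columns=names)


df = pd.DataFrame({'col1': [0, 15, 30, 45, 60],
                   'col2': [0, 60, 180, 240, 300]})
out = transform_interleaved(df, var=[0, 60])
print(out.columns.tolist())
# ['col1_sin', 'col1_cos', 'col2_sin', 'col2_cos']
```

If the grouped order is acceptable, reordering the columns of the existing result afterwards would work just as well; the reshape is only there to avoid a second pass.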
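Answer 2: (score: 1)
You can get the result with a simple for loop over the column names, adding the sin/cos columns as you go. I tested it on one million rows and it finished in under a second.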
df = pd.DataFrame(np.random.uniform(low=0, high=3.14,size=(1000000, 2)), columns=['column1','column2'])
var = [0, .5]
for idx, column in enumerate(df.columns):
    df[column + '_sin'] = np.sin(df[column] - var[idx])
    df[column + '_cos'] = np.cos(df[column] - var[idx])
df.head()
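It gives output like the following (the input is random, so your values will differ; the column2 values shown here match an offset of 0 rather than 0.5):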
column1 column2 column1_sin column1_cos column2_sin column2_cos
0 1.977094 0.705613 0.918590 -0.395211 0.648500 0.761214
1 2.138289 2.246560 0.843252 -0.537519 0.780229 -0.625493
2 2.947415 1.716964 0.192960 -0.981207 0.989336 -0.145648
3 1.738969 0.748142 0.985892 -0.167381 0.680278 0.732954
4 1.136741 1.190389 0.907268 0.420554 0.928513 0.371299
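The loop above adds columns to df in place and indexes var by position, so a var that is too short only fails partway through. Below is a sketch of the same loop wrapped in a function that checks the length first and leaves the input untouched. add_sin_cos is a hypothetical name, not a pandas function, and the seed and row count are arbitrary.

```python
import numpy as np
import pandas as pd


def add_sin_cos(frame, offsets):
    """Return a copy of frame with <col>_sin and <col>_cos appended."""
    if len(offsets) != frame.shape[1]:
        raise ValueError("need exactly one offset per column")
    result = frame.copy()
    # Iterate over the original columns only, not the ones being added.
    for column, offset in zip(frame.columns, offsets):
        result[column + '_sin'] = np.sin(frame[column] - offset)
        result[column + '_cos'] = np.cos(frame[column] - offset)
    return result


rng = np.random.default_rng(0)
df = pd.DataFrame(rng.uniform(low=0, high=3.14, size=(1000, 2)),
                  columns=['column1', 'column2'])
out = add_sin_cos(df, [0, .5])
print(out.columns.tolist())
```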
Change to axis=1 and return a pd.Series.
Example code:
d = {'col1': [0, 15, 30, 45, 60], 'col2': [0, 60, 180, 240, 300]}
df = pd.DataFrame(data=d)
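def transform(data, var):
    # data is one row (a Series indexed by column name).
    # Series.append was removed in pandas 2.0, so join with pd.concat.
    return pd.concat([np.sin(data - var).add_suffix('_sin'), np.cos(data - var).add_suffix('_cos')])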
df.apply(transform, axis=1, var=[10,20])
This gives you the output:
col1_sin col2_sin col1_cos col2_cos
0 0.544021 -0.912945 -0.839072 0.408082
1 -0.958924 0.745113 0.283662 -0.666938
2 0.912945 0.219425 0.408082 -0.975629
3 -0.428183 0.088399 -0.903692 0.996085
4 -0.262375 -0.387809 0.964966 -0.921740
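Row-wise apply calls the function once per row, so on large frames it will usually be much slower than operating on the whole DataFrame. Here is a sketch that checks both routes produce the same numbers on the example data. It uses pd.concat rather than Series.append, which is not available in pandas 2.x, and the names rowwise and whole_frame are made up for illustration.

```python
import numpy as np
import pandas as pd


def rowwise(row, var):
    # row is a Series indexed by column name
    shifted = row - var
    return pd.concat([np.sin(shifted).add_suffix('_sin'),
                      np.cos(shifted).add_suffix('_cos')])


def whole_frame(data, var):
    # data is the full DataFrame; var holds one offset per column
    shifted = data - var
    return pd.concat([np.sin(shifted).add_suffix('_sin'),
                      np.cos(shifted).add_suffix('_cos')], axis=1)


df = pd.DataFrame({'col1': [0, 15, 30, 45, 60],
                   'col2': [0, 60, 180, 240, 300]})
by_row = df.apply(rowwise, axis=1, var=[10, 20])
by_frame = whole_frame(df, var=[10, 20])

# Compare column by column so that column order does not matter.
same = np.allclose(by_row, by_frame[by_row.columns])
print(same)
```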