两个相当大的数据帧df1
和df2
的列很多都带有浮点值。 对于df1
和df2
中相同的索引和列名称,分别通过以下操作创建两个新的数据框simple_ret_df
和log_ret_df
:
(1 - df1 / df2)
。可以说这个新数据框为simple_ret_df
ln(df1 / df2)
。可以说这个新数据框为log_ret_df
如果数据帧df1
或df2
中的任何条目丢失\ nan
\ 0
,则计算数据帧中的对应条目应为nan
。可以如下所示生成示例数据帧df1
和df2
import numpy as np
import pandas as pd
df1 = pd.DataFrame(10*(2+np.random.randn(500, 3)), columns=list('ABC'))
df2 = pd.DataFrame(10*(2+np.random.randn(500, 3)), columns=list('CDA'))
df1.drop(df1.index[[1, 4, 284, 354, 498]], inplace=True)
df2.drop(df2.index[[0, 98, 159]], inplace=True)
df1.loc[2, 'B'] = np.nan
df1.loc[5, 'C'] = np.nan
df1.loc[3, 'A'] = np.nan
df2.loc[5, 'C'] = np.nan
df2.loc[1, 'D'] = np.nan
df2.loc[2, 'A'] = np.nan
输入数据帧df1
和df2
的示例如下:
df1.head()
A B C
0 14.0 31.3 35.5
2 24.2 NaN 27.6
3 NaN 13.1 16.0
5 28.2 8.8 NaN
6 17.7 18.0 7.9
df2.head()
C D A
1 15.1 NaN 27.0
2 20.9 29.4 NaN
3 27.8 29.7 22.9
4 19.0 13.5 21.0
5 NaN 21.4 12.0
相应的样本输出数据帧simple_ret_df
和log_ret_df
如下:
simple_ret_df.head(6)
A B C D
0 NaN NaN NaN NaN
1 NaN NaN NaN NaN
2 NaN NaN -0.3206 NaN
3 NaN NaN 0.4245 NaN
4 NaN NaN NaN NaN
5 -0.4750 NaN NaN NaN
log_ret_df.head(6)
A B C D
0 NaN NaN NaN NaN
1 NaN NaN NaN NaN
2 NaN NaN 0.2781 NaN
3 NaN NaN -0.5524 NaN
4 NaN NaN NaN NaN
5 0.3887 NaN NaN NaN
答案 0 :(得分:0)
我有点在评论中回答我的问题。这是为您提供的解决方案。我是在Python3上制作的,而您的标签是python 2,因此您可能需要更改一些代码。
这是从头开始的全部代码。
import numpy as np
import pandas as pd
df1 = pd.DataFrame(10*(2+np.random.randn(500, 3)), columns=list('ABC'))
df2 = pd.DataFrame(10*(2+np.random.randn(500, 3)), columns=list('CDA'))
df1.drop(df1.index[[1, 4, 284, 354, 498]], inplace=True)
df2.drop(df2.index[[0, 98, 159]], inplace=True)
df1.loc[2, 'B'] = np.nan
df1.loc[5, 'C'] = np.nan
df1.loc[3, 'A'] = np.nan
df2.loc[5, 'C'] = np.nan
df2.loc[1, 'D'] = np.nan
df2.loc[2, 'A'] = np.nan
df1.head(10)
A B C
0 20.438695 18.114421 20.445370
2 12.789906 NaN 3.988319
3 NaN 11.026463 9.421921
5 19.919580 7.462012 NaN
6 21.290647 23.952295 -10.354758
7 8.447708 14.710224 25.499204
8 16.603850 24.862611 20.354342
9 6.088232 15.066117 22.906491
10 19.621493 15.877428 15.149765
11 2.052592 9.031476 19.531663
df2.head(10)
C D A
1 15.159127 NaN 29.432163
2 23.449304 8.393440 NaN
3 6.057011 32.881258 21.033391
4 31.162671 4.128745 23.264304
5 NaN 32.796018 11.171984
6 32.019817 14.603303 33.106655
7 17.566806 26.804403 12.421421
8 30.121336 46.520462 41.934098
9 13.498463 30.170049 24.221281
10 19.554489 28.238385 26.284620
# Merge the dataframes
merged_df = df1.merge(df2, right_index = True, left_index = True, how = 'outer')
# Create dataframes for each formula
final_df1 = pd.DataFrame(index = merged_df.index) # for this (1 - df1 / df2)
final_df2 = pd.DataFrame(index = merged_df.index) # for this ln(df1 / df2)
for column in merged_df.columns:
# Get initial letter from the column names
i = column[0]
# filter only the columns that start with the same letter
df_test = merged_df[merged_df.columns[merged_df.columns.str.startswith(i)]]
#If only one column, add that to the dataframes
if df_test.shape[1] == 1:
final_df1 = final_df1.merge(df_test, right_index = True, left_index = True, how = 'outer')
final_df2 = final_df2.merge(df_test, right_index = True, left_index = True, how = 'outer')
#If two columns do the calculations
else:
final_df1.loc[:,i] = np.where((df_test[df_test.columns[0]].isnull())|(df_test[df_test.columns[1]].isnull()),
np.nan, (1- df_test[df_test.columns[0]]) / df_test[df_test.columns[1]])
final_df2.loc[:,i] = np.where((df_test[df_test.columns[0]].isnull())|(df_test[df_test.columns[1]].isnull()),
np.nan, np.log(df_test[df_test.columns[0]] / df_test[df_test.columns[1]]))
#Adjust the names of the columns
final_df1.columns = final_df1.columns.str[0]
final_df2.columns = final_df2.columns.str[0]
print(final_df1.head(10), final_df2.head(10))
为此:
simple_ret_df.head(6)
A B C D
0 NaN 18.114421 NaN NaN
1 NaN NaN NaN NaN
2 NaN NaN -0.127437 8.393440
3 NaN 11.026463 -1.390442 32.881258
4 NaN NaN NaN 4.128745
5 -1.693484 7.462012 NaN 32.796018
log_ret_df.head(6)
A B C D
0 NaN 18.114421 NaN NaN
1 NaN NaN NaN NaN
2 NaN NaN -1.771471 8.393440
3 NaN 11.026463 0.441823 32.881258
4 NaN NaN NaN 4.128745
5 0.578294 7.462012 NaN 32.796018
答案 1 :(得分:0)
simple_ret_df = df1.combine(df2, lambda s1, s2: 1-s1/s2)
log_ret_df = df1.combine(df2, lambda s1, s2: np.log(s1/s2))