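I am trying to multiply two pandas DataFrames that have different numbers of columns, and I want the result to have the shape of the first DataFrame. That is, where a row label and a column label exist in both frames, multiply the two cells; otherwise keep the value from the first DataFrame. An example follows. What is the most efficient vectorized way to do this, without a for loop?
Thanks!
df1:
df2:
Resulting df (df_result = df1 * df2):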
Answer 0 (score: 4)
Option 1
Using pd.DataFrame.align
pd.DataFrame.mul(*df1.align(df2, 'left', fill_value=1))
             X     Y     Z
1/1/2017  0.26  0.94  0.22
1/3/2017   NaN  0.63  0.78
1/5/2017  0.73  0.79  0.25
1/6/2017  0.13   NaN  0.31
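To make the one-liner easier to follow, here is a minimal sketch of what the align step does, using a reduced version of the data (the smaller frames and the names `left`, `right`, and `result` are assumptions for illustration). With `join="left"` both frames take df1's labels, and `fill_value=1` applies only to labels that df2 lacks, so NaNs already present in df2 still propagate.

```python
import numpy as np
import pandas as pd

# Reduced, illustrative data: df2 has an extra row and an extra column, and lacks Z.
df1 = pd.DataFrame(
    {"X": [0.26, 0.45], "Y": [0.94, 0.63], "Z": [0.22, 0.78]},
    index=["1/1/2017", "1/3/2017"],
)
df2 = pd.DataFrame(
    {"X": [1.0, np.nan, np.nan], "Y": [1.0, 1.0, 1.0], "YY": [np.nan, 1.0, np.nan]},
    index=["1/1/2017", "1/2/2017", "1/3/2017"],
)

# join="left" keeps only df1's row and column labels on both sides.
# fill_value=1 fills labels missing from df2 (here column Z);
# NaNs that were already stored in df2 are left untouched.
left, right = df1.align(df2, join="left", fill_value=1)

# Element-wise product of two frames that now share the same labels.
result = left.mul(right)
print(result)
```

The original expression `pd.DataFrame.mul(*df1.align(...))` simply unpacks the aligned pair straight into the multiplication.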
Option 2
Using pd.DataFrame.reindex
df1 * df2.reindex(index=df1.index, columns=df1.columns, fill_value=1)
             X     Y     Z
1/1/2017  0.26  0.94  0.22
1/3/2017   NaN  0.63  0.78
1/5/2017  0.73  0.79  0.25
1/6/2017  0.13   NaN  0.31
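If this operation is needed in several places, the reindex approach can be wrapped in a small helper. The sketch below is one possible way to do that; the function name `mul_keep_shape` and the sample frames `a` and `b` are hypothetical, not part of pandas or of the original answer.

```python
import numpy as np
import pandas as pd


def mul_keep_shape(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Multiply left by right while keeping left's index and columns.

    Labels absent from right act as a factor of 1.
    NaNs that are present in right propagate into the result.
    """
    aligned = right.reindex(index=left.index, columns=left.columns, fill_value=1)
    return left * aligned


# Illustrative data: b lacks column Z and has an extra column XX.
a = pd.DataFrame({"X": [2.0, 3.0], "Z": [5.0, 7.0]}, index=["r1", "r2"])
b = pd.DataFrame({"X": [10.0, np.nan], "XX": [1.0, 1.0]}, index=["r1", "r2"])

out = mul_keep_shape(a, b)
print(out)
```

Passing `index=` and `columns=` by keyword avoids relying on the positional order of `reindex` arguments.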
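Option 3
Using pd.DataFrame.mask
As suggested by commenter @CedricZoppolo:
Warning: this assumes the 1 values in df2 are only meant to mark valid positions, like a mask. It does not multiply the values. If you actually intend to multiply values, do not use this option.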
df1.mask(df2.isnull().reindex_like(df1).fillna(False))
             X     Y     Z
1/1/2017  0.26  0.94  0.22
1/3/2017   NaN  0.63  0.78
1/5/2017  0.73  0.79  0.25
1/6/2017  0.13   NaN  0.31
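A variant of the mask idea, sketched here as an assumption rather than as the answerer's code, builds the boolean condition with `reindex(..., fill_value=False)` so that labels missing from df2 are never masked and the condition keeps a boolean dtype. The reduced frames and the names `cond` and `masked` are illustrative.

```python
import numpy as np
import pandas as pd

# Reduced, illustrative data.
df1 = pd.DataFrame(
    {"X": [0.26, 0.45], "Y": [0.94, 0.63], "Z": [0.22, 0.78]},
    index=["1/1/2017", "1/3/2017"],
)
df2 = pd.DataFrame(
    {"X": [1.0, np.nan, np.nan], "Y": [1.0, 1.0, 1.0]},
    index=["1/1/2017", "1/2/2017", "1/3/2017"],
)

# True where df2 holds NaN; labels df2 does not have are filled with False,
# so df1's values at those positions are kept.
cond = (
    df2.isna()
    .reindex(index=df1.index, columns=df1.columns, fill_value=False)
    .astype(bool)
)

# mask() replaces df1's values with NaN wherever cond is True.
masked = df1.mask(cond)
print(masked)
```

As with the original option, nothing is multiplied here: df2 only decides which cells of df1 survive.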
Setup
import pandas as pd
from numpy import nan as NA
df1 = pd.DataFrame(dict(
X=[0.26, 0.45, 0.73, 0.13],
Y=[0.94, 0.63, 0.79, 0.16],
Z=[0.22, 0.78, 0.25, 0.31]
), ['1/1/2017', '1/3/2017', '1/5/2017', '1/6/2017'])
df2 = pd.DataFrame(dict(
X=[1, NA, NA, NA, 1, 1],
XX=[NA, NA, NA, 1, 1, 1],
Y=[1, 1, 1, 1, 1, NA],
Y1=[NA, NA, NA, 1, NA, NA],
YY=[NA, 1, NA, 1, NA, 1]
), ['1/1/2017', '1/2/2017', '1/3/2017', '1/4/2017', '1/5/2017', '1/6/2017'], dtype=object)
Answer 1 (score: 0)
There is no one-line solution, but the following should work and does not involve any loops.
Answer 2 (score: 0)
Find the shared columns, use them to slice both DataFrames, then multiply:
In [47]: df1
Out[47]:
   X  Y  Z
0  1  2  3
1  4  5  6
2  7  8  9
In [48]: df2
Out[48]:
   X  XX    Y  Y1   YY
0  1   2  NaN   4  NaN
1  4   5  NaN   4    5
2  7   8    9   2    3
In [49]: shared_cols = [col for col in df1.columns if col in df2.columns]
In [50]: shared_cols
Out[50]: ['X', 'Y']
In [51]: df1[shared_cols] * df2[shared_cols]
Out[51]:
    X    Y
0   1  NaN
1  16  NaN
2  49   72
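The product above contains only the shared columns, whereas the question asks for a result with df1's shape. One way to get there, sketched under the assumption that both frames have the same row index (as in this answer's example), is to write the product back into a copy of df1. The name `result` is illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"X": [1, 4, 7], "Y": [2, 5, 8], "Z": [3, 6, 9]})
df2 = pd.DataFrame({"X": [1, 4, 7], "XX": [2, 5, 8], "Y": [np.nan, np.nan, 9]})

# Columns present in both frames, in df1's order.
shared_cols = [col for col in df1.columns if col in df2.columns]

# Work on a float copy so NaN products can be stored without changing df1.
result = df1.astype(float)

# Overwrite only the shared columns; Z keeps df1's values.
result[shared_cols] = df1[shared_cols] * df2[shared_cols]
print(result)
```

If df2 lacked some of df1's rows, those rows would come out as NaN in the shared columns, so this sketch is not a full substitute for the align or reindex options.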
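Answer 3 (score: 0)
You can build a temporary copy of df2 and add a column of 1s for every column of df1 that does not exist in df2. Then multiply the DataFrames, and finally select the output range using the index and columns of df1, as follows: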
import numpy as np
import pandas as pd
df1 = pd.DataFrame({"Date":["1/1/2017","1/3/2017","1/5/2017","1/6/2017"],
"X":[0.26,0.45,0.73,0.13],
"Y":[0.94,0.63,0.79,0.16],
"Z":[0.22,0.78,0.25,0.31]})
df1["Date"] = pd.to_datetime(df1["Date"])
df1 = df1.set_index(["Date"])
df2 = pd.DataFrame({"Date":["1/1/2017","1/2/2017","1/3/2017","1/4/2017","1/5/2017","1/6/2017"],
"X":[1,np.nan, np.nan, np.nan, 1, 1],
"XX":[np.nan, np.nan, np.nan, 1, 1, 1],
"Y":[1, 1, 1, 1, 1, np.nan],
"Y1":[np.nan, np.nan, np.nan, 1, np.nan, np.nan],
"YY":[np.nan, 1, np.nan, 1, np.nan, 1]})
df2["Date"] = pd.to_datetime(df2["Date"])
df2 = df2.set_index(["Date"])
df2_tmp = df2.copy()
for col in df1.columns:
    if col not in df2.columns:
        df2_tmp[col] = 1
df_out = df1*df2_tmp
df_out = df_out.loc[df1.index,df1.columns]
So, if your inputs are:
>>> df1
               X     Y     Z
Date
2017-01-01  0.26  0.94  0.22
2017-01-03  0.45  0.63  0.78
2017-01-05  0.73  0.79  0.25
2017-01-06  0.13  0.16  0.31
>>> df2
              X   XX    Y   Y1   YY
Date
2017-01-01  1.0  NaN  1.0  NaN  NaN
2017-01-02  NaN  NaN  1.0  NaN  1.0
2017-01-03  NaN  NaN  1.0  NaN  NaN
2017-01-04  NaN  1.0  1.0  1.0  1.0
2017-01-05  1.0  1.0  1.0  NaN  NaN
2017-01-06  1.0  1.0  NaN  NaN  1.0
your output will be:
>>> df_out
               X     Y     Z
Date
2017-01-01  0.26  0.94  0.22
2017-01-03   NaN  0.63  0.78
2017-01-05  0.73  0.79  0.25
2017-01-06  0.13   NaN  0.31
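Since the question asks for a solution without a for loop, the column-filling loop in this answer can also be expressed with a single reindex over the union of both column sets. This is a sketch on reduced data, not the answerer's code; `all_cols` is an illustrative name.

```python
import numpy as np
import pandas as pd

# Reduced, illustrative data with a datetime index, as in this answer.
df1 = pd.DataFrame(
    {"X": [0.26, 0.45], "Z": [0.22, 0.78]},
    index=pd.to_datetime(["1/1/2017", "1/3/2017"]),
)
df2 = pd.DataFrame(
    {"X": [1.0, 1.0, np.nan], "XX": [np.nan, 1.0, 1.0]},
    index=pd.to_datetime(["1/1/2017", "1/2/2017", "1/3/2017"]),
)

# Every column of either frame; columns new to df2 (here Z) are filled with 1.
all_cols = df2.columns.union(df1.columns)
df2_tmp = df2.reindex(columns=all_cols, fill_value=1)

# Multiply with automatic label alignment, then cut back to df1's shape.
df_out = (df1 * df2_tmp).loc[df1.index, df1.columns]
print(df_out)
```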