Multiplying pandas DataFrames with different columns

Time: 2017-10-27 18:42:04

Tags: python pandas

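I am trying to multiply two pandas DataFrames that have different numbers of columns, and I want the resulting DataFrame to have the shape of the first one. That is, where a row and column label match in both DataFrames, multiply the two cells; otherwise keep the value from the first DataFrame. See the example below. What is the most efficient vectorized (matrix-form) way to do this without a for loop?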

Thanks!

DF1:

[image of df1 in the original post]

DF2:

[image of df2 in the original post]

Resulting DataFrame (df_result = df1 * df2):

[image of the resulting DataFrame in the original post]

4 answers:

Answer 0 (score: 4)

Option 1
Use pd.DataFrame.align

pd.DataFrame.mul(*df1.align(df2, join='left', fill_value=1))

             X     Y     Z
1/1/2017  0.26  0.94  0.22
1/3/2017   NaN  0.63  0.78
1/5/2017  0.73  0.79  0.25
1/6/2017  0.13   NaN  0.31
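A minimal sketch of what Option 1 does, on small made-up frames (the df1/df2 below are hypothetical, not the question's data). Unpacking the pair returned by align shows that fill_value=1 only fills labels df2 lacks (row "c", column "Z"), while a NaN already present in df2 stays NaN and propagates through the multiplication:

```python
import numpy as np
import pandas as pd

# Hypothetical frames: df2 has an extra row "d" and column "XX",
# lacks row "c" and column "Z", and holds a NaN at (b, X).
df1 = pd.DataFrame({"X": [1.0, 2.0, 3.0], "Z": [4.0, 5.0, 6.0]},
                   index=["a", "b", "c"])
df2 = pd.DataFrame({"X": [10.0, np.nan, 7.0], "XX": [1.0, 1.0, 1.0]},
                   index=["a", "b", "d"])

# join="left": both outputs take df1's index and columns.
# fill_value=1 applies only to labels introduced by the alignment;
# the existing NaN at (b, X) is left untouched.
left, right = df1.align(df2, join="left", fill_value=1)
result = left.mul(right)
print(result)
```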

Option 2
Use pd.DataFrame.reindex

df1 * df2.reindex(index=df1.index, columns=df1.columns, fill_value=1)

             X     Y     Z
1/1/2017  0.26  0.94  0.22
1/3/2017   NaN  0.63  0.78
1/5/2017  0.73  0.79  0.25
1/6/2017  0.13   NaN  0.31
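In Option 2 the reindexed df2 acts as a multiplier frame carrying exactly df1's labels: 1 wherever df2 has no label, NaN wherever df2 had NaN. A small sketch on made-up frames (hypothetical, not the question's data); index and columns are passed as keywords because, as far as I know, pandas 2.x accepts only `labels` positionally in DataFrame.reindex:

```python
import numpy as np
import pandas as pd

# Hypothetical frames, not taken from the question.
df1 = pd.DataFrame({"X": [1.0, 2.0, 3.0], "Z": [4.0, 5.0, 6.0]},
                   index=["a", "b", "c"])
df2 = pd.DataFrame({"X": [10.0, np.nan, 7.0], "XX": [1.0, 1.0, 1.0]},
                   index=["a", "b", "d"])

# Multiplier with df1's labels: extra labels of df2 (row "d", column "XX")
# are dropped, missing ones (row "c", column "Z") are filled with 1.
factor = df2.reindex(index=df1.index, columns=df1.columns, fill_value=1)
result = df1 * factor
print(factor)
print(result)
```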

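Option 3
Use pd.DataFrame.mask
As suggested by commenter @CedricZoppolo:
Caveat: this assumes the values of 1 are meant to mark valid positions, like a mask. It does not multiply the values. If you really intend to multiply the values, do not use this option.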

df1.mask(df2.isnull().reindex(index=df1.index, columns=df1.columns, fill_value=False))

             X     Y     Z
1/1/2017  0.26  0.94  0.22
1/3/2017   NaN  0.63  0.78
1/5/2017  0.73  0.79  0.25
1/6/2017  0.13   NaN  0.31
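The caveat above matters as soon as df2 contains values other than 1. A sketch with hypothetical frames `a` and `b` (not from the question), where `b` holds a 2 in one cell, shows that the product and the mask diverge; the NaN positions behave the same in both:

```python
import numpy as np
import pandas as pd

# Hypothetical frames: b holds a 2 (not a 1) at (r0, X) and NaN at (r1, X).
a = pd.DataFrame({"X": [1.0, 2.0], "Z": [3.0, 4.0]}, index=["r0", "r1"])
b = pd.DataFrame({"X": [2.0, np.nan]}, index=["r0", "r1"])

# Multiplication: conform b to a's labels; the missing column Z becomes 1.
product = a * b.reindex(index=a.index, columns=a.columns, fill_value=1)

# Masking: only b's NaN positions matter; the value 2 is ignored.
cond = b.isnull().reindex(index=a.index, columns=a.columns, fill_value=False)
masked = a.mask(cond)

print(product)
print(masked)
```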

Setup

import pandas as pd
from numpy import nan as NA

df1 = pd.DataFrame(dict(
    X=[0.26, 0.45, 0.73, 0.13],
    Y=[0.94, 0.63, 0.79, 0.16],
    Z=[0.22, 0.78, 0.25, 0.31]
), ['1/1/2017', '1/3/2017', '1/5/2017', '1/6/2017'])

df2 = pd.DataFrame(dict(
    X=[1, NA, NA, NA, 1, 1],
    XX=[NA, NA, NA, 1, 1, 1],
    Y=[1, 1, 1, 1, 1, NA],
    Y1=[NA, NA, NA, 1, NA, NA],
    YY=[NA, 1, NA, 1, NA, 1]
), ['1/1/2017', '1/2/2017', '1/3/2017', '1/4/2017', '1/5/2017', '1/6/2017'], dtype=object)
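With this setup, the three options can be checked against each other. A sketch, assuming that casting to float is acceptable: because df2 is built with `dtype=object`, the product-based results can come back with object columns, so they are cast before comparing. The mask condition passes `fill_value=False` to reindex so it stays boolean:

```python
import pandas as pd
from numpy import nan as NA

# Same frames as in the setup above (df2 deliberately uses dtype=object).
df1 = pd.DataFrame(dict(
    X=[0.26, 0.45, 0.73, 0.13],
    Y=[0.94, 0.63, 0.79, 0.16],
    Z=[0.22, 0.78, 0.25, 0.31]
), ['1/1/2017', '1/3/2017', '1/5/2017', '1/6/2017'])

df2 = pd.DataFrame(dict(
    X=[1, NA, NA, NA, 1, 1],
    XX=[NA, NA, NA, 1, 1, 1],
    Y=[1, 1, 1, 1, 1, NA],
    Y1=[NA, NA, NA, 1, NA, NA],
    YY=[NA, 1, NA, 1, NA, 1]
), ['1/1/2017', '1/2/2017', '1/3/2017', '1/4/2017', '1/5/2017', '1/6/2017'], dtype=object)

opt1 = pd.DataFrame.mul(*df1.align(df2, join='left', fill_value=1))
opt2 = df1 * df2.reindex(index=df1.index, columns=df1.columns, fill_value=1)
opt3 = df1.mask(df2.isnull().reindex(index=df1.index, columns=df1.columns,
                                     fill_value=False))

# Products may inherit object dtype from df2; cast before comparing.
opt1 = opt1.astype(float)
opt2 = opt2.astype(float)
print(opt1)
```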

Answer 1 (score: 0)

It is not a one-liner, but the following should work without any loops.

[the code was posted as an image and is not available]

Answer 2 (score: 0)

Find the shared columns, use them to slice both DataFrames, and then multiply:

In [47]: df1
Out[47]:
   X  Y  Z
0  1  2  3
1  4  5  6
2  7  8  9

In [48]: df2
Out[48]:
   X  XX   Y  Y1  YY
0  1   2 NaN   4 NaN
1  4   5 NaN   4   5
2  7   8   9   2   3

In [49]: shared_cols = [col for col in df1.columns if col in df2.columns]

In [50]: shared_cols
Out[50]: ['X', 'Y']

In [51]: df1[shared_cols] * df2[shared_cols]
Out[51]:
    X   Y
0   1 NaN
1  16 NaN
2  49  72
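The slicing above returns only the shared columns, while the question asks for a result with df1's shape. A sketch of one way to extend this answer (an addition, not part of the original answer): copy df1 and overwrite only the shared columns with the product, treating any df1 row that df2 lacks as a factor of 1:

```python
import numpy as np
import pandas as pd

# The small integer frames from this answer.
df1 = pd.DataFrame({"X": [1, 4, 7], "Y": [2, 5, 8], "Z": [3, 6, 9]})
df2 = pd.DataFrame({"X": [1, 4, 7], "XX": [2, 5, 8],
                    "Y": [np.nan, np.nan, 9],
                    "Y1": [4, 4, 2], "YY": [np.nan, 5, 3]})

shared_cols = [col for col in df1.columns if col in df2.columns]

# Start from a copy of df1 so non-shared columns (Z) survive, then
# replace only the shared columns with the element-wise product.
out = df1.copy()
out[shared_cols] = df1[shared_cols] * df2[shared_cols].reindex(index=df1.index,
                                                               fill_value=1)
print(out)
```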

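Answer 3 (score: 0)

You can create a temporary copy of df2 that has a column of 1s for every df1 column that df2 lacks. Then multiply the DataFrames, and finally select the output using df1's index and columns, as follows: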

import numpy as np
import pandas as pd

df1 = pd.DataFrame({"Date": ["1/1/2017", "1/3/2017", "1/5/2017", "1/6/2017"],
                    "X": [0.26, 0.45, 0.73, 0.13],
                    "Y": [0.94, 0.63, 0.79, 0.16],
                    "Z": [0.22, 0.78, 0.25, 0.31]})
df1["Date"] = pd.to_datetime(df1["Date"])
df1 = df1.set_index(["Date"])

df2 = pd.DataFrame({"Date": ["1/1/2017", "1/2/2017", "1/3/2017", "1/4/2017", "1/5/2017", "1/6/2017"],
                    "X": [1, np.nan, np.nan, np.nan, 1, 1],
                    "XX": [np.nan, np.nan, np.nan, 1, 1, 1],
                    "Y": [1, 1, 1, 1, 1, np.nan],
                    "Y1": [np.nan, np.nan, np.nan, 1, np.nan, np.nan],
                    "YY": [np.nan, 1, np.nan, 1, np.nan, 1]})
df2["Date"] = pd.to_datetime(df2["Date"])
df2 = df2.set_index(["Date"])

# Temporary copy of df2 with a column of 1s for every df1 column it lacks
df2_tmp = df2.copy()
for col in df1.columns:
    if col not in df2.columns:
        df2_tmp[col] = 1
# Multiply (pandas aligns on index and columns), then keep df1's rows and columns
df_out = df1 * df2_tmp
df_out = df_out.loc[df1.index, df1.columns]

So, if your input is:

>>> df1
               X     Y     Z
Date                        
2017-01-01  0.26  0.94  0.22
2017-01-03  0.45  0.63  0.78
2017-01-05  0.73  0.79  0.25
2017-01-06  0.13  0.16  0.31
>>> df2
              X   XX    Y   Y1   YY
Date                               
2017-01-01  1.0  NaN  1.0  NaN  NaN
2017-01-02  NaN  NaN  1.0  NaN  1.0
2017-01-03  NaN  NaN  1.0  NaN  NaN
2017-01-04  NaN  1.0  1.0  1.0  1.0
2017-01-05  1.0  1.0  1.0  NaN  NaN
2017-01-06  1.0  1.0  NaN  NaN  1.0

Your output will be:

>>> df_out
               X     Y     Z
Date                        
2017-01-01  0.26  0.94  0.22
2017-01-03   NaN  0.63  0.78
2017-01-05  0.73  0.79  0.25
2017-01-06  0.13   NaN  0.31
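The per-column loop in this answer can also be replaced by a single reindex call that adds all missing columns at once. A sketch on made-up frames (hypothetical, not the question's data); like the original loop version, it leaves any df1 row that is absent from df2 as NaN after the multiplication, so the example keeps every df1 date in df2:

```python
import numpy as np
import pandas as pd

# Hypothetical frames with a DatetimeIndex, mirroring this answer's layout.
dates = pd.to_datetime(["2017-01-01", "2017-01-03", "2017-01-05"])
df1 = pd.DataFrame({"X": [0.5, 2.0, 4.0], "Z": [1.5, 2.5, 3.5]}, index=dates)
df2 = pd.DataFrame({"X": [2.0, 1.0, np.nan, 1.0], "XX": [1.0, 1.0, 1.0, 1.0]},
                   index=pd.to_datetime(["2017-01-01", "2017-01-02",
                                         "2017-01-03", "2017-01-05"]))

# Columns of df1 that df2 lacks (here just "Z"), added with value 1
# in one reindex call instead of a loop.
missing = df1.columns.difference(df2.columns)
df2_tmp = df2.reindex(columns=df2.columns.append(missing), fill_value=1)

# Multiply with automatic alignment, then keep df1's rows and columns.
df_out = (df1 * df2_tmp).loc[df1.index, df1.columns]
print(df_out)
```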