Question

I have two dataframes like this:

import pandas as pd
import numpy as np

np.random.seed(0)

df1 = pd.DataFrame(np.random.randint(10, size=(5, 4)), index=list('ABCDE'), columns=list('abcd'))
df2 = pd.DataFrame(np.random.randint(10, size=(2, 4)), index=list('CE'), columns=list('abcd'))

   a  b  c  d
A  5  0  3  3
B  7  9  3  5
C  2  4  7  6
D  8  8  1  6
E  7  7  8  1

   a  b  c  d
C  5  9  8  9
E  4  3  0  3

The index of df2 is always a subset of the index of df1, and the column names are identical.

I want to create a third dataframe, df3 = df1 - df2. Doing that directly gives

     a    b    c    d
A  NaN  NaN  NaN  NaN
B  NaN  NaN  NaN  NaN
C -3.0 -5.0 -1.0 -3.0
D  NaN  NaN  NaN  NaN
E  3.0  4.0  8.0 -2.0

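I don't want the NaNs in the output, but the respective values of df1. Is there a clever way to use, e.g., fillna with the values of df1 in the rows not contained in df2?

A workaround is to subtract only the required rows, like:

sub_ind = df2.index
df3 = df1.copy()
df3.loc[sub_ind, :] = df1.loc[sub_ind, :] - df2.loc[sub_ind, :]

which gives me the desired output

   a  b  c  d
A  5  0  3  3
B  7  9  3  5
C -3 -5 -1 -3
D  8  8  1  6
E  3  4  8 -2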

but maybe there is a more straightforward way of achieving this?

Answer 1

I think this is what you want:

(df1-df2).fillna(df1)

Out[40]: 
     a    b    c    d
A  5.0  0.0  3.0  3.0
B  7.0  9.0  3.0  5.0
C -3.0 -5.0 -1.0 -3.0
D  8.0  8.0  1.0  6.0
E  3.0  4.0  8.0 -2.0

Just subtract the dataframes as you normally would, but "package" the result in parentheses and call the pandas.DataFrame.fillna method on it. Or, a bit more verbosely:

diff = df1-df2
diff.fillna(df1, inplace=True)
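
One side effect worth noting: the intermediate NaNs turn every column into floats, which is why the output above shows 5.0 rather than 5. A minimal sketch of casting back, assuming the inputs are integers and that no NaN survives the fillna (the name diff_int is only illustrative, and the frames are rebuilt from the values printed in the question):

```python
import pandas as pd

# Same values as df1 and df2 in the question, written out explicitly.
df1 = pd.DataFrame(
    [[5, 0, 3, 3], [7, 9, 3, 5], [2, 4, 7, 6], [8, 8, 1, 6], [7, 7, 8, 1]],
    index=list("ABCDE"), columns=list("abcd"),
)
df2 = pd.DataFrame(
    [[5, 9, 8, 9], [4, 3, 0, 3]],
    index=list("CE"), columns=list("abcd"),
)

# Subtraction aligns on the index; rows absent from df2 become NaN,
# and fillna(df1) puts the original df1 values back into those rows.
diff = (df1 - df2).fillna(df1)

# Assumption: all values are whole numbers and no NaN is left,
# so converting back to an integer dtype loses nothing.
diff_int = diff.astype(int)
print(diff_int)
```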

Answer 2

If you use the sub method instead of -, you can pass a fill value:

df1.sub(df2, fill_value=0)
Out: 
     a    b    c    d
A  5.0  0.0  3.0  3.0
B  7.0  9.0  3.0  5.0
C -3.0 -5.0 -1.0 -3.0
D  8.0  8.0  1.0  6.0
E  3.0  4.0  8.0 -2.0
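
It may help to see what fill_value does when the data already contains NaN. As I understand the pandas behavior, the fill is applied to a cell that is missing on one side only; a cell missing on both sides stays missing. A small sketch with made-up frames (left and right are hypothetical names, not from the question):

```python
import numpy as np
import pandas as pd

left = pd.DataFrame({"a": [1.0, np.nan, 3.0]}, index=list("ABC"))
right = pd.DataFrame({"a": [10.0, np.nan]}, index=list("AB"))

result = left.sub(right, fill_value=0)
# A: present on both sides, so 1 - 10
# B: missing on both sides, so the result stays NaN
# C: absent from right, treated as 0, so 3 - 0
print(result)
```

So for the question's data, where df1 has no NaN of its own, fill_value=0 simply leaves the unmatched rows of df1 unchanged.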

Answer 3

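Here is an option using reindex and its fill_value parameter. The main differences between this answer and @ayhan's answer are:

You only get to control the fill value on one of the dataframes (this can be worked around by using reindex on both df1 and df2).

We keep better control of the int dtype.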

df1 - df2.reindex(df1.index, fill_value=0)

   a  b  c  d
A  5  0  3  3
B  7  9  3  5
C -3 -5 -1 -3
D  8  8  1  6
E  3  4  8 -2
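
To make the dtype difference concrete, here is a rough comparison of the two approaches, plus the variant that reindexes both frames. The frames are rebuilt from the values printed in the question, and the names via_sub, via_reindex, idx and both are only illustrative:

```python
import pandas as pd

df1 = pd.DataFrame(
    [[5, 0, 3, 3], [7, 9, 3, 5], [2, 4, 7, 6], [8, 8, 1, 6], [7, 7, 8, 1]],
    index=list("ABCDE"), columns=list("abcd"),
)
df2 = pd.DataFrame(
    [[5, 9, 8, 9], [4, 3, 0, 3]],
    index=list("CE"), columns=list("abcd"),
)

# sub aligns first, which introduces NaN and therefore a float result.
via_sub = df1.sub(df2, fill_value=0)

# Reindexing df2 up front with an integer fill never creates NaN,
# so the subtraction stays in integer arithmetic.
via_reindex = df1 - df2.reindex(df1.index, fill_value=0)

# Reindexing both frames to the union of their indexes gives control
# over the fill value on each side (here 0 for both).
idx = df1.index.union(df2.index)
both = df1.reindex(idx, fill_value=0) - df2.reindex(idx, fill_value=0)

print(via_sub.dtypes)
print(via_reindex.dtypes)
```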

Subtracting dataframes with unequal numbers of rows
