Question

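I have a couple of Excel files. Both files share two common columns: Customer_Name and Customer_No. The first Excel file has about 800k rows, while the second has only 460. I want a DataFrame containing the data common to both files, i.e. the rows from the first file whose Customer_Name and Customer_No are both found in the second file. I tried using .isin, but so far I have only found examples that use a single variable (column). Thanks in advance!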

Answer 1

Use merge:

df = pd.merge(df1, df2, on=['Customer_Name','Customer_No'])

If the column names differ between the two files, use left_on and right_on:

df = pd.merge(df1, 
              df2, 
              left_on=['Customer_Name','Customer_No'], 
              right_on=['Customer_head','Customer_Id'])
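
To make the behaviour of the merge concrete, here is a minimal sketch on made-up data. The sample rows and the extra Amount column are assumptions for illustration, not taken from the question. It should show that a row is kept only when both key columns match, so a customer with the right name but a different number is dropped.

```python
import pandas as pd

# Hypothetical sample data standing in for the two Excel files.
df1 = pd.DataFrame({
    "Customer_Name": ["Ann", "Bob", "Bob", "Cid"],
    "Customer_No": [1, 2, 3, 4],
    "Amount": [10, 20, 30, 40],  # assumed extra column in the big file
})
df2 = pd.DataFrame({
    "Customer_Name": ["Bob", "Cid", "Dee"],
    "Customer_No": [2, 4, 5],
})

# merge defaults to an inner join: only rows whose (name, number)
# pair appears in both frames survive.
common = pd.merge(df1, df2, on=["Customer_Name", "Customer_No"])
print(common)
```

With real files, df1 and df2 would presumably come from pd.read_excel instead of being built by hand.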

Answer 2

IIUC, and you don't need the extra columns from the second file (it will only be used for joining), you could do something like this:

# keep only the key columns of df2 so it acts purely as a filter
df = df1.merge(df2[['Customer_Name','Customer_No']],
               on=['Customer_Name','Customer_No'])
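
As a rough sketch of using the second file only as a filter, the example below works on made-up data. The Region column and the duplicated key pair are assumptions for illustration. The drop_duplicates call is an addition not mentioned in the answer: without it, a key pair that appears twice in the second file would yield two copies of the matching row from the first file.

```python
import pandas as pd

# Hypothetical sample data standing in for the two Excel files.
df1 = pd.DataFrame({
    "Customer_Name": ["Ann", "Bob", "Cid"],
    "Customer_No": [1, 2, 4],
    "Amount": [10, 20, 40],
})
df2 = pd.DataFrame({
    "Customer_Name": ["Bob", "Bob", "Cid"],  # (Bob, 2) appears twice
    "Customer_No": [2, 2, 4],
    "Region": ["N", "S", "E"],  # assumed extra column we do not want
})

keys = ["Customer_Name", "Customer_No"]

# Keep just the key columns and one row per key pair, then join on them.
lookup = df2[keys].drop_duplicates()
filtered = df1.merge(lookup, on=keys)
print(filtered)
```

The result should have exactly the columns of df1, since the lookup frame contributes nothing beyond the join keys.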

Answer 3

I think the straightforward way is something like this:

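import pandas as pd
# file1 and file2 are the paths to the two files; index both on Customer_No
df_file1 = pd.read_csv(file1, index_col='Customer_No')
df_file2 = pd.read_csv(file2, index_col='Customer_No')
count = 0
for index, row in df_file1.iterrows():
    if row['Customer_Name'] in df_file2['Customer_Name'].values:
        count += 1  # or collect (index, row) here to build a result df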

Here you can simply keep an integer count, or do something more involved, such as appending [index, row] to a result df if needed.
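
For comparison, the same kind of membership check can probably be done without an explicit loop, which tends to matter at 800k rows. The sketch below is not the answerer's code: it builds a MultiIndex from the two key columns on each side and uses MultiIndex.isin, so both columns are compared as a pair. The sample data is made up for illustration.

```python
import pandas as pd

# Hypothetical sample data standing in for the two files.
df1 = pd.DataFrame({
    "Customer_Name": ["Ann", "Bob", "Bob", "Cid"],
    "Customer_No": [1, 2, 3, 4],
})
df2 = pd.DataFrame({
    "Customer_Name": ["Bob", "Cid", "Dee"],
    "Customer_No": [2, 4, 5],
})

keys = ["Customer_Name", "Customer_No"]

# Treat each (name, number) pair as a single value and test membership.
pairs1 = pd.MultiIndex.from_frame(df1[keys])
pairs2 = pd.MultiIndex.from_frame(df2[keys])
mask = pairs1.isin(pairs2)  # boolean array, one entry per row of df1

result = df1[mask]          # the matching rows
count = int(mask.sum())     # the simple integer count
print(result)
```

This is the multi-column analogue of the single-column .isin examples mentioned in the question.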

Compare two Excel files in Pandas and return two columns

3 answers: