Question

I'm trying to join two DataFrames in Spark on multiple fields. I tried this:

df1.
   join(df2, df1$col1 == df2$col2 && df1$col3 == df2$col4)

But this doesn't work (it produces a series of errors, which I can list if needed).

Is there a better way to write this? I need to do this in Spark itself, not in PySpark or similar.

Answer 1

In PySpark I had to wrap each condition in parentheses because of an operator-precedence problem.

Maybe you have the same problem:

df1.
   join(df2, (df1$col1 == df2$col2) && (df1$col3 == df2$col4))
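In Python (and therefore PySpark), the precedence issue can be checked with the standard `ast` module. `&` binds more tightly than `==`, so without parentheses `a == b & c == d` is parsed as the single chained comparison `a == (b & c) == d`. For PySpark Columns, evaluating a chained comparison requires converting an intermediate Column to a Python bool, which raises an error. The names `a`, `b`, `c`, `d` below are placeholders; only the parse is inspected, nothing is evaluated.

```python
import ast

# Without parentheses: `&` binds tighter than `==`, so Python sees one
# chained comparison, a == (b & c) == d, instead of two separate tests.
unparenthesized = ast.parse("a == b & c == d", mode="eval").body

# With parentheses: two independent comparisons combined by `&`.
parenthesized = ast.parse("(a == b) & (c == d)", mode="eval").body

print(type(unparenthesized).__name__)  # Compare
print(type(parenthesized).__name__)    # BinOp
```

The top-level node of the unparenthesized version is a `Compare` with two `==` operators, while the parenthesized version is a `BinOp` whose operator is `&`, which is the shape a PySpark join condition needs.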

Answer 2

If your DataFrames are df1 and df2, you need to do:

df1.join(df2, (df1("col1") === df2("col2")) && (df1("col3") === df2("col4")))

Hope this helps!

Answer 4

This approach also worked for me in PySpark. Hope it helps!

df1.join(df2, (df1["col1"] == df2["col2"]) &
              (df1["col3"] == df2["col4"]))
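When there are more than two column pairs, the same parenthesized condition can be built in a loop instead of by hand. The sketch below uses a hypothetical helper, `join_condition` (not part of PySpark). It assumes only that `left[name]` and `right[name]` return objects supporting `==` and `&`, which PySpark DataFrames and Columns do. Because each comparison is a separate expression, the precedence problem does not arise.

```python
import operator
from functools import reduce


def join_condition(left, right, pairs):
    """Hypothetical helper: build (left[a] == right[b]) & ... for each (a, b) pair.

    With PySpark DataFrames, left["col"] returns a Column, `==` yields a Column,
    and `&` (operator.and_) combines Columns into a single join condition.
    """
    if not pairs:
        raise ValueError("at least one column pair is required")
    # One comparison per pair, each evaluated on its own.
    comparisons = [left[a] == right[b] for a, b in pairs]
    # Fold the list with `&`: ((c1 & c2) & c3) & ...
    return reduce(operator.and_, comparisons)
```

With PySpark this would be used as `df1.join(df2, join_condition(df1, df2, [("col1", "col2"), ("col3", "col4")]))`. Since the helper is duck-typed, it can also be tried on plain dicts, where it simply returns a bool.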

Joining two Spark DataFrames on multiple fields
