Get a DataFrame based on joins against existing fields from other DataFrames

Time: 2017-09-06 11:03:57

Tags: postgresql apache-spark apache-spark-sql

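Hi everyone, I have DataFrames with the following structure. What I want to do is get the rows of the table where SITE exists in pos, FRN is an id in the supplier DataFrame, and CODA exists in the article table.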

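Here is what I did, and I still get a foreign key problem. The final DataFrame will be inserted into a PostgreSQL table that has three foreign keys: SITE, FRN and CODA.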

    histoachat.printSchema()
    pos.printSchema()
    supplier.printSchema()
    article.printSchema()
    pos.printSchema()
    root
     |-- SITE: decimal(5,0) (nullable = false)
     |-- FRN: string (nullable = true)
     |-- CODA: string (nullable = true)
     |-- QT: decimal(38,0) (nullable = true)
     |-- PB: decimal(38,0) (nullable = true)
     |-- REMI: decimal(38,0) (nullable = true)
     |-- PNETTVA: decimal(38,0) (nullable = true)
     |-- SCH: decimal(38,0) (nullable = true)

    root
     |-- id: long (nullable = false)

    root
     |-- id: long (nullable = false)

    root
     |-- id: long (nullable = false)

    root
     |-- id: long (nullable = false)

Any help would be appreciated. This is the problem I am getting.

Thanks a lot.

1 Answer:

Answer 0: (score: 0)

I have an expensive solution, which is to use joins:

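// The $"..." column syntax needs the implicits of your SparkSession in scope,
// e.g. import spark.implicits._ (assuming the session is named spark).
// Each inner join keeps only the rows whose key exists in the reference DataFrame;
// the reference id column is dropped afterwards so the histoachat schema is preserved.
val finalHistoachat = histoachat.as("histo")
  .join(pos.as("pos"), $"histo.SITE" === $"pos.id")
  .drop($"pos.id")
  .as("joined1")
  .join(supplier.as("sup"), $"joined1.FRN" === $"sup.id")
  .drop($"sup.id")
  .as("joined2")
  .join(article.as("art"), $"joined2.CODA" === $"art.id")
  .drop($"art.id")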

I hope this helps until someone posts a simpler answer; it should get you the final DataFrame you want.
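If the only goal is to keep the rows whose keys exist in the reference DataFrames, a left semi join may be a lighter variant of the joins above, since it returns only the left-hand columns and needs no drop afterwards. The sketch below is an illustration under assumptions, not a tested fix for your job: the object name `ForeignKeyFilter` and the helpers `keepRowsMatching`, `rowsMissingFrom` and `keepValidRows` are made up for this example, and it assumes every reference DataFrame exposes its key as `id`, as in the printed schemas. The left anti join variant can help to see which rows would trigger the foreign key error.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Hypothetical helper object, not part of Spark or of the original question.
object ForeignKeyFilter {

  // Keep the rows of `left` whose `key` value exists as `id` in `ref`.
  // A left semi join returns only the columns of `left`.
  // Rows with a null key are dropped as well, because null never matches.
  def keepRowsMatching(left: DataFrame, key: String, ref: DataFrame): DataFrame = {
    val ids = ref.select(col("id").as("ref_id"))
    left.join(ids, left(key) === ids("ref_id"), "left_semi")
  }

  // Return the rows of `left` whose `key` value has no match in `ref`
  // (rows with a null key are included). Useful for diagnosing which
  // rows would violate a foreign key.
  def rowsMissingFrom(left: DataFrame, key: String, ref: DataFrame): DataFrame = {
    val ids = ref.select(col("id").as("ref_id"))
    left.join(ids, left(key) === ids("ref_id"), "left_anti")
  }

  // Apply the three checks from the question: SITE in pos, FRN in supplier, CODA in article.
  def keepValidRows(histoachat: DataFrame, pos: DataFrame,
                    supplier: DataFrame, article: DataFrame): DataFrame = {
    val bySite = keepRowsMatching(histoachat, "SITE", pos)
    val bySupplier = keepRowsMatching(bySite, "FRN", supplier)
    keepRowsMatching(bySupplier, "CODA", article)
  }
}
```

With your DataFrames this would be called as `ForeignKeyFilter.keepValidRows(histoachat, pos, supplier, article)`, and `ForeignKeyFilter.rowsMissingFrom(histoachat, "FRN", supplier).show()` would list the rows with an unknown supplier. Note that FRN and CODA are strings while the ids are longs, so the comparison relies on Spark's implicit casting; an explicit cast may be safer.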