My datasets are as follows.
Dataset 1:
+----------+-------------------+---------+-----+------+
|      Time|            address|     Date|value|sample|
+----------+-------------------+---------+-----+------+
|8:00:00 AM| AAbbbbbbbbbbbbbbbb|12/9/2014|    1|     0|
|8:31:27 AM| AAbbbbbbbbbbbbbbbb|12/9/2014|    1|     0|
+----------+-------------------+---------+-----+------+
Dataset 2:
+-----------+-------------------+---------+------+-----+
|       Time|           Location|     Date|sample|value|
+-----------+-------------------+---------+------+-----+
| 8:45:00 AM| AAbbbbbbbbbbbbbbbb|12/9/2016|     5|    0|
| 9:15:00 AM| AAbbbbbbbbbbbbbbbb|12/9/2016|     5|    0|
+-----------+-------------------+---------+------+-----+
I am combining ds1 and ds2 with the following unionAll() call:
Dataset<Row> joined = dataset1.unionAll(dataset2).distinct();
Is there a better way to combine ds1 and ds2, since unionAll() is deprecated in Spark 2.x?
Answer 0 (score: 1)
You can use union() to merge the two dataframes/datasets:
df1.union(df2)

In Spark 2.x, union() is the direct replacement for the deprecated unionAll() and, like it, keeps duplicate rows. So keep the .distinct() call from your code if you want the duplicates removed.
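A minimal Java sketch of the fix, reusing the dataset1/dataset2 names from your question (loading those datasets and creating the SparkSession are assumed):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// union() is the Spark 2.x replacement for the deprecated unionAll();
// like unionAll(), it matches columns by POSITION, not by name.
Dataset<Row> combined = dataset1.union(dataset2).distinct();

// Your two schemas list sample and value in different orders, so a
// positional union would mix those two columns. On Spark 2.3+ you can
// use unionByName(), which matches columns by name instead:
// Dataset<Row> byName = dataset1.unionByName(dataset2).distinct();

Note that unionByName() requires both datasets to have the same set of column names, so the address/Location mismatch would still need a rename first (for example with withColumnRenamed).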
Hope this helps!