I have the following DataFrame in PySpark:
ID Name add date_from date_end
1 aaa yyyyyy 20-01-2018 30-01-2018
2 bbb ffffff 02-11-2018 15-11-2018
However, I would like to get the following output, with one row per day between the start and end dates:
ID Name add date_from date_end
1 aaa yyyyyy 20-01-2018 30-01-2018
1 aaa yyyyyy 21-01-2018 30-01-2018
1 aaa yyyyyy 22-01-2018 30-01-2018
1 aaa yyyyyy 23-01-2018 30-01-2018
1 aaa yyyyyy 24-01-2018 30-01-2018
1 aaa yyyyyy 25-01-2018 30-01-2018
1 aaa yyyyyy 26-01-2018 30-01-2018
1 aaa yyyyyy 27-01-2018 30-01-2018
1 aaa yyyyyy 28-01-2018 30-01-2018
1 aaa yyyyyy 29-01-2018 30-01-2018
1 aaa yyyyyy 30-01-2018 30-01-2018
2 bbb ffffff 02-11-2018 15-11-2018
2 bbb ffffff 03-11-2018 15-11-2018
2 bbb ffffff 04-11-2018 15-11-2018
2 bbb ffffff 05-11-2018 15-11-2018
2 bbb ffffff 06-11-2018 15-11-2018
2 bbb ffffff 07-11-2018 15-11-2018
2 bbb ffffff 08-11-2018 15-11-2018
2 bbb ffffff 09-11-2018 15-11-2018
2 bbb ffffff 10-11-2018 15-11-2018
2 bbb ffffff 11-11-2018 15-11-2018
2 bbb ffffff 12-11-2018 15-11-2018
2 bbb ffffff 13-11-2018 15-11-2018
2 bbb ffffff 14-11-2018 15-11-2018
2 bbb ffffff 15-11-2018 15-11-2018
Answer 0 (score: 0)
Try this:
a = [(1,'aaa','yyyyyy','20-01-2018','30-01-2018'),
(2,'bbb','ffffff','02-11-2018','15-11-2018')]
df = spark.createDataFrame(a,["ID","Name","add","date_from","date_end"])
df.show()
+---+----+------+----------+----------+
| ID|Name| add| date_from| date_end|
+---+----+------+----------+----------+
| 1| aaa|yyyyyy|20-01-2018|30-01-2018|
| 2| bbb|ffffff|02-11-2018|15-11-2018|
+---+----+------+----------+----------+
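# registerTempTable is deprecated since Spark 2.0; use createOrReplaceTempView instead
df.createOrReplaceTempView("temp")
# space(n) yields n spaces, split on ' ' yields n + 1 elements, and posexplode
# emits one row per element with its position i = 0..n, used as the day offset
result = spark.sql("""
select t.ID,
       t.Name,
       t.add,
       date_format(date_add(to_date(t.date_from,'dd-MM-yyyy'),pe.i),'dd-MM-yyyy') as date_from,
       t.date_end
from temp t
lateral view posexplode(split(space(datediff(to_date(t.date_end,'dd-MM-yyyy'),to_date(t.date_from,'dd-MM-yyyy'))),' ')) pe as i,x
""")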
result.show()
+---+----+------+----------+----------+
| ID|Name| add| date_from| date_end|
+---+----+------+----------+----------+
| 1| aaa|yyyyyy|20-01-2018|30-01-2018|
| 1| aaa|yyyyyy|21-01-2018|30-01-2018|
| 1| aaa|yyyyyy|22-01-2018|30-01-2018|
| 1| aaa|yyyyyy|23-01-2018|30-01-2018|
| 1| aaa|yyyyyy|24-01-2018|30-01-2018|
| 1| aaa|yyyyyy|25-01-2018|30-01-2018|
| 1| aaa|yyyyyy|26-01-2018|30-01-2018|
| 1| aaa|yyyyyy|27-01-2018|30-01-2018|
| 1| aaa|yyyyyy|28-01-2018|30-01-2018|
| 1| aaa|yyyyyy|29-01-2018|30-01-2018|
| 1| aaa|yyyyyy|30-01-2018|30-01-2018|
| 2| bbb|ffffff|02-11-2018|15-11-2018|
| 2| bbb|ffffff|03-11-2018|15-11-2018|
| 2| bbb|ffffff|04-11-2018|15-11-2018|
| 2| bbb|ffffff|05-11-2018|15-11-2018|
| 2| bbb|ffffff|06-11-2018|15-11-2018|
| 2| bbb|ffffff|07-11-2018|15-11-2018|
| 2| bbb|ffffff|08-11-2018|15-11-2018|
| 2| bbb|ffffff|09-11-2018|15-11-2018|
| 2| bbb|ffffff|10-11-2018|15-11-2018|
+---+----+------+----------+----------+
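Note that show() prints only the first 20 rows by default, so the output above is cut off; the full result has 25 rows (call result.show(25) to see them all).
Hope this helps.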
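To make the space/split/posexplode trick easier to follow, here is a minimal plain-Python sketch of what the query does for each row. It does not use Spark at all; the helper name `expand_row` and the `FMT` constant are hypothetical, introduced only for this illustration, and the sketch assumes every date is valid and in dd-MM-yyyy form.

```python
from datetime import datetime, timedelta

FMT = "%d-%m-%Y"  # Python equivalent of the 'dd-MM-yyyy' pattern in the query


def expand_row(row):
    """Return one copy of `row` per day from date_from to date_end, inclusive."""
    id_, name, add, date_from, date_end = row
    start = datetime.strptime(date_from, FMT).date()
    end = datetime.strptime(date_end, FMT).date()

    # Counterpart of datediff(date_end, date_from)
    n_days = (end - start).days
    # Counterpart of split(space(n), ' '): n spaces split into n + 1 empty strings
    pieces = (" " * n_days).split(" ")
    # Counterpart of posexplode: only the position i is used, as a day offset
    return [
        (id_, name, add, (start + timedelta(days=i)).strftime(FMT), date_end)
        for i, _ in enumerate(pieces)
    ]


rows = [
    (1, "aaa", "yyyyyy", "20-01-2018", "30-01-2018"),
    (2, "bbb", "ffffff", "02-11-2018", "15-11-2018"),
]
expanded = [out for row in rows for out in expand_row(row)]
for out in expanded:
    print(out)
```

The list of empty strings is only a device for getting n + 1 positions; its contents are never read, which is why the query aliases the value column as x and ignores it.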