I have the following dataset:
+---------+-------------+-----------+-----------+------+
|component|sub_component|submit_date|closed_date| state|
+---------+-------------+-----------+-----------+------+
|        A|           A1| 2019-01-13|       null|  open|
|        A|           A2| 2019-01-01| 2019-03-10|closed|
|        A|           A3| 2019-03-13| 2019-04-01|closed|
|        A|           A4| 2019-03-01| 2019-03-31|closed|
|        A|           A4| 2018-08-01| 2019-09-31|closed|
|        A|           A4| 2018-12-01| 2019-01-31|closed|
|        B|           B1| 2019-03-13|       null|  open|
|        B|           B2| 2019-03-01|       null|  open|
|        B|           B3| 2019-03-13| 2019-06-01|closed|
|        B|           B4| 2019-03-01| 2019-05-31|closed|
|        B|           B2| 2018-04-01| 2018-04-31|closed|
|        C|           c1| 2019-03-13|       null|  open|
|        C|           C2| 2019-01-15| 2019-01-26|closed|
|        C|           C3| 2019-01-26| 2019-02-01|closed|
|        C|           C4| 2019-03-01| 2019-03-31|closed|
|        C|           C5| 2019-01-01| 2019-03-31|closed|
|        C|           C8| 2017-01-01| 2017-03-31|closed|
|        D|           D1| 2019-06-13|       null|  open|
|        D|           D2| 2019-03-01|       null|  open|
|        D|           D3| 2019-03-13| 2019-06-01|closed|
+---------+-------------+-----------+-----------+------+
I come from a SQL background and wrote the following query to get a time-series response:
spark.sql("""
    -- Label each row with its submit month and close month (YYYY-MM--YYYY-MM)
    select concat_ws('--',
                     substring(submit_date, 1, 7),
                     substring(closed_date, 1, 7)) as submit_date,
           count(*) as total_count
    from data
    -- Dates are compared as strings; rows with a null closed_date drop out
    where submit_date >= '2018-01-13'
      and submit_date <= '2019-04-31'
      and closed_date <= '2019-04-31'
    group by substring(submit_date, 1, 7),
             substring(closed_date, 1, 7)
    order by submit_date
""").show()
It gives the following response:
+----------------+-----------+
|     submit_date|total_count|
+----------------+-----------+
|2018-04--2018-04|          1|
|2018-12--2019-01|          1|
|2019-01--2019-01|          1|
|2019-01--2019-02|          1|
|2019-01--2019-03|          2|
|2019-03--2019-03|          2|
|2019-03--2019-04|          1|
+----------------+-----------+
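To make the grouping logic explicit, here is a minimal plain-Python sketch of what the query computes, using only the two date columns from the dataset above. The function name `monthly_counts` is made up for illustration, and the dates are compared as strings, just as the SQL does.

```python
from collections import Counter

# (submit_date, closed_date) pairs copied from the dataset above.
rows = [
    ("2019-01-13", None), ("2019-01-01", "2019-03-10"),
    ("2019-03-13", "2019-04-01"), ("2019-03-01", "2019-03-31"),
    ("2018-08-01", "2019-09-31"), ("2018-12-01", "2019-01-31"),
    ("2019-03-13", None), ("2019-03-01", None),
    ("2019-03-13", "2019-06-01"), ("2019-03-01", "2019-05-31"),
    ("2018-04-01", "2018-04-31"), ("2019-03-13", None),
    ("2019-01-15", "2019-01-26"), ("2019-01-26", "2019-02-01"),
    ("2019-03-01", "2019-03-31"), ("2019-01-01", "2019-03-31"),
    ("2017-01-01", "2017-03-31"), ("2019-06-13", None),
    ("2019-03-01", None), ("2019-03-13", "2019-06-01"),
]


def monthly_counts(rows, start, end):
    """Count rows per 'submit month--close month' label (hypothetical helper)."""
    counts = Counter()
    for submit, closed in rows:
        if closed is None:
            continue  # a null closed_date never satisfies closed_date <= end
        if start <= submit <= end and closed <= end:
            counts[f"{submit[:7]}--{closed[:7]}"] += 1
    return sorted(counts.items())  # sorted by label, like ORDER BY


result = monthly_counts(rows, "2018-01-13", "2019-04-31")
for label, total in result:
    print(label, total)
```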
Please help me write the equivalent query in MongoDB.
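For reference, one possible shape of such a query is an aggregation pipeline. The sketch below builds it as plain Python dictionaries. It assumes that `submit_date` and `closed_date` are stored as 'YYYY-MM-DD' strings and that the collection is named `data`; the helper name `build_pipeline` and the connection details in the comments are hypothetical.

```python
def build_pipeline(start, end):
    """Build an aggregation pipeline mirroring the Spark SQL query.

    Assumption: both date fields are stored as 'YYYY-MM-DD' strings.
    """
    return [
        # WHERE: string range filters on both date fields.
        {"$match": {
            "submit_date": {"$gte": start, "$lte": end},
            "closed_date": {"$lte": end},
        }},
        # GROUP BY the first 7 characters (YYYY-MM) of each date.
        {"$group": {
            "_id": {
                "submitted": {"$substrBytes": ["$submit_date", 0, 7]},
                "closed": {"$substrBytes": ["$closed_date", 0, 7]},
            },
            "total_count": {"$sum": 1},
        }},
        # SELECT: join the two months into one label, as concat_ws does.
        {"$project": {
            "_id": 0,
            "submit_date": {"$concat": ["$_id.submitted", "--", "$_id.closed"]},
            "total_count": 1,
        }},
        # ORDER BY the label.
        {"$sort": {"submit_date": 1}},
    ]


pipeline = build_pipeline("2018-01-13", "2019-04-31")

# Hypothetical usage with pymongo (database name and connection are assumptions):
# from pymongo import MongoClient
# for doc in MongoClient()["mydb"]["data"].aggregate(pipeline):
#     print(doc)
```

In this sketch, open rows with a null `closed_date` should be excluded by the `$match` stage, because a `$lte` comparison against a string only matches string values. If the dates were stored as BSON dates instead of strings, `$dateToString` with the format `"%Y-%m"` would likely take the place of `$substrBytes`, and the bounds would need to be date objects.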