I need to dynamically create multiple DataFrames in PySpark based on the values available in a Python list.
My DataFrame (df) has the following data:
date gender balance
2018-01-01 M 100
2018-02-01 F 100
2018-03-01 M 100

my_list = [2018-01-01, 2018-02-01, 2018-03-01]
for i in my_list:
    df_i = df.select("*").filter("date=i").limit(1000)
Could you please help?
Answer 0 (score: 0)
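I'm not sure whether DataFrame names can be created dynamically in PySpark. In Python, variable names can only be created dynamically through workarounds such as writing to globals(), which is generally discouraged, and the same applies to DataFrames.
One approach is to create a dictionary of DataFrames, where each key is a date from the list and the corresponding value is the DataFrame for that date.
For Python: see this link, where someone asked a similar question about dynamic naming.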
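As a rough illustration of the dictionary idea in plain Python, the sketch below uses ordinary lists of tuples as stand-ins for DataFrames. The names rows and rows_by_date are made up for this example and are not part of any library.

```python
# Hypothetical stand-in data: plain tuples instead of Spark DataFrames.
rows = [
    ("2018-01-01", "M", 100),
    ("2018-02-01", "F", 100),
    ("2018-03-01", "M", 100),
]
my_list = ["2018-01-01", "2018-02-01", "2018-03-01"]

# One dictionary entry per date, instead of one dynamically named
# variable per date. Each value holds the rows matching that date.
rows_by_date = {d: [r for r in rows if r[0] == d] for d in my_list}

print(rows_by_date["2018-02-01"])  # [('2018-02-01', 'F', 100)]
```

The key plays the role the dynamic variable name would have played, so each subset can still be looked up by its date.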
Here is a small PySpark implementation:
from pyspark.sql.functions import col
values = [('2018-01-01','M',100),('2018-02-01','F',100),('2018-03-01','M',100)]
df = sqlContext.createDataFrame(values,['date','gender','balance'])
df.show()
+----------+------+-------+
|      date|gender|balance|
+----------+------+-------+
|2018-01-01|     M|    100|
|2018-02-01|     F|    100|
|2018-03-01|     M|    100|
+----------+------+-------+
# Creating a dictionary to store the dataframes.
# Key: It contains the date from my_list.
# Value: Contains the corresponding dataframe.
dictionary_df = {}
my_list = ['2018-01-01', '2018-02-01', '2018-03-01']
for i in my_list:
    dictionary_df[i] = df.filter(col('date')==i)
for i in my_list:
    print('DF: '+i)
    dictionary_df[i].show()
DF: 2018-01-01
+----------+------+-------+
|      date|gender|balance|
+----------+------+-------+
|2018-01-01|     M|    100|
+----------+------+-------+
DF: 2018-02-01
+----------+------+-------+
|      date|gender|balance|
+----------+------+-------+
|2018-02-01|     F|    100|
+----------+------+-------+
DF: 2018-03-01
+----------+------+-------+
|      date|gender|balance|
+----------+------+-------+
|2018-03-01|     M|    100|
+----------+------+-------+
print(dictionary_df)
{'2018-01-01': DataFrame[date: string, gender: string, balance: bigint], '2018-02-01': DataFrame[date: string, gender: string, balance: bigint], '2018-03-01': DataFrame[date: string, gender: string, balance: bigint]}
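As a side note on the string form of filter used in the question: the expression "date=i" is passed to Spark literally, so the loop variable never reaches it. If the SQL-string style is preferred over col('date')==i, the value has to be interpolated into the string first. The sketch below only builds the expression strings; date_filter_expr is a hypothetical helper name, and the final use with df is left as a comment because it assumes a live DataFrame.

```python
my_list = ["2018-01-01", "2018-02-01", "2018-03-01"]

def date_filter_expr(value):
    # Hypothetical helper: builds a SQL-style condition string with the
    # date value quoted as a string literal. Only suitable for trusted
    # values, since the value is inserted into the expression as-is.
    return f"date = '{value}'"

# One filter expression per date, keyed by the date itself.
filter_exprs = {d: date_filter_expr(d) for d in my_list}

print(filter_exprs["2018-01-01"])  # date = '2018-01-01'

# Assuming `df` is the DataFrame from above, each expression could then
# be passed to filter, for example:
# dictionary_df = {d: df.filter(expr) for d, expr in filter_exprs.items()}
```

The column-expression form shown in the answer avoids the quoting step entirely, which is one reason it may be the simpler choice here.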