
Time: 2020-05-31 09:00:09

Tags: pyspark jupyter-lab google-cloud-dataproc


let timings = [{
    isOpen: 1,
    weekday: 1,
    humanDay: "Monday",
    periods: [{
        openDay: "Monday",
        openTime: "12:00",
        closeDay: "Monday",
        closeTime: "14:30",
    }, {
        openDay: "Monday",
        openTime: "19:00",
        closeDay: "Monday",
        closeTime: "22:30",
    }, {
        openDay: "Monday",
        openTime: "23:00",
        closeDay: "Monday",
        closeTime: "23:30",
    }],
}, {
    isOpen: 1,
    weekday: 1,
    humanDay: "Tuesday",
    periods: [{
        openDay: "Tuesday",
        openTime: "12:00",
        closeDay: "Tuesday",
        closeTime: "14:30",
    }, {
        openDay: "Tuesday",
        openTime: "19:00",
        closeDay: "Tuesday",
        closeTime: "22:30",
    }, {
        openDay: "Tuesday",
        openTime: "23:00",
        closeDay: "Tuesday",
        closeTime: "23:30",
    }],
}];

// create an empty object
const weekdays = {};

timings.forEach((timing) => {
  timing.periods.forEach((period) => {
    // check if the object has a key matching
    // the openTime-closeTime string
    // (this can be any key, but we want to capture all
    // days that have the same open and close times)
    const key = `${period.openTime}-${period.closeTime}`;
    if (!weekdays[key]) {
      // the key does not exist, so let's create a new sub-object for
      // that given key, and prepare its array of days:
      weekdays[key] = {
        days: [],
      };
    }
    // now, add the current day to the pre-defined sub-array
    // (assumption: the day name is taken from timing.humanDay):
    weekdays[key].days.push(timing.humanDay);
    // also, store the openTime and closeTime as sub-properties, for convenience.
    // I know they are stored in the key, but the whole purpose of the key
    // is to reduce duplicates by taking advantage of JavaScript's built-in
    // functionality.
    weekdays[key]["openTime"] = period.openTime;
    weekdays[key]["closeTime"] = period.closeTime;
  });
});

1 Answer:

Answer 0 (score: 0)

You are trying to use the Hadoop BigQuery connector; for Spark, you should use the Spark BigQuery connector instead.

To read data from BigQuery, you can follow this example:

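from pyspark.sql import SparkSession

# Reuse the active session if one exists (notebooks usually predefine `spark`).
spark = SparkSession.builder.appName('spark-bigquery-demo').getOrCreate()

# Use the Cloud Storage bucket for temporary BigQuery export data used
# by the connector.
bucket = "[bucket]"
spark.conf.set('temporaryGcsBucket', bucket)

# Load data from BigQuery.
words = spark.read.format('bigquery') \
  .option('table', 'bigquery-public-data:samples.shakespeare') \
  .load()
# Register the DataFrame as a temporary view so the SQL below can query it.
words.createOrReplaceTempView('words')

# Perform word count.
word_count = spark.sql(
    'SELECT word, SUM(word_count) AS word_count FROM words GROUP BY word')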
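
To make the aggregation concrete without a cluster, here is a minimal plain-Python sketch of what the SUM(word_count) ... GROUP BY word query computes. The sample rows and the helper name `sum_word_counts` are made up for illustration; they are not part of the connector or of the real table's contents.

```python
from collections import defaultdict


def sum_word_counts(rows):
    """Sum word_count per word, mirroring SUM(word_count) ... GROUP BY word."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["word"]] += row["word_count"]
    return dict(totals)


# Hypothetical sample rows: the same word can appear once per corpus,
# which is why the query has to sum the per-corpus counts.
sample_rows = [
    {"word": "king", "word_count": 3, "corpus": "hamlet"},
    {"word": "king", "word_count": 5, "corpus": "macbeth"},
    {"word": "crown", "word_count": 2, "corpus": "macbeth"},
]

totals = sum_word_counts(sample_rows)
print(totals)  # {'king': 8, 'crown': 2}
```

In Spark the same grouping is done in parallel across the cluster, so the order of the resulting rows is not guaranteed unless the query adds an ORDER BY clause.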