Unable to find/access a table saved from a foreach stream

Posted: 2019-06-26 01:04:08

Tags: pyspark azure-databricks delta-lake

I'm trying to save DataFrame data into a table.

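I managed to run some local tests, and df.write.saveAsTable works for static data. However, when I move to streaming and try to save the data inside foreach, for some reason the data does not show up in the Databricks "Data" tab.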

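I know it is being saved somewhere, because when I remove the "append" option, it fails after a while with an error saying that another table with the same name already exists at the same location.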
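For comparison, here is a minimal sketch of the two Delta write styles that come up in this question: writing by table name (what df.write.saveAsTable did in the static test) versus writing files to a path (what save(tableLocation) does in the streaming code further down). It assumes `df` is an ordinary PySpark DataFrame; the helper name `write_delta` is hypothetical. As far as I know, only the name-based variants create a metastore entry, while a path-only write leaves Delta files under the path without registering any table.

```python
from typing import Optional


def write_delta(df, *, table_name: Optional[str] = None,
                path: Optional[str] = None, mode: str = "append") -> None:
    """Write `df` in Delta format by table name, to a bare path, or both.

    Hypothetical helper for illustration; `df` is a pyspark.sql.DataFrame.
    Note: `mode` defaults to "append" here, unlike Spark's own default.
    """
    writer = df.write.format("delta").mode(mode)
    if table_name and path:
        # Registered in the metastore, data stored at `path` (unmanaged table).
        writer.option("path", path).saveAsTable(table_name)
    elif table_name:
        # Managed table: the metastore decides where the files live.
        writer.saveAsTable(table_name)
    elif path:
        # Files only: nothing is registered in the metastore.
        writer.save(path)
    else:
        raise ValueError("pass table_name, path, or both")
```

Spark's default save mode is "errorifexists", so a second write to an existing path or table fails unless a mode such as "append" is set, which is consistent with the "already exists" error described above.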

So I'd like to figure out:

  • Where is it?
  • How do I find it?
  • Why isn't it saved as a table in the "Data" tab, the way the static data is?

I read here that:

"Tables created with a specified LOCATION are considered unmanaged by the metastore."

To work around this, I'm supposed to create a table on top of that location, so that the existing files are registered in the metastore.

This functionality can be used to "import" data into the metastore.
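A minimal sketch of that registration step, assuming a Databricks notebook where `spark` is the active SparkSession and the Delta files already exist under the path passed to save(). The helper names and the example table name and path are hypothetical; `CREATE TABLE ... USING DELTA LOCATION '...'` is the Delta Lake SQL form for defining a table over an existing Delta directory.

```python
def create_table_over_location_sql(table_name: str, location: str) -> str:
    """Build a statement that registers an existing Delta directory as an
    unmanaged (external) table. No data is copied; the table points at it.

    Hypothetical helper for illustration.
    """
    if "'" in location:
        # Keep the sketch simple: refuse paths that would need SQL escaping.
        raise ValueError("location must not contain single quotes")
    return (
        f"CREATE TABLE IF NOT EXISTS {table_name} "
        f"USING DELTA LOCATION '{location}'"
    )


def register_delta_table(spark, table_name: str, location: str) -> None:
    """Run the statement with an existing SparkSession (`spark` in a notebook)."""
    spark.sql(create_table_over_location_sql(table_name, location))


if __name__ == "__main__":
    # Hypothetical table name and path, for illustration only.
    print(create_table_over_location_sql("events", "/delta/events"))
```

Once the statement has run, the table should be queryable by name, and because it reads the Delta log at that location, rows the stream appends to the same path later should be visible through it as well.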

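Well, this didn't work the way I expected. After some processing, Databricks threw an exception at me. This is the streaming code I'm running: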

def SaveData(row):

  ...

  # read csv string
  df = spark.read \
  .option("header", True) \
  .option("delimiter","|") \
  .option("quote", "\"") \
  .option("nullValue", "\\N") \
  .schema(schemaMapping) \
  .csv(csvData)

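  # append the parsed rows as Delta files under tableLocation
  df.write.format("delta").mode("append").save(tableLocation)
  #df.write.saveAsTable(tableName)
  #df.write.saveAsTable(tableName, format='parquet', mode='append')

# SaveData is called once for each row of the streaming DataFrame
query = dfDEHubStream.writeStream.foreach(SaveData).start()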

So I'm really confused. What am I missing here?

0 answers

No answers yet.