Question

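I created a view in Hive that is a complex query (with joins, unions, etc.). When I execute the query on a DF, do Catalyst & Tungsten kick in, or is it 100% Hive? What I'm asking is: can I get the query the view is executing and then run that query with Spark SQL, so that it benefits from the Catalyst & Tungsten improvements?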

Example:

The view's query runs on Hive (the Hive context), so it is not efficient.

VS

sqlContext.sql("select * from view")

This is not a dataset, so I'm not sure it will be more efficient, but I'm trying to figure out how to do it.

Thanks a lot!

Answer 1

When I execute the query on a DF, do Catalyst & Tungsten work, or is it 100% Hive?

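Tungsten became the default in Spark 1.5. In earlier versions it can be enabled by setting spark.sql.tungsten.enabled = true (or, in later versions, disabled by setting it to false). Even without Tungsten, Spark SQL uses a columnar storage format with Kryo serialization to minimize storage costs.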
Both Hive queries run through Spark (via HiveContext) and Spark DataFrames go through the Catalyst optimizer.

You can check whether Tungsten is enabled through the spark.sql.tungsten.enabled property.

What I'm asking is: can I see the query the view is executing and then run it with Spark SQL, using the Catalyst & Tungsten improvements?

From code: call df.explain to see the execution internals (the query plan).
From the Spark UI:

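Spark 1.5 added visualization of SQL and DataFrame query plans to the web UI, with dynamically updated operational metrics such as the selectivity of a filter operator and the runtime memory usage of aggregations and joins. Below is an example of the plan visualization from the web UI. (source)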

Does a Spark SQL query on a view run on a DF?
