In Spark SQL (using the Java API), I have a DataFrame. DataFrame has a select method. I would like to know: is this a transformation or an action? I just need a confirmation and a good reference that states it clearly.
Answer 0 (score: 4)

It is a transformation. See: https://spark.apache.org/docs/2.0.0/api/java/org/apache/spark/sql/Dataset.html
A Dataset is a strongly typed collection of domain-specific objects that can be transformed in parallel using functional or relational operations. Each Dataset also has an untyped view called a DataFrame, which is a Dataset of Row.

Operations available on Datasets are divided into transformations and actions. Transformations are the ones that produce new Datasets, and actions are the ones that trigger computation and return results. Example transformations include map, filter, select, and aggregate (groupBy). Example actions are count and show, or writing data out to file systems.
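The same lazy/eager split exists in plain Java streams, which may help build intuition even without a Spark cluster. This is only an analogy, not Spark itself: intermediate operations like map are lazy (like Spark transformations), and terminal operations like collect force evaluation (like Spark actions). The counter below is just an illustrative device to show when the mapping function actually runs.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LazyDemo {
    public static void main(String[] args) {
        AtomicInteger evaluated = new AtomicInteger();

        // map is an intermediate (lazy) operation, analogous to a Spark
        // transformation such as select: building the pipeline runs nothing.
        Stream<Integer> pipeline = Stream.of(1, 2, 3)
            .map(n -> { evaluated.incrementAndGet(); return n + 10; });

        System.out.println("after map: " + evaluated.get());      // prints 0

        // collect is a terminal operation, analogous to a Spark action
        // such as show or count: it triggers the actual computation.
        List<Integer> result = pipeline.collect(Collectors.toList());

        System.out.println("after collect: " + evaluated.get());  // prints 3
        System.out.println(result);                               // [11, 12, 13]
    }
}
```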
Answer 2 (score: -1)

If you run the following code, you will see output in the console:
import org.apache.spark.sql.SparkSession

object learnSpark2 extends App {
  val sparkSession = SparkSession.builder()
    .appName("Learn Spark")
    .config("spark.master", "local")
    .getOrCreate()

  val range = sparkSession.range(1, 500).toDF("numbers")
  range.select(range.col("numbers"), range.col("numbers") + 10).show(2)
}
+-------+--------------+
|numbers|(numbers + 10)|
+-------+--------------+
|      1|            11|
|      2|            12|
+-------+--------------+
If you execute only the select without show, as in the code below, you will not see any output even though the code runs. This means select is just a transformation, not an action, so it is never evaluated:
object learnSpark2 extends App {
  val sparkSession = SparkSession.builder()
    .appName("Learn Spark")
    .config("spark.master", "local")
    .getOrCreate()

  val range = sparkSession.range(1, 500).toDF("numbers")
  range.select(range.col("numbers"), range.col("numbers") + 10)
}
In the console (only startup and shutdown logging appears; no Spark job is ever run):
19/01/03 22:46:25 INFO Utils: Successfully started service 'sparkDriver' on port 55531.
19/01/03 22:46:25 INFO SparkEnv: Registering MapOutputTracker
19/01/03 22:46:25 INFO SparkEnv: Registering BlockManagerMaster
19/01/03 22:46:25 INFO BlockManagerMasterEndpoint: Using
org.apache.spark.storage.DefaultTopologyMapper for getting topology information
19/01/03 22:46:25 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
19/01/03 22:46:25 INFO DiskBlockManager: Created local directory at
C:\Users\swilliam\AppData\Local\Temp\blockmgr-9abc8a2c-15ee-4e4f-be04-9ef37ace1b7c
19/01/03 22:46:25 INFO MemoryStore: MemoryStore started with capacity 1992.9 MB
19/01/03 22:46:25 INFO SparkEnv: Registering OutputCommitCoordinator
19/01/03 22:46:25 INFO Utils: Successfully started service 'SparkUI' on port 4040.
19/01/03 22:46:26 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at
http://10.192.99.214:4040
19/01/03 22:46:26 INFO Executor: Starting executor ID driver on host localhost
19/01/03 22:46:26 INFO Utils: Successfully started service
'org.apache.spark.network.netty.NettyBlockTransferService' on port 55540.
19/01/03 22:46:26 INFO NettyBlockTransferService: Server created on 10.192.99.214:55540
19/01/03 22:46:26 INFO BlockManager: Using
org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
19/01/03 22:46:26 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.192.99.214, 55540, None)
19/01/03 22:46:26 INFO BlockManagerMasterEndpoint: Registering block manager 10.192.99.214:55540 with 1992.9 MB RAM, BlockManagerId(driver, 10.192.99.214, 55540, None)
19/01/03 22:46:26 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.192.99.214, 55540, None)
19/01/03 22:46:26 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 10.192.99.214, 55540, None)
19/01/03 22:46:26 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/C:/UDEMY/SparkJob/spark-warehouse/').
19/01/03 22:46:26 INFO SharedState: Warehouse path is 'file:/C:/UDEMY/SparkJob/spark-warehouse/'.
19/01/03 22:46:27 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
19/01/03 22:46:29 INFO SparkContext: Invoking stop() from shutdown hook
19/01/03 22:46:29 INFO SparkUI: Stopped Spark web UI at http://10.192.99.214:4040
19/01/03 22:46:29 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
19/01/03 22:46:29 INFO MemoryStore: MemoryStore cleared
19/01/03 22:46:29 INFO BlockManager: BlockManager stopped
19/01/03 22:46:29 INFO BlockManagerMaster: BlockManagerMaster stopped
19/01/03 22:46:29 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
19/01/03 22:46:29 INFO SparkContext: Successfully stopped SparkContext
19/01/03 22:46:29 INFO ShutdownHookManager: Shutdown hook called
19/01/03 22:46:29 INFO ShutdownHookManager: Deleting directory C:\Users\swilliam\AppData\Local\Temp\spark-c69bfb9b-f351-45af-9947-77950b23dd15
Picked up JAVA_TOOL_OPTIONS: -Djavax.net.ssl.trustStore="C:\Program Files\SquirrelSQL\certificates\jssecacerts"