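I know I can connect to an HDFS cluster with pyarrow.hdfs.connect().
I also know I can read a Parquet file with read_table() from pyarrow.parquet.
However, read_table() accepts a file path, whereas hdfs.connect() gives me a HadoopFileSystem instance.
Is it possible to use only pyarrow (with libhdfs3 installed) to read a Parquet file or folder that resides in an HDFS cluster? What I want in the end is the result of to_pydict(), so that I can pass the data along.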
Answer 0 (score: 4)
Try
import pyarrow as pa

fs = pa.hdfs.connect(...)
fs.read_parquet('/path/to/hdfs-file', **other_options)
Or
import pyarrow.parquet as pq

with fs.open(path) as f:
    table = pq.read_table(f, **read_options)
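To get from there to the to_pydict() result the question asks for, the second variant can be wrapped in a small helper. This is only a minimal sketch: the function name read_parquet_as_pydict is hypothetical, and fs is assumed to be any object with an open() method that returns a readable binary file, such as the HadoopFileSystem returned by pa.hdfs.connect(). It reads a single file; for a folder, fs.read_parquet() from the first variant is probably the better starting point.

```python
def read_parquet_as_pydict(fs, path, **read_options):
    """Read one Parquet file through `fs` and return {column_name: [values]}.

    `read_parquet_as_pydict` is a hypothetical helper name, not a pyarrow API.
    """
    # Imported lazily so the helper can be defined without pyarrow installed.
    import pyarrow.parquet as pq

    # fs.open() returns a file-like object, which pq.read_table() accepts
    # in place of a local file path.
    with fs.open(path, "rb") as f:
        table = pq.read_table(f, **read_options)

    # Table.to_pydict() converts the Arrow table to a plain Python dict.
    return table.to_pydict()


# Possible usage (needs a reachable HDFS cluster; arguments are placeholders):
#   import pyarrow as pa
#   fs = pa.hdfs.connect(...)
#   data = read_parquet_as_pydict(fs, '/path/to/hdfs-file')
```

Passing the filesystem in as an argument keeps the connection details out of the helper, so the same function works with any object that offers a compatible open().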
I opened https://issues.apache.org/jira/browse/ARROW-1848 about adding more explicit documentation on this.
Answer 1 (score: 1)
I tried the same thing with the Pydoop library and engine='pyarrow', and it worked perfectly for me. This is a general approach.
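A rough sketch of what such a Pydoop-based approach might look like, assuming that "engine='pyarrow'" refers to the engine parameter of pandas.read_parquet() and that pydoop and pandas are installed. The function name read_parquet_via_pydoop is hypothetical and is not taken from the original answer.

```python
def read_parquet_via_pydoop(path):
    """Open an HDFS file with Pydoop and parse it with the pyarrow engine.

    Hypothetical helper: a sketch of the approach, not the answer's own code.
    """
    # Imported lazily so the helper can be defined without these packages.
    import pandas as pd
    import pydoop.hdfs as hd

    # pydoop.hdfs.open() returns a file-like object; pandas hands it to
    # pyarrow because engine="pyarrow" is requested explicitly.
    with hd.open(path, "rb") as f:
        return pd.read_parquet(f, engine="pyarrow")


# Possible usage (needs a configured Hadoop client; the path is a placeholder):
#   df = read_parquet_via_pydoop('/path/to/hdfs-file')
#   data = df.to_dict(orient="list")  # roughly comparable to to_pydict()
```

Note that this route goes through pandas, so it does not satisfy the "only pyarrow" constraint in the question.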