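I am trying to load multiple files in a single load call. They are all partitioned files. When I try it with 1 file it works, but when I list 24 files it gives me the error below. I can't find any documentation of a limit, and the only workaround I know of is to load the files separately and union them afterwards. Is there another option?
The code below reproduces the problem: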
basePath = '/file/'
paths = ['/file/df201601.orc', '/file/df201602.orc', '/file/df201603.orc',
'/file/df201604.orc', '/file/df201605.orc', '/file/df201606.orc',
'/file/df201604.orc', '/file/df201605.orc', '/file/df201606.orc',
'/file/df201604.orc', '/file/df201605.orc', '/file/df201606.orc',
'/file/df201604.orc', '/file/df201605.orc', '/file/df201606.orc',
'/file/df201604.orc', '/file/df201605.orc', '/file/df201606.orc',
'/file/df201604.orc', '/file/df201605.orc', '/file/df201606.orc',
'/file/df201604.orc', '/file/df201605.orc', '/file/df201606.orc', ]
df = sqlContext.read.format('orc') \
    .options(header='true', inferschema='true', basePath=basePath) \
    .load(*paths)
Error received:
TypeError Traceback (most recent call last)
<ipython-input-43-7fb8fade5e19> in <module>()
---> 37 df = sqlContext.read.format('orc') .options(header='true', inferschema='true',basePath=basePath) .load(*paths)
38
TypeError: load() takes at most 4 arguments (24 given)
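Answer 0 (score: 2)
As explained in the official documentation, to read multiple files you should pass a list:
path - optional string or a list of string for file-system backed data sources.
So in your case: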
(sqlContext.read
.format('orc')
.options(basePath=basePath)
.load(path=paths))
Argument unpacking (*) would only make sense if load were defined with variadic arguments, for example:
def load(this, *paths):
...
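To make the difference concrete without a Spark session, here is a minimal sketch in plain Python. `load_fixed` and `load_variadic` are hypothetical stand-ins, not Spark functions: the first only imitates the shape of a signature with a few optional positional parameters (the parameter names are an assumption modelled on the `path` parameter quoted above), and the second is the variadic form shown in the answer.

```python
def load_fixed(path=None, format=None, schema=None, **options):
    """Hypothetical stand-in: at most three positional arguments.

    `path` may be a single string or a list of strings.
    """
    if path is None:
        return []
    return [path] if isinstance(path, str) else list(path)


def load_variadic(*paths):
    """Hypothetical stand-in: accepts any number of positional paths."""
    return list(paths)


paths = ['/file/df201601.orc', '/file/df201602.orc',
         '/file/df201603.orc', '/file/df201604.orc']

# Unpacking turns the list into four positional arguments, which is more
# than the fixed signature accepts, so Python raises TypeError.
try:
    load_fixed(*paths)
    unpack_error = None
except TypeError as exc:
    unpack_error = exc

# Passing the list itself as one argument works with the fixed signature.
as_list = load_fixed(path=paths)

# Unpacking only works when the function is declared with *paths.
as_varargs = load_variadic(*paths)
```

The same reasoning applies to the 24-path call in the question: each unpacked path becomes its own positional argument, so the call fails before any file is read.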