Question

I am writing a MapReduce job for Hadoop using Python and pandas. In my reducer, I call a function that reads a file. However, the reducer fails with the following error:

IOError: File /path/subset_clean.tsv does not exist

Here is the code I am using:

import pandas as pd

file_loc='path/subset_clean.tsv'

def func(list_app):
  # Read only the first row of the file to get the column names
  headers=pd.read_csv(file_loc, sep='\t', low_memory=False, nrows=1)
  headers_name=list(headers)
  subset=pd.DataFrame(list_app, columns=headers_name)
  # ...
  # ...
  return

Reducer code:
 func(list_app)

list_app is the list passed from the reducer to the function.

However, when I do not read the file and instead define the variable as follows, it works fine: headers_name = ('col1', 'col2', 'col3')

I am unable to resolve this error.

Edit: when I run the code above with the relative path using the command below, it works fine:

cat sample_reduce_33016_n.tsv|python mapper_try.py|sort|python reducer_new_v1.py>sample_red_reslt.tsv
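Since the same relative path works in the local pipeline but not under Hadoop, one thing worth checking is where the reducer process is actually running and whether the file is visible from there. The following is only a minimal diagnostic sketch, not a fix: `describe_path` and `log_path` are hypothetical helper names, and it assumes the difference comes from the task's working directory on the cluster node.

```python
import os
import sys

def describe_path(file_loc):
    """Return a one-line report of how file_loc resolves in this process."""
    resolved = os.path.abspath(file_loc)
    return "cwd=%s resolved=%s exists=%s" % (
        os.getcwd(), resolved, os.path.exists(resolved))

def log_path(file_loc):
    # Write to stderr so the report is not mixed into the reducer's stdout records
    sys.stderr.write(describe_path(file_loc) + "\n")
```

Calling `log_path(file_loc)` at the top of the reducer, before `func` is invoked, would show in the task's stderr log whether the path resolves to an existing file on the node that runs the task.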

Python, Hadoop: IOError: File does not exist, but the file exists at the path
