Accessing an HDFS cluster from pydoop

Date: 2019-07-04 08:41:05

Tags: python-3.x filesystems hdfs access

I have an HDFS cluster and Python on the same Google Cloud Platform, and I want to access files stored in the HDFS cluster from Python. I found that pydoop can do this, but I may be struggling to pass it the right parameters. Below is the code I have tried so far:


import pydoop.hdfs as hdfs
import pydoop

pydoop.hdfs.hdfs(host='url of the file system goes here',
                 port=9864, user=None, groups=None)

"""
 class pydoop.hdfs.hdfs(host='default', port=0, user=None, groups=None)

    A handle to an HDFS instance.

    Parameters

            host (str) – hostname or IP address of the HDFS NameNode. Set to an empty string (and port to 0) to connect to the local file system; set to 'default' (and port to 0) to connect to the default (i.e., the one defined in the Hadoop configuration files) file system.

            port (int) – the port on which the NameNode is listening

            user (str) – the Hadoop domain user name. Defaults to the current UNIX user. Note that, in MapReduce applications, since tasks are spawned by the JobTracker, the default user will be the one that started the JobTracker itself.

            groups (list) – ignored. Included for backwards compatibility.


"""

#print (hdfs.ls("/vs_co2_all_2019_v1.csv"))

It gives this error:

RuntimeError: Hadoop config not found, try setting HADOOP_CONF_DIR

And if I uncomment the last line and run print(hdfs.ls("/vs_co2_all_2019_v1.csv")), nothing happens. But the file "vs_co2_all_2019_v1.csv" does exist in the cluster, even though it was not visible at the moment I took the screenshot.
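For context, this error means pydoop could not locate the Hadoop client configuration. A minimal sketch of the usual fix, assuming the cluster's core-site.xml and hdfs-site.xml have been copied to the machine running Python (the /etc/hadoop/conf path below is an assumption, not something from the question):

import os

# Assumption: directory holding the cluster's core-site.xml/hdfs-site.xml.
# Setting this in the shell before starting Python is the more reliable
# route; here it is set before the first pydoop import.
os.environ["HADOOP_CONF_DIR"] = "/etc/hadoop/conf"

import pydoop.hdfs as hdfs

# Per the docstring quoted above, host='default' with port=0 connects to
# the filesystem defined in the Hadoop configuration files, so no NameNode
# address needs to be hard-coded here.
fs = hdfs.hdfs(host="default", port=0)
print(fs.list_directory("/"))  # should list the cluster root, not the local fs
fs.close()

With the configuration in place, the module-level helper used in the question, hdfs.ls("/vs_co2_all_2019_v1.csv"), should resolve against the cluster as well.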

My HDFS screenshot is shown below:

[screenshot: HDFS structure]

And the credentials I have are shown below:

[screenshot: credentials]

Can anybody tell me what I am doing wrong? Which credentials do I need to pass to the pydoop API? Or maybe there is another, simpler way to solve this problem? Any help would be much appreciated!
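One possibly simpler route, for what it is worth: port 9864 used above is normally the DataNode HTTP port of the WebHDFS REST interface, not the NameNode RPC port that pydoop's constructor expects. If WebHDFS is enabled (dfs.webhdfs.enabled, on by default), the file can be read over plain HTTP. A hedged sketch using requests; the host placeholder, the 9870 NameNode HTTP port (the Hadoop 3.x default; 50070 on Hadoop 2.x), and the user.name value are all assumptions to adapt:

import requests

# Assumption: your NameNode's address; 9870 is the Hadoop 3.x default
# HTTP port (50070 on Hadoop 2.x).
NAMENODE = "http://<namenode-host>:9870"

# OPEN reads a file. The NameNode answers with a redirect to a DataNode
# (port 9864 by default), which requests follows automatically, so the
# DataNodes must also be reachable from this machine.
resp = requests.get(
    f"{NAMENODE}/webhdfs/v1/vs_co2_all_2019_v1.csv",
    params={"op": "OPEN", "user.name": "hdfs"},  # user.name is an assumption
)
resp.raise_for_status()
print(resp.text[:500])  # first 500 characters of the CSV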

0 answers:

No answers yet