Question

OK,

a very silly question...

I have a large file in HDFS:

/user/input/foo.txt

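I want to copy the first 100 lines from this location to the local filesystem...

The data is very sensitive, so I am a bit hesitant to experiment.

What is the right way to copy sample data from HDFS to the local filesystem?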

Answer 1

If the file is not compressed:

bin/hadoop fs -cat /path/to/file | head -n 100 > /path/to/local/file

If the file is compressed:

bin/hadoop fs -text /path/to/file | head -n 100 > /path/to/local/file
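
Since the data is sensitive, the pipeline above could be wrapped so that the sample file is created readable only by the current user. This is a minimal sketch, not part of the original answer: the function name sample_head is hypothetical, and it assumes a POSIX-like shell. Once head exits, the upstream Hadoop command may report that it could not write to its output stream; that is expected when the pipe is closed early.

```shell
# sample_head N DEST: write the first N lines of stdin to DEST.
# (Hypothetical helper name, introduced here for illustration only.)
sample_head() {
    n=$1
    dest=$2
    (
        # Restrict permissions of a newly created DEST to the owner.
        # An already existing DEST keeps its current mode.
        umask 077
        head -n "$n" > "$dest"
    )
}

# Intended use (not run here; needs a Hadoop client on PATH):
#   hdfs dfs -cat /user/input/foo.txt | sample_head 100 /path/to/local/file
```

Because only the first lines ever pass through the pipe, the rest of the file is never written to local disk.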

Answer 2

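You can use the head program to extract the first few lines of a file, for example:

$ head -n 100 /user/input/foo.txt

(where -n sets the number of lines to extract), and redirect the output to a file of your choice:

$ head -n 100 /user/input/foo.txt > /path/to/your/output/file

Note that head reads from the local filesystem, so this works only if the HDFS path is also reachable as a local path (for example through a mount); otherwise, pipe the output of hadoop fs -cat into head instead.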

Answer 3

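Here is a simple way that is sure to work, at the cost of copying the whole file first (-copyToLocal writes nothing to standard output, so head has to read the local copy rather than a pipe):

hdfs dfs -copyToLocal /user/input/foo.txt /path/to/local/file
head -100 /path/to/local/file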
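
If the whole file does get copied first, it may be worth keeping the full copy in a private temporary directory and deleting it once the sample has been taken. The sketch below is one way to do that under stated assumptions: the function name sample_via_copy and the COPY_CMD variable are hypothetical, introduced only so the copy step can be swapped out; by default it runs hdfs dfs -copyToLocal.

```shell
# sample_via_copy SRC N DEST: copy SRC to a private temp dir, keep the
# first N lines in DEST, then delete the full copy.
# COPY_CMD is a hypothetical override; it defaults to the HDFS copy command.
sample_via_copy() {
    src=$1
    n=$2
    dest=$3
    tmpdir=$(mktemp -d) || return 1   # directory is created owner-only
    (
        umask 077                     # new files are owner-only as well
        # Left unquoted on purpose so the command splits into words.
        ${COPY_CMD:-hdfs dfs -copyToLocal} "$src" "$tmpdir/full" &&
            head -n "$n" "$tmpdir/full" > "$dest"
    )
    rc=$?
    rm -rf "$tmpdir"                  # remove the full copy in every case
    return "$rc"
}

# Intended use (not run here; needs a Hadoop client on PATH):
#   sample_via_copy /user/input/foo.txt 100 /path/to/local/file
```

This still transfers the entire file, so for a large or sensitive file, streaming with -cat and head remains the lighter option.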