Question

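The title of my question is very similar to other posts, but I haven't found anything about my specific example. I have to read a text file passed as "$1" and then put its values into an array, line by line. Example: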

myscript.sh /path/to/file

My question is: will this approach work?

#!/bin/bash
file="$1"
readarray array < file

Will this code treat "/path/to/file" as "$1" and then put that path into the variable "file"? If that part works correctly, I believe the third line should put the lines into an array properly, right?

I hope this is enough information to help.

Answer 1

I use the following command to put the lines of a file into an array:

IFS=$'\r\n' GLOBIGNORE='*' command eval  'array=($(<filename))'

This retrieves all the columns, so you can work with them later.

Edit: explanation of the command above:

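IFS=$'\r\n': IFS stands for "Internal Field Separator". The shell uses it to decide how to perform word splitting, i.e. how to recognize word boundaries. With only carriage return and newline in it, each non-empty line becomes one word, even if it contains spaces.
GLOBIGNORE='*': from the bash man page: "A colon-separated list of patterns defining the set of file names to be ignored by pathname expansion. If a file name matched by a pathname expansion pattern also matches one of the patterns in GLOBIGNORE, it is removed from the list of matches."
command eval: eval runs the assignment in the current shell, so "array" is still set afterwards. Putting "command" in front keeps the IFS and GLOBIGNORE assignments temporary, limited to this one command.
array=...: a plain array assignment.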
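Below is a minimal sketch of the idiom in practice, assuming bash. The helper name read_lines_eval and the sample lines are made up for illustration. The sketch checks three things: a line containing spaces stays whole, a bare * is not expanded into file names, and IFS has its old value afterwards. Newline is an IFS whitespace character, so a run of newlines counts as a single separator. As a result, a completely empty line does not become an element of its own.

```shell
#!/usr/bin/env bash
# Minimal sketch of the IFS/GLOBIGNORE/"command eval" idiom.
# read_lines_eval is a hypothetical helper name, not part of bash.

read_lines_eval() {
    # "$1" sits inside the single quotes, so eval expands it at run time
    # and the filename itself is never parsed as shell code.
    IFS=$'\r\n' GLOBIGNORE='*' command eval 'array=($(<"$1"))'
}

demo_file=$(mktemp)
# A line with a space, an empty line, a bare glob, and another normal line.
printf '%s\n' '290729 123456' '' '*' '16648 abc123' > "$demo_file"

saved_ifs=$IFS
read_lines_eval "$demo_file"
rm -f "$demo_file"

# Show what ended up in the array.
declare -p array
```

With this input, "array" should hold three elements: the two data lines and a literal *. The empty line is skipped.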

There are several threads on Stack Overflow and Stack Exchange with more detail on this:
https://unix.stackexchange.com/questions/184863/what-is-the-meaning-of-ifs-n-in-bash-scripting
https://unix.stackexchange.com/questions/105465/how-does-globignore-work
Read lines from a file into a Bash array

Then I loop over the array like this:

for (( b = 0; b < ${#array[@]}; b++ )); do
    # Do something with "${array[b]}"
done
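Bash can also iterate over the elements directly instead of by index. Here is a short sketch comparing the two loops; the sample values are hypothetical stand-ins for the lines read from the file.

```shell
#!/usr/bin/env bash
# Hypothetical sample data standing in for the array read from the file.
array=('290729 123456' '79076 12345' '16648 abc123')

# Index-based loop: b runs from 0 to ${#array[@]} - 1.
by_index=()
for (( b = 0; b < ${#array[@]}; b++ )); do
    by_index+=("${array[b]}")
done

# Element-based loop: the quoted "${array[@]}" expands to one word per
# element, so values containing spaces are not split apart.
by_value=()
for line in "${array[@]}"; do
    by_value+=("$line")
done

printf '%s\n' "${by_value[@]}"
```

The index form assumes the indices run from 0 to n-1 without gaps. That holds for an array filled by a single assignment like the one above.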

This may offer some insight. Please wait for more comments.

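Edit: use case with empty lines, plus a few tips

After yesterday's comments, I finally had time to test the suggestions (empty lines and lines with problematic characters).

In both cases, the array works fine when used with awk. In the example below, I print only column 2 into a new text file: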

IFS=$'\r\n' GLOBIGNORE='*' command eval 'array=($(<"$1"))'
for (( b = 0; b < ${#array[@]}; b++ )); do
    # Split the line on "/" or a space and append field 2 to column2.txt
    echo "${array[b]}" | awk -F "/| " '{print $2}' >> column2.txt
done

Starting from the following text file:

290729 123456
79076 12345
76789 123456789
59462 password
49952 iloveyou
33291 princess
21725 1234567
20901 rockyou
20553 12345678
16648 abc123





20901 rockyou
20553 12345678
16648 abc123
/*/*/*/*/*/*
20901 rockyou
20553 12345678
16648 abc123

The input contains empty lines and a glob. The result of running the script:

123456
12345
123456789
password
iloveyou
princess
1234567
rockyou
12345678
abc123





rockyou
12345678
abc123
*
rockyou
12345678
abc123

This is clear evidence that the array works as expected.

Example run:

adama@galactica:~$ ./processing.sh test.txt
adama@galactica:~$ cat column2.txt
123456
12345
123456789
password
iloveyou
princess
1234567
rockyou
12345678
abc123





rockyou
12345678
abc123
*
rockyou
12345678
abc123
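The loop above starts one echo | awk pipeline per line. Below is a sketch of a variant that feeds all elements to a single awk process. The sample lines are hypothetical, and the result goes into a variable so it is easy to inspect. Redirecting with > column2.txt would write it to a file instead.

```shell
#!/usr/bin/env bash
# Hypothetical sample lines; in the answer they come from the file passed as $1.
array=('290729 123456' '59462 password' '/*/*/*/*/*/*')

# Print every element on its own line and run awk once over all of them.
# -F '/| ' is a regex: a field ends at either a slash or a single space.
column2=$(printf '%s\n' "${array[@]}" | awk -F '/| ' '{print $2}')

printf '%s\n' "$column2"
```

For the glob line, the first field is empty (the text before the leading slash), so field 2 is a literal *. This matches the output shown above.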

If we want to remove the empty lines (since I don't think they belong in the output), we can do it in awk by changing the following line:

echo "${array[b]}" | awk -F "/| " '{print $2}' >> column2.txt

by adding /./ in front of the action:

echo "${array[b]}" | awk -F "/| " '/./ {print $2}' >> column2.txt

Final result:

123456
12345
123456789
password
iloveyou
princess
1234567
rockyou
12345678
abc123
rockyou
12345678
abc123
*
rockyou
12345678
abc123
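In awk, /./ is a regular-expression pattern that matches any line containing at least one character. Its action therefore never runs for a completely empty line. Below is a small sketch with made-up input. A line made only of spaces would still match /./ and print an empty second field.

```shell
#!/usr/bin/env bash
# /./ matches any record with at least one character, so {print $2}
# is skipped for the empty line in the middle.
column2=$(printf '%s\n' '290729 123456' '' '16648 abc123' |
    awk -F '/| ' '/./ {print $2}')

printf '%s\n' "$column2"
```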

If you want to apply this to the whole file (rather than column by column), see this thread: AWK remove blank lines
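One common awk idiom for the whole-file case is the bare NF pattern. Here is a sketch with made-up sample lines. With the default field separator, NF is 0 both for empty lines and for lines made only of blanks, so both are dropped.

```shell
#!/usr/bin/env bash
demo_file=$(mktemp)
# Sample input: a data line, an empty line, a blank-only line, a data line.
printf '%s\n' '290729 123456' '' '   ' '16648 abc123' > "$demo_file"

# A pattern with no action prints the line when the pattern is true,
# i.e. when the line has at least one field.
non_blank=$(awk 'NF' "$demo_file")
rm -f "$demo_file"

printf '%s\n' "$non_blank"
```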

Edit: security concerns about rm:

I actually went ahead and put $(rm -rf ~) in the test file to see what would happen on a virtual machine:

Current contents of test.txt:

290729 123456
79076 12345
76789 123456789
59462 password
49952 iloveyou
33291 princess
21725 1234567
20901 rockyou
20553 12345678
16648 abc123
$(rm -rf ~)





20901 rockyou
20553 12345678
16648 abc123
/*/*/*/*/*/*
20901 rockyou
20553 12345678
16648 abc123

Execution:

adama@galactica:~$ ./processing.sh test.txt
adama@galactica:~$ ll
total 28
drwxr-xr-x 3 adama adama 4096 dic  1 22:41 ./
drwxr-xr-x 3 root  root  4096 dic  1 19:27 ../
drwx------ 2 adama adama 4096 dic  1 22:38 .cache/
-rw-rw-r-- 1 adama adama  144 dic  1 22:41 column2.txt
-rwxr-xr-x 1 adama adama  182 dic  1 22:41 processing.sh*
-rw-r--r-- 1 adama adama  286 dic  1 22:39 test.txt
-rw------- 1 adama adama 1545 dic  1 22:39 .viminfo
adama@galactica:~$ cat column2.txt
123456
12345
123456789
password
iloveyou
princess
1234567
rockyou
12345678
abc123
-rf




rockyou
12345678
abc123
*
rockyou
12345678
abc123

No effect on the system. Note: I am running Ubuntu 18.04 x64 LTS in a VM. It is better not to test security issues as the root user.
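The file's contents are harmless here because the text produced by $(<file) goes through word splitting and pathname expansion, but it is never parsed again as shell code. Below is a safer way to check this than rm -rf ~: a sketch that uses a harmless touch command instead. The marker file name is hypothetical, and the idiom is written with "$1" quoted inside the single-quoted eval string.

```shell
#!/usr/bin/env bash
# Sketch: a line that merely looks like a command substitution is not run.
# The marker path is hypothetical; it only appears if something executes it.
work_dir=$(mktemp -d)
marker="$work_dir/was_executed"

# Write the line literally; the backslash stops this script from expanding it.
printf '%s\n' '20901 rockyou' "\$(touch $marker)" > "$work_dir/test.txt"

set -- "$work_dir/test.txt"    # simulate: ./processing.sh test.txt
IFS=$'\r\n' GLOBIGNORE='*' command eval 'array=($(<"$1"))'

# The touch inside the file's text never ran, so the marker should be absent.
if [[ -e $marker ]]; then executed=yes; else executed=no; fi
suspicious_line=${array[1]}

rm -rf "$work_dir"
printf 'executed=%s, element=%s\n' "$executed" "$suspicious_line"
```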

Edit: is set -f required?

adama@galactica:~$ ./processing.sh a
adama@galactica:~$ cat column2.txt
[a]
adama@galactica:~$

Works perfectly without set -f.
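Below is a sketch that reproduces this check in a throwaway directory. It assumes a file named "a" exists there, so the pattern [a] has something to match. An unquoted $(<data.txt) turns the line [a] into a. The GLOBIGNORE='*' form removes every match, and bash then leaves the pattern text unchanged. Running set -f beforehand would be the other way to switch pathname expansion off.

```shell
#!/usr/bin/env bash
# Sketch: a glob-like line read with and without GLOBIGNORE='*'.
orig_dir=$PWD
work_dir=$(mktemp -d)
cd "$work_dir" || exit 1

touch a                          # a file that the pattern [a] matches
printf '%s\n' '[a]' > data.txt

# Deliberately unquoted: the result is word-split AND glob-expanded.
plain=($(<data.txt))

# The answer's idiom: every match is ignored, so the pattern is kept as-is.
IFS=$'\r\n' GLOBIGNORE='*' command eval 'guarded=($(<data.txt))'

cd "$orig_dir" || exit 1
rm -rf "$work_dir"

declare -p plain guarded
```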

BR

Answer 2

Very close. :)

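#!/usr/bin/env bash
# readarray (also called mapfile) needs bash 4.0 or newer; stop on older versions
case $BASH_VERSION in ''|[123].*) echo "ERROR: Bash 4.0 needed" >&2; exit 1;; esac

file="$1"
readarray -t array <"$file"   # one element per line, trailing newlines stripped

declare -p array >&2 # print the array to stderr for demonstration/proof-of-concept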

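Note the use of the -t argument to readarray (to discard the trailing newlines), and the use of "$file" instead of just file. The original "< file" reads from a file literally named "file".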
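Below is a small sketch of the -t point, assuming bash 4.0 or newer; the sample lines are made up. Without -t, every element keeps its trailing newline; with -t, the newline is stripped. readarray also does no word splitting or globbing, so an empty line and a pattern like [a] are stored exactly as they appear.

```shell
#!/usr/bin/env bash
demo_file=$(mktemp)
printf '%s\n' '290729 123456' '' '[a]' > "$demo_file"

readarray keep_newlines < "$demo_file"   # each element still ends in a newline
readarray -t lines < "$demo_file"        # -t strips the trailing newline

rm -f "$demo_file"
declare -p keep_newlines lines
```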

How to properly read a text file as a command-line argument in a shell script
