I have some Unicode / UTF-8 text files produced by third-party Windows software, each containing about ten columns of data.
The header row is tab-separated. The remaining rows, however, are separated by spaces (not tabs!), as seen when opening the file in Notepad++ or TextWrangler.
These are the first four lines of the file (as an example):
x y z(ns) z(cm) z-abs(cm) longitude- E latitude- N type_of_object description
728243.03 5993753.83 0 0 0 143.537779835969 -36.1741232463362 linestart DRIVEWAYGRAVEL
728242.07 5993756.02 0 0 0 143.537768534943 -36.1741037476109 line DRIVEWAYGRAVEL
728242.26 5993756.11 0 0 0 143.537770619485 -36.1741028922293 linestart DRIVEWAYGRAVEL
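(N.B. the space at the start of each line, except for the header row.)
I am trying to write a Bash script to reformat the data for import into another Windows program.
(I know I could do this on the Windows command line, but I have no experience with it, so I would rather copy the files to my Debian machine and write the script in Bash. This means the input and output files need to be Windows-compatible, but the script itself can obviously run on Linux.)
I need to do the following:
So the output file should look like this: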
728257.89,5993759.24,1
728254.83,5993758.54,0
728251.82,5993762.4,0
728242.45,5993765.07,0
I tried the answer to this question. For example:
awk '
NR==1{
for(i=1;i<=NF;i++)
if($i!="z(ns)")
cols[i]
}
{
for(i=1;i<=NF;i++)
if(i in cols)
printf "%s ",$i
printf "\n"
}' input.file > output.file
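...to remove the third column (and then variations on this to eliminate the other unwanted columns). However, all I was left with was an empty output file.
I also tried to hack together a solution using grep and awk: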
touch output.txt
count=0
IFS=$'\n'
set -f #disable globbing
for i in $( grep "rectangle" $inputFile )
do
Xcoord=$(awk 'BEGIN { FS=" " } { print $1 }' $i )
printf "$Xcoord" >> output.txt
echo ","
Ycoord=$(awk 'BEGIN { FS=" " } { print $2 }' $i )
printf "$Ycoord" >> output.txt
printf ","
count=$((count+1))
if [[ count = "1" ]]
then
printf "$count\n" >> output.txt
else
printf "0\n" >> output.txt
fi
done
set +f #re-enable globbing for future use of the terminal.
...the idea behind this being: for each line in $inputFile that contains "rectangle":
1. Append the first column (variable "Xcoord") to output.txt
2. Append a comma to output.txt
3. Append the second column (variable "Ycoord") to output.txt
4. Append another comma to output.txt
5. Append the 1 or 0 as per the if test based on the value of the variable "count", along with a new line.
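This idea failed. Instead of saving the data to the file, it printed all of the file's columns to stdout, with the first column replaced by the text "(No such file or directory)":
...and output.txt contains little more than zeros:
Thanks in advance...
Answer 0 (score: 1)
I think awk can do everything you need in one line: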
awk -F '[[:space:]][[:space:]]+' 'BEGIN{OFS = ","} {if ($8 == "rectangle") print $1, $2 }' a.txt | awk 'BEGIN{OFS = ","}{if((NR+3)%4) print $0,0;else print $0,1}'
You set the delimiter between entries to "at least two whitespace characters" with
-F '[[:space:]][[:space:]]+'
set the output separator to a comma with
BEGIN{OFS = ","}
check for the rectangle condition in the second-to-last column
if ($8 == "rectangle")
and print the columns you want as output
print $1, $2
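To add the 0/1 pattern as the third output column, awk has to be started a second time, so that it sees the line numbers of the filtered result rather than those of the original input. The awk variable NR holds the current line number, starting at 1.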
(NR+3)%4
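For line numbers 1, 5, 9, ... this expression (% is the modulo operation) evaluates to 0 (= false). So you simply print the complete line (variable $0) followed by 0 in the if branch and by 1 in the else branch.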
print $0,0;else print $0,1
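To see how the modulo test behaves, here is a small sketch that prints the flag next to each line number. The function name `mod_pattern` and the use of `seq` to generate dummy lines are assumptions made for illustration only; they are not part of the command above.

```shell
#!/bin/bash
# Hypothetical helper: feed N dummy lines to awk and print "NR flag" for each.
mod_pattern() {
    seq 1 "$1" | awk '{
        # (NR+3)%4 is 0 (false) only when NR is 1, 5, 9, ...
        if ((NR + 3) % 4) print NR, 0; else print NR, 1
    }'
}

mod_pattern 8
```

Lines 1 and 5 get the flag 1, all other lines get 0.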
Hope this is what you wanted.
Answer 1 (score: 0)
I figured out a solution.
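As a possible variation, the two awk invocations could be folded into one by keeping a separate counter of matching rows. This is only a sketch under some assumptions: it relies on awk's default whitespace splitting (which also ignores the leading space), it assumes none of the first eight fields contains an embedded space, and the function name `rects_to_csv` and the counter `n` are made up for this example.

```shell
#!/bin/bash
# Hypothetical helper: print "x,y,flag" for every rectangle row of the file in $1.
rects_to_csv() {
    awk -v OFS=',' '
        # NR > 1 skips the header row; field 8 is assumed to be type_of_object.
        NR > 1 && $8 == "rectangle" {
            # n counts matching rows only, so the flag is 1 on the 1st, 5th, 9th, ... match.
            print $1, $2, (n++ % 4 == 0 ? 1 : 0)
        }
    ' "$1"
}
```

It would be called as `rects_to_csv input.txt > output.txt`. If the target Windows program insists on CRLF line endings, setting awk's ORS variable to "\r\n" is one option.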
#!/bin/bash
#Code here to retrieve the file from command arguments and set it as $inputFile (removed for brevity)
sed -i 1d "$inputFile" #Remove header line (this edits the input file in place).
sed 's/^ *//' < "$inputFile" > work.txt #Remove the leading space(s) from each line.
tr -s ' ' <work.txt | tr ' ' ',' >work2.txt #Squeeze runs of spaces, then switch spaces for commas.
grep "rectangle" work2.txt > work3.txt #Print all lines containing "rectangle" in them to new file.
rm -f lineout.txt #Delete output file in case script was run previously (-f: no error if it is missing).
touch lineout.txt
count=0
while IFS='' read -r line || [[ -n "$line" ]]; do
printf '%s\n' "$line" > line.txt #Pass the line as data, not as a printf format string.
awk 'BEGIN { FS="," } { printf $1 >> "lineout.txt" }' line.txt
printf "," >> lineout.txt
awk 'BEGIN { FS="," } { printf $2 >> "lineout.txt" }' line.txt
printf "," >> lineout.txt
count=$((count + 1))
if [[ $count = "1" ]]
then
printf "$count\n" >> lineout.txt
else
printf "0\n" >> lineout.txt
if [[ $count = "4" ]]
then
count=0
fi
fi
done < work3.txt
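The same loop could probably be written without the intermediate work files by letting `read` split each line directly. The sketch below is an assumption-laden alternative, not the script above: the function name `rect_rows_to_csv` is hypothetical, it writes to stdout instead of lineout.txt, and like the grep in the script it matches "rectangle" anywhere on the line.

```shell
#!/bin/bash
# Hypothetical helper: print "x,y,flag" for every line of the file in $1 that contains "rectangle".
rect_rows_to_csv() {
    local input=$1 count=0 x y rest flag
    # tail -n +2 skips the header row; grep keeps only the rectangle rows.
    tail -n +2 "$input" | grep 'rectangle' |
    while read -r x y rest; do
        # With the default IFS, read drops the leading space and splits on runs of spaces.
        if (( count % 4 == 0 )); then flag=1; else flag=0; fi
        printf '%s,%s,%s\n' "$x" "$y" "$flag"
        count=$((count + 1))
    done
}
```

Usage would be `rect_rows_to_csv "$inputFile" > lineout.txt`, which also leaves the input file unmodified.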
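Answer 2 (score: 0)
This can easily be formatted using an advanced text editor with the following features:
I'm not trying to advertise Sublime, but this tool certainly solves most of my text-editing problems.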