I have a file like this:
aaa b b ccc 345
ddd fgt f u 3456
e r der der 5 674
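As you can see, the only way to separate the columns is to find the character columns that contain nothing but spaces. How can we identify these columns and replace them with a unique separator such as a comma (,), to get: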
aaa,b b,ccc,345
ddd,fgt,f u,3456
e r,der,der,5 674
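Note:
If we find every run of adjacent character columns that contain only spaces (and nothing else) and replace each run with a single , on every line, the problem is solved.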
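josifoski
A better explanation of the question:
Treat the file as a matrix of characters. Wherever a block of adjacent character columns consists only of spaces on every line, the whole block should be replaced by a single , on each line.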
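To make the matrix description concrete, here is a minimal sketch that only reports which character positions are a space on every line. It is an illustration, not part of the original question: the function name gap_positions is hypothetical, and it uses substr() rather than an empty FS so that it should run in any POSIX awk. It assumes the columns are padded with spaces, not tabs.

```shell
# Hypothetical helper: print the 1-based character positions
# that hold a space (or nothing) on every line of the file in $1.
gap_positions() {
    awk '
        {
            n = length($0)
            if (n > width) width = n            # track the longest line
            for (i = 1; i <= n; i++)
                if (substr($0, i, 1) != " ") used[i] = 1
        }
        END {
            sep = ""
            for (i = 1; i <= width; i++)
                if (!(i in used)) { printf "%s%d", sep, i; sep = " " }
            print ""
        }
    ' "$1"
}
```

For the sample file above, gap_positions file prints 4 8 12, which are exactly the positions where a comma should go.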
Answer 0 (score: 4)
$ cat tst.awk
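BEGIN{ FS=OFS=""; ARGV[ARGC]=ARGV[ARGC-1]; ARGC++ }   # empty FS: one field per character (as in gawk); read the input file twice
# Pass 1: record which character positions hold a space and which hold anything else
NR==FNR {
    for (i=1;i<=NF;i++) {
        if ($i == " ") {
            space[i]
        }
        else {
            nonSpace[i]
        }
    }
    next
}
# Start of pass 2: keep only the positions that are a space on every line
FNR==1 {
    for (i in nonSpace) {
        delete space[i]
    }
}
# Pass 2: put a comma at each separator position, then squeeze runs of commas into one
{
    for (i in space) {
        $i = ","
    }
    gsub(/,+/,",")
    print
}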
$ awk -f tst.awk file
aaa,b b,ccc,345
ddd,fgt,f u,3456
e r,der,der,5 674
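POSIX leaves the behavior of an empty FS unspecified, so the script above relies on the awk in use splitting the record into single characters (gawk does). As a hedged alternative, here is a sketch of the same two-pass idea written with substr() only. The wrapper name cols_to_csv is hypothetical, and the sketch assumes space-padded columns (no tabs) and an input that is a regular file, since it is read twice.

```shell
# Hypothetical helper: convert the space-aligned file in $1 to comma-separated lines.
cols_to_csv() {
    awk '
        # Pass 1: remember every character position that holds a non-space on any line.
        NR == FNR {
            n = length($0)
            for (i = 1; i <= n; i++)
                if (substr($0, i, 1) != " ") used[i] = 1
            next
        }
        # Pass 2: copy characters at used positions, emit one comma per run of unused positions.
        {
            out = ""; inGap = 0
            n = length($0)
            for (i = 1; i <= n; i++) {
                if (i in used)   { out = out substr($0, i, 1); inGap = 0 }
                else if (!inGap) { out = out ","; inGap = 1 }
            }
            print out
        }
    ' "$1" "$1"
}
```

Called as cols_to_csv file, it should produce the same three lines as the output shown above. Unlike the gsub(/,+/,",") step, this version never touches commas that are already present in the data.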
Answer 1 (score: 1)
Another one in awk:
awk 'BEGIN{OFS=FS=""}                         # Empty field separator, so each character is a field
FNR==NR{for(i=1;i<=NF;i++)a[i]+=$i!=" ";next} # First pass: a[i] counts the lines that have a
                                              # non-space character at position i.
                                              # Skips all further commands for the first file.
{                                             # Second pass (same file, read a second time)
  for(i=1;i<=NF;i++)                          # Loop through the character positions
    if(!a[i]){                                # Position i is a space on every line
      $i=","                                  # Change the field to ","
      x=i                                     # Start scanning from this position
      while(++x<=NF && !a[x]){                # Following positions that are also all spaces
        $x=""                                 # Change the field to nothing
        i=x                                   # Set i to x so the outer loop skips those fields
      }
    }
}1' test{,}                                   # 1 prints each line; test{,} expands to "test test" in bash
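Answer 2 (score: 0)
Since you also tagged this r, here is a possible solution using the R package readr. It looks like you want to read a fixed-width file and convert it to a comma-separated file. You can use read_fwf to read the fixed-width file and write_csv to write the comma-separated file.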
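# required package
library(readr)
# read the fixed-width data; fwf_empty() guesses the column positions from the blank columns
# (path_to_input and path_to_output are placeholders for your own file paths)
df <- read_fwf(path_to_input, fwf_empty(path_to_input))
# write comma-separated data without a header row
write_csv(df, path_to_output, col_names = FALSE)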