Question

I have a file like this:

aaa  b b ccc      345
ddd  fgt f u      3456
e r  der der      5 674

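As you can see, the only way to separate the columns is to find the character columns that contain nothing but spaces on every line. How can we identify these columns and replace them with a unique delimiter such as a comma (,)?

Desired output: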

aaa,b b,ccc,345
ddd,fgt,f u,3456
e r,der,der,5 674

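Note:
If we find every run of consecutive character columns that contain only spaces (and nothing else) and replace each run with a single , on all lines, the problem is solved.

A better explanation of the problem, from josifoski: treat the file as a matrix of characters; wherever a block of that matrix consists entirely of spaces, the whole block should be replaced, vertically on every line, with a single ,.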

Answer 1

$ cat tst.awk
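BEGIN{ FS=OFS=""; ARGV[ARGC]=ARGV[ARGC-1]; ARGC++ }   # one character per field; read the file twice
NR==FNR {                       # first pass: classify every character position
    for (i=1;i<=NF;i++) {
        if ($i == " ") {
            space[i]
        }
        else {
            nonSpace[i]
        }
    }
    next
}
FNR==1 {                        # start of second pass: keep only the all-space positions
    for (i in nonSpace) {
        delete space[i]
    }
}
{
    for (i in space) {          # put a comma at every all-space position
        $i = ","
    }
    gsub(/,+/,",")              # collapse each run of commas into one
    print
}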

$ awk -f tst.awk file
aaa,b b,ccc,345
ddd,fgt,f u,3456
e r,der,der,5 674
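
The two-pass approach above needs a regular file, because the file name is appended to ARGV so that awk reads it twice. As a rough sketch of an alternative (not the answer's method), the lines can be buffered in memory so the input may also come from a pipe. The function name blank_cols_to_csv is hypothetical, and substr() is used instead of an empty FS on the assumption that this is more portable across awk implementations.

```shell
# Hypothetical helper: reads text on stdin, writes comma-separated lines on stdout.
blank_cols_to_csv() {
    awk '
    {
        line[NR] = $0                          # buffer every line for the second step
        n = length($0)
        for (i = 1; i <= n; i++)
            if (substr($0, i, 1) != " ")
                used[i] = 1                    # position i holds a non-space on some line
    }
    END {
        for (r = 1; r <= NR; r++) {
            out = ""
            n = length(line[r])
            inSep = 0
            for (i = 1; i <= n; i++) {
                if (i in used) {               # ordinary position: copy the character
                    out = out substr(line[r], i, 1)
                    inSep = 0
                } else if (!inSep) {           # first position of an all-space run
                    out = out ","
                    inSep = 1
                }                              # later positions of the run are dropped
            }
            print out
        }
    }'
}

# Sample input from the question
printf '%s\n' 'aaa  b b ccc      345' 'ddd  fgt f u      3456' 'e r  der der      5 674' |
    blank_cols_to_csv
```

Because separator runs are tracked by position rather than collapsed with gsub, commas that already occur in the data are left untouched. The whole input is held in memory, so this sketch suits small to moderate files.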

Answer 2

Another one in awk:

awk 'BEGIN{OFS=FS=""}  # Sets field separator to nothing so each character is a field

FNR==NR{for(i=1;i<=NF;i++)a[i]+=$i!=" ";next}  # First pass: a[i] counts the lines that have a
                                  # non-space character at position i.
                                  # Skips all further commands for the first file.
     {                            # Second pass (same file, read a second time)
        for(i=1;i<=NF;i++)        # Loops through the character positions
           if(!a[i]){             # If position i is a space on every line
              $i=","              # Change that character to ","
              x=i                 # Set x to the position
              while(++x<=NF && !a[x]){  # While the next positions are also all-space (stop at end of line)
                 $x=""            # Change the character to nothing
                 i=x              # Set i to x so those positions are not processed again
              }
           }
      }1' test{,} # Print each line; test{,} expands to "test test", so the file is read twice
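
To see what the first pass computes, the following sketch prints the 1-based character positions that are blank on every line of the sample input. It is only an illustration: the function name blank_positions is hypothetical, and it uses substr() rather than an empty field separator.

```shell
# Hypothetical helper: prints the positions that hold a space on every input line.
blank_positions() {
    awk '
    {
        n = length($0)
        if (n > width) width = n       # remember the longest line
        for (i = 1; i <= n; i++)
            if (substr($0, i, 1) != " ")
                a[i]++                 # counts non-space characters at position i
    }
    END {
        out = ""
        for (i = 1; i <= width; i++)
            if (!(i in a))             # never saw a non-space here
                out = out (out == "" ? "" : " ") i
        print out
    }'
}

printf '%s\n' 'aaa  b b ccc      345' 'ddd  fgt f u      3456' 'e r  der der      5 674' |
    blank_positions
# → 4 5 9 13 14 15 16 17 18
```

The three runs in that list (4-5, 9 and 13-18) are exactly the places where the second pass writes one comma and blanks out the rest of the run.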

Answer 3

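Since you also tagged this with r, here is a possible solution using the R package readr. It looks like you want to read a fixed-width file and convert it to a comma-separated file. You can use read_fwf to read the fixed-width file and write_csv to write the comma-separated file.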

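# required package
library(readr)
# read data; fwf_empty() guesses the column positions from the all-blank columns
df <- read_fwf(path_to_input, fwf_empty(path_to_input))
# write data without a header row
# (the output file is passed positionally because newer readr releases name this argument file, not path)
write_csv(df, path_to_output, col_names = FALSE)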

Find the columns in a text file that contain only spaces and replace them with a unique delimiter

3 answers: