Find the columns in a text file that contain only spaces and replace them with a unique delimiter

Date: 2015-06-16 13:12:28

Tags: regex r bash awk sed

I have a file like this:

aaa  b b ccc      345
ddd  fgt f u      3456
e r  der der      5 674

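As you can see, the only way to separate the columns is to look for the columns that contain nothing but one or more spaces. How can we identify those columns and replace them with a unique delimiter such as ','? The result should look like this: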

aaa,b b,ccc,345
ddd,fgt,f u,3456
e r,der,der,5 674

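Note:
The problem is solved if we find every run of consecutive columns that contain one or more spaces (and nothing else) and replace each such run with a single ',' on all lines.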

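A better explanation of the problem by josifoski: treat the file as a character matrix. For every block of columns in which every character is a space, replace the whole block, vertically on every line, with a single ','.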
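
The definition above comes down to one check: a character position is a separator when every line has a space there. A minimal sketch of just that detection step, assuming a POSIX sh and awk and treating positions past the end of a shorter line as blank (an assumption); blank_columns and the temporary sample file are hypothetical names introduced here:

```sh
#!/bin/sh
# Sketch: list the 1-based character positions that are a space on every line.
# Assumption: positions past the end of a shorter line count as blank.
blank_columns() {
    awk '
    {
        if (length($0) > maxlen) maxlen = length($0)
        for (i = 1; i <= length($0); i++)
            if (substr($0, i, 1) != " ") used[i] = 1   # position i holds a non-space here
    }
    END {
        sep = ""
        for (i = 1; i <= maxlen; i++)
            if (!(i in used)) { printf "%s%d", sep, i; sep = " " }
        print ""
    }' "$@"
}

# Sample input from the question, written to a hypothetical temporary file
sample=$(mktemp)
printf '%s\n' 'aaa  b b ccc      345' 'ddd  fgt f u      3456' 'e r  der der      5 674' > "$sample"
blank_columns "$sample"   # prints: 4 5 9 13 14 15 16 17 18
```

For the sample it reports positions 4-5, 9 and 13-18; each run of consecutive positions becomes one comma in the desired output.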

3 answers:

Answer 0 (score: 4)

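$ cat tst.awk
BEGIN{ FS=OFS=""; ARGV[ARGC]=ARGV[ARGC-1]; ARGC++ }   # one character per field; append the file again so it is read twice
NR==FNR {                        # first pass: record which positions hold a space and which hold anything else
    for (i=1;i<=NF;i++) {
        if ($i == " ") {
            space[i]
        }
        else {
            nonSpace[i]
        }
    }
    next
}
FNR==1 {                         # start of second pass: keep only positions that are a space on every line
    for (i in nonSpace) {
        delete space[i]
    }
}
{
    for (i in space) {           # overwrite every all-space position with a comma
        $i = ","
    }
    gsub(/,+/,",")               # collapse each run of commas into a single delimiter
    print
}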

$ awk -f tst.awk file
aaa,b b,ccc,345
ddd,fgt,f u,3456
e r,der,der,5 674
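
tst.awk relies on FS="", which POSIX leaves unspecified (gawk splits the record into single characters), and its gsub(/,+/,",") also merges commas that were already in the data. A hedged sketch of the same two-pass approach, assuming only POSIX awk: characters are read with substr(), and a single comma is emitted at the start of each all-space run; to_csv is a hypothetical name.

```sh
#!/bin/sh
# Sketch: same two-pass idea as tst.awk, without FS="" and without gsub,
# so commas already present in the input are left untouched.
to_csv() {
    awk '
    NR == FNR {                                  # pass 1: mark positions holding a non-space
        for (i = 1; i <= length($0); i++)
            if (substr($0, i, 1) != " ") used[i] = 1
        next
    }
    {                                            # pass 2: rebuild each line
        out = ""; inrun = 0
        for (i = 1; i <= length($0); i++) {
            if (i in used) { out = out substr($0, i, 1); inrun = 0 }
            else if (!inrun) { out = out ","; inrun = 1 }   # first column of a blank run
        }
        print out
    }' "$1" "$1"                                 # the same file twice, like the ARGV trick in tst.awk
}

sample=$(mktemp)
printf '%s\n' 'aaa  b b ccc      345' 'ddd  fgt f u      3456' 'e r  der der      5 674' > "$sample"
to_csv "$sample"
```

On the sample file this prints the same three lines as tst.awk.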

Answer 1 (score: 1)

Another one in awk:

awk 'BEGIN{OFS=FS=""}  # Empty FS puts each character in its own field (not POSIX; gawk supports it)

FNR==NR{for(i=1;i<=NF;i++)a[i]+=$i!=" ";next}  # First pass: a[i] counts the lines that have
                                  # a non-space character at position i.
                                  # next skips the remaining rules for the first file.
     {                            # Second pass (the same file, read again)
        for(i=1;i<=NF;i++)        # Loop through the character positions
           if(!a[i]){             # Position i is a space on every line
              $i=","              # Replace it with the delimiter
              x=i                 # Start scanning the rest of this blank run
              while(x<NF && !a[x+1]){  # While the next position is also blank on every line
                 $(++x)=""        # Remove it (the x<NF bound stops an endless loop on trailing blanks)
                 i=x              # Make the outer loop skip it
              }
           }
      }1' test{,}  # 1 prints each line; test{,} expands to "test test", so the file is read twice
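
Both awk answers read the input twice (the ARGV duplication in tst.awk, test{,} here), so neither works when the data arrives on a pipe. A sketch of a single-pass alternative, assuming the whole input fits in memory: it buffers every line, marks the non-space positions, and prints the converted lines in the END block; to_csv_stdin is a hypothetical name.

```sh
#!/bin/sh
# Sketch: single pass over stdin; all lines are buffered, so memory grows with input size.
to_csv_stdin() {
    awk '
    {
        line[NR] = $0                                # keep the line for the END block
        for (i = 1; i <= length($0); i++)
            if (substr($0, i, 1) != " ") used[i] = 1  # position i holds a non-space
    }
    END {
        for (n = 1; n <= NR; n++) {
            out = ""; inrun = 0
            for (i = 1; i <= length(line[n]); i++) {
                c = substr(line[n], i, 1)
                if (i in used) { out = out c; inrun = 0 }
                else if (!inrun) { out = out ","; inrun = 1 }   # one comma per blank run
            }
            print out
        }
    }'
}

printf '%s\n' 'aaa  b b ccc      345' 'ddd  fgt f u      3456' 'e r  der der      5 674' | to_csv_stdin
```

The trade-off is memory: this version holds every line, whereas the two-pass versions hold one line at a time plus the column marks.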

Answer 2 (score: 0)

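Since you also tagged this question r, here is a possible solution using R and the readr package. It looks like you want to read a fixed-width file and convert it to a comma-separated file. You can use read_fwf to read the fixed-width file (fwf_empty guesses the column positions from columns that are empty on every line) and write_csv to write the comma-separated file.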

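# required package
library(readr)
# hypothetical placeholder paths; replace with your own
path_to_input  <- "file"
path_to_output <- "file.csv"
# guess column boundaries from all-empty columns, then read the data
df <- read_fwf(path_to_input, fwf_empty(path_to_input))
# write comma-separated output without a header row
# (readr >= 1.4.0 names this argument `file`; older versions used `path`)
write_csv(df, file = path_to_output, col_names = FALSE)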