Insert spaces to separate a database

Time: 2017-12-11 11:56:38

Tags: database unix awk sed

Good morning. I have the following data set, but with more information:

215 22221121110110110101 
212 22221121110110110101  
468 22221121110110110101
1200 22221121110110110101 
400 22221121110110110101 
100 22221121110110110101 
200 22221121110110110101

I need to split it into columns like this:

215 2 2 2 2 1 1 2 1 1 1 0 1 1 0 1 1 0 1 0 1 
212 2 2 2 2 1 1 2 1 1 1 0 1 1 0 1 1 0 1 0 1 
468 2 2 2 2 1 1 2 1 1 1 0 1 1 0 1 1 0 1 0 1
1200 2 2 2 2 1 1 2 1 1 1 0 1 1 0 1 1 0 1 0 1
400 2 2 2 2 1 1 2 1 1 1 0 1 1 0 1 1 0 1 0 1
100 2 2 2 2 1 1 2 1 1 1 0 1 1 0 1 1 0 1 0 1
200 2 2 2 2 1 1 2 1 1 1 0 1 1 0 1 1 0 1 0 1

I tried a simple sed, but it does not work:

  

sed -i -e 's// /g'

10 answers:

Answer 0 (score: 5)

Perl to the rescue!

perl -lane 'push @F, split //, pop @F; print "@F"'
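  • -n reads the input line by line
  • -l removes the newline from each input line and adds it back on output
  • -a splits each line on whitespace into the @F array
  • pop removes the last element of the array and returns it; in this case it returns the second "word"
  • split turns a string into a list; // splits the string into individual characters
  • push is the dual of pop: it adds elements to the end of an array (here it appends the individual characters to the array, which at that point contains only the first column)
  • When an array is printed inside double quotes, its members are separated by spaces by default.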
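
The same pop / split / push sequence can be sketched without Perl. What follows is only an illustration in POSIX awk wrapped in a shell function; the name split_last_field is made up for this example, and it assumes whitespace-separated fields with the digit string in the last one, as in the question.

```shell
#!/bin/sh
# Hypothetical helper (the name is illustrative, not from the answer):
# mirrors the Perl steps pop -> split // -> push for each input line.
split_last_field() {
  awk '{
    last = $NF                  # "pop": take the last field ...
    out = ""
    for (i = 1; i < NF; i++)    # ... and keep the fields before it
      out = out (i > 1 ? " " : "") $i
    n = length(last)
    for (i = 1; i <= n; i++)    # "split //" and "push": append one character at a time
      out = out (out == "" ? "" : " ") substr(last, i, 1)
    print out                   # members joined by single spaces, like print "@F"
  }' "$@"
}

printf '215 2222112\n1200 0101\n' | split_last_field
```

Like the Perl one-liner, this joins the pieces with single spaces, so no space is left at the end of the line.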

Answer 1 (score: 3)

You can use the GNU awk gensub function.

gawk '{$2=gensub(/./, "& ", "g", $2)}1' file

Answer 2 (score: 3)

Other solutions leave an extra space at the end of the line; this one avoids it:

$ awk '{print $1 gensub(/./," &","g",$2)}'

Answer 3 (score: 2)

Could you please try the following GNU awk and let me know whether it helps you.

Answer 4 (score: 2)

Using awk's gsub function:

awk '{gsub(/./," &",$2); print $1 $2}' infile

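Explanation

  • gsub(/./," &",$2) matches any character (except line terminators) and replaces it with a single space followed by that same character, in the second field of the current record.

    The dot matches (almost) any character. In regular expressions, the dot or period is one of the most commonly used metacharacters. The dot matches a single character, without caring what that character is. The only exception is line break characters.

  • If the special character & appears in the replacement, it stands for the precise substring that was matched by the regexp.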
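
Where & sits in the replacement decides where the extra space ends up. A small sketch, assuming any POSIX awk; the helper names are hypothetical, and the brackets are there only to make leading and trailing spaces visible:

```shell
#!/bin/sh
# Hypothetical helpers, for illustration only.
# " &" puts the space BEFORE each matched character.
space_before() { printf '%s\n' "$1" | awk '{gsub(/./, " &"); print "[" $0 "]"}'; }
# "& " puts the space AFTER each matched character.
space_after()  { printf '%s\n' "$1" | awk '{gsub(/./, "& "); print "[" $0 "]"}'; }

space_before 2101   # prints [ 2 1 0 1]
space_after  2101   # prints [2 1 0 1 ]
```

With " &", the leading space doubles as the separator after the first column, which is why print $1 $2 needs no explicit space between the two fields.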

Test results:

$ cat infile
215 22221121110110110101 
212 22221121110110110101  
468 22221121110110110101
1200 22221121110110110101 
400 22221121110110110101 
100 22221121110110110101 
200 22221121110110110101

$ awk '{gsub(/./," &",$2); print $1 $2}' infile
215 2 2 2 2 1 1 2 1 1 1 0 1 1 0 1 1 0 1 0 1
212 2 2 2 2 1 1 2 1 1 1 0 1 1 0 1 1 0 1 0 1
468 2 2 2 2 1 1 2 1 1 1 0 1 1 0 1 1 0 1 0 1
1200 2 2 2 2 1 1 2 1 1 1 0 1 1 0 1 1 0 1 0 1 
400 2 2 2 2 1 1 2 1 1 1 0 1 1 0 1 1 0 1 0 1
100 2 2 2 2 1 1 2 1 1 1 0 1 1 0 1 1 0 1 0 1
200 2 2 2 2 1 1 2 1 1 1 0 1 1 0 1 1 0 1 0 1

Answer 5 (score: 1)

Speed comparison of some of the answers

$ perl -0777 -ne 'print $_ x 1000000' ip.txt > f1
$ du -h f1
169M    f1

Timings for two consecutive runs

$ time perl -lane 'push @F, split //, pop @F; print "@F"' f1 > t1
real    0m34.004s
real    0m33.729s

$ time perl -lane 'print join " ",$F[0],split //,$F[1]' f1 > t2
real    0m23.291s
real    0m23.935s

$ time LC_ALL=C awk '{gsub(/./," &",$2); print $1 $2}' f1 > t3
real    0m30.834s
real    0m30.723s


$ diff -s t1 t2
Files t1 and t2 are identical
$ diff -s t1 t3
Files t1 and t3 are identical
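
A scaled-down version of this check can be reproduced without Perl. The sketch below is not the answerer's setup: the temporary file names, the repeat count of 1000, and the second awk variant used for comparison are all assumptions chosen for the example.

```shell
#!/bin/sh
# Illustrative only: file names and repeat count are arbitrary choices.
tmpdir=$(mktemp -d) || exit 1

# Build a small input by repeating two sample rows 1000 times.
awk 'BEGIN {
  for (i = 0; i < 1000; i++) {
    print "215 22221121110110110101"
    print "1200 22221121110110110101"
  }
}' > "$tmpdir/f1"

# Variant 1: the awk gsub command from the answers.
LC_ALL=C awk '{gsub(/./," &",$2); print $1 $2}' "$tmpdir/f1" > "$tmpdir/t1"

# Variant 2: an explicit character loop, used here only as a cross-check.
LC_ALL=C awk '{
  out = $1
  n = length($2)
  for (i = 1; i <= n; i++) out = out " " substr($2, i, 1)
  print out
}' "$tmpdir/f1" > "$tmpdir/t2"

# cmp -s exits 0 only when both outputs are byte-identical.
if cmp -s "$tmpdir/t1" "$tmpdir/t2"; then same=yes; else same=no; fi
echo "identical: $same"
# Remove "$tmpdir" when finished.
```

Prefixing either awk command with time gives a rough per-variant timing; with only 2000 lines the numbers will be far smaller than those above.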

Answer 6 (score: 1)

Another way, using bash

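while read -r a b; do
  printf '%s' "$a"
  # read one character at a time from the second column
  while read -r -n1 c; do
    # skip the empty read caused by the newline that <<< appends
    [ -n "$c" ] && printf ' %s' "$c"
  done <<<"$b"
  echo
done <lefile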

Answer 7 (score: 1)

This might work for you (GNU sed):

sed 's/ /\n/;h;s/\B/ /g;H;g;s/\n.*\n/ /' file

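Replace the first space with a newline, copy the line, replace every non-word-boundary with a space, append the changed line to the copy, then rearrange the line.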
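
To see what each step does to the pattern space, the command can be traced with the l command, which prints the pattern space unambiguously (newlines appear as \n and the end of the line as $). This is a sketch assuming GNU sed, as in the answer; the function name trace_steps is made up for the example.

```shell
#!/bin/sh
# Hypothetical tracing helper; assumes GNU sed (\B, and \n in the replacement).
trace_steps() {
  printf '%s\n' "$1" | sed -n '
    # 1. replace the first space with a newline, then show the pattern space
    s/ /\n/
    l
    # 2. save a copy, then put a space at every non-word-boundary
    h
    s/\B/ /g
    l
    # 3. append the spaced version to the copy and fetch the combined text
    H
    g
    l
    # 4. collapse everything from the first to the last newline into one space
    s/\n.*\n/ /
    p
  '
}

trace_steps '215 2222'
# 215\n2222$
# 2 1 5\n2 2 2 2$
# 215\n2222\n2 1 5\n2 2 2 2$
# 215 2 2 2 2
```

The third line of the trace shows why the final substitution works: the untouched first column sits before the first newline, and the spaced-out second column sits after the last one.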

Answer 8 (score: 1)

How about coreutils?

paste -d ''                                \
  <(cut -d' ' -f1 infile                 ) \
  <(cut -d' ' -f2 infile | sed 's/./ &/g')

Output:

215 2 2 2 2 1 1 2 1 1 1 0 1 1 0 1 1 0 1 0 1
212 2 2 2 2 1 1 2 1 1 1 0 1 1 0 1 1 0 1 0 1
468 2 2 2 2 1 1 2 1 1 1 0 1 1 0 1 1 0 1 0 1
1200 2 2 2 2 1 1 2 1 1 1 0 1 1 0 1 1 0 1 0 1
400 2 2 2 2 1 1 2 1 1 1 0 1 1 0 1 1 0 1 0 1
100 2 2 2 2 1 1 2 1 1 1 0 1 1 0 1 1 0 1 0 1
200 2 2 2 2 1 1 2 1 1 1 0 1 1 0 1 1 0 1 0 1

Answer 9 (score: 0)

Try

sed -i -e 's/\(.\)/\1 /g'

That is, capture one character at a time, then replace each capture with itself plus a space.
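
Because the expression has no notion of columns, it applies to every character on the line, including the first column and the separating space. A quick check using only POSIX sed syntax; the helper name is hypothetical, and the brackets are added just to make the trailing space visible:

```shell
#!/bin/sh
# Hypothetical helper: runs the capture-and-append substitution on one line,
# then wraps the result in brackets so trailing spaces can be seen.
space_every_char() {
  printf '%s\n' "$1" | sed -e 's/\(.\)/\1 /g' -e 's/.*/[&]/'
}

space_every_char '215 2222'   # prints [2 1 5   2 2 2 2 ]
```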