Question

I want to add a space after the word transcript_id in the second column:

"Row.names" "id" "gene_id.x"

 "1"    transcript_id"TCONS_00000008"   "XLOC_000004"
 "2"    transcript_id"TCONS_00000015"   "XLOC_000005"
 "3"    transcript_id"TCONS_00000033"   "XLOC_000008"
 "4"    transcript_id"TCONS_00000037"   "XLOC_000008"
 "5"    transcript_id"TCONS_00000039"   "XLOC_000008"

I tried this:

sed 's/./& /17' file.out > files.out

It works, and the result looks like this:

"Row.names" "id" "gene_id.x"

 "1"    transcript_id "TCONS_00000008"  "XLOC_000004"
 "2"    transcript_id "TCONS_00000015"  "XLOC_000005"
 "3"    transcript_id "TCONS_00000033"  "XLOC_000008"
 "4"    transcript_id "TCONS_00000037"  "XLOC_000008"
 "5"    transcript_id "TCONS_00000039"  "XLOC_000008"

But when I check the second column with

 awk '{ print $2 }'  files.out

I only get:

transcript_id
transcript_id
transcript_id
transcript_id
transcript_id

What I want is, for example, transcript_id "TCONS_00000008" in a single column, instead of having it split into columns 2 and 3.

Answer 1

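The default field separator in awk matches runs of one or more spaces or tabs, so the space inserted by the sed call creates an additional column. You can change this behavior: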

awk -F'  +' '{ print $2 }' files.out

This changes the field separator to match runs of two or more spaces. If you also want to split on tabs, change the field separator regex as follows:

awk -F'  +|[\t]+' '{ print $2 }' files.out
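To see the difference, you can feed one line through the default split and through the separator above. This is a small sketch: the sample line is made up, and it assumes the columns of files.out are separated by tabs, which the question doesn't actually state.

```shell
#!/bin/sh
# Made-up line shaped like files.out after the sed call, assuming the
# columns are separated by tabs.
line=$(printf '"1"\ttranscript_id "TCONS_00000008"\t"XLOC_000004"')

# Default FS: runs of blanks, so the inserted space starts a new column.
default=$(printf '%s\n' "$line" | awk '{ print $2 }')

# The separator above: two or more spaces, or one or more tabs.
custom=$(printf '%s\n' "$line" | awk -F'  +|[\t]+' '{ print $2 }')

printf 'default: %s\n' "$default"
printf 'custom:  %s\n' "$custom"
```

The first line prints only transcript_id, while the second prints transcript_id "TCONS_00000008".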

To get the same result without calling sed:

awk '{ x=$2; sub(/"/, " \"", x); print x }' file.out
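That command only prints the second column. If you want to rewrite the whole file instead, one option is to edit $2 in place with a tab as both input and output separator, so the rebuilt line keeps tabs between fields and the new space stays inside column 2. This is a sketch that assumes tab-separated columns; sample.out and sample_spaced.out are hypothetical file names.

```shell
#!/bin/sh
# Hypothetical two-line sample shaped like file.out, assuming tab-separated
# columns (the question does not say which separator the real file uses).
printf '"1"\ttranscript_id"TCONS_00000008"\t"XLOC_000004"\n"2"\ttranscript_id"TCONS_00000015"\t"XLOC_000005"\n' > sample.out

# Read and write with tabs. sub() replaces the first double quote in $2
# with a space plus a double quote; changing $2 rebuilds the line with OFS.
awk 'BEGIN { FS = OFS = "\t" } { sub(/"/, " \"", $2); print }' sample.out > sample_spaced.out

# Reading back with a tab separator keeps the new space inside column 2.
awk -F'\t' '{ print $2 }' sample_spaced.out
```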

Answer 2

If you aren't worried about confusing others, you could use a non-breaking space. For example:

$ sed 's/_id/&\xA0/' file | awk '{print $2}'

transcript_id "TCONS_00000008"
transcript_id "TCONS_00000015"
transcript_id "TCONS_00000033"
transcript_id "TCONS_00000037"
transcript_id "TCONS_00000039"

However, a better approach is to define a field separator that is (visually) different from the characters you use inside fields.

Answer 3

If you "insert a space" in a field, awk will split that field on that space; that is what you experienced.

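To work with a cleaned-up file, first filter the source file into a temporary test file, with leading blanks removed and every run of spaces and tabs between fields replaced by a single space: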

sed -e 's/^[ \t]\+//1' -e 's/[ \t]\+/ /g' originalfile >file.tmp
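That sed command relies on GNU extensions (\+, and \t inside a bracket expression). Without GNU sed, a common awk idiom does a similar cleanup: assigning $1 to itself forces awk to rebuild the record with the default output separator, a single space. Unlike the sed version it also drops trailing blanks. The sample file names below are hypothetical.

```shell
#!/bin/sh
# Hypothetical stand-in for "originalfile": a leading blank, trailing blanks,
# and mixed runs of spaces and tabs between the fields.
printf ' "1"  \ttranscript_id"TCONS_00000008"\t  "XLOC_000004"  \n' > sample_original.txt

# Assigning $1 to itself makes awk rebuild the record with OFS (a single
# space by default): leading and trailing blanks disappear and every run of
# spaces/tabs between fields becomes one space.
awk '{ $1 = $1; print }' sample_original.txt > sample_file.tmp

cat sample_file.tmp
```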

If the original file is too large, use just 20 or 50 lines of it.

Then you either:

Choose a different separator for awk (not the default, whitespace). You can:
- split fields in awk on runs of two or more spaces: FS='  +'
- split fields in awk on tabs: FS='\t+'
- split fields on commas: FS=','

Filter the cleaned file so it uses the chosen separator (one of these):

sed -e 's/ /  /g' file.tmp > file2.tmp    ### replace a space with two spaces.
sed -e 's/ /\t/g' file.tmp > file2.tmp    ### replace a space with tab.
sed -e 's/ /,/g'  file.tmp > file2.tmp    ### replace a space with comma.

Insert a space (in-place edit of file2.tmp):

sed -i -e 's/_id/& /1' file2.tmp    ### GNU sed; "-ie" would keep a backup named file2.tmpe

Then use awk with the new separator:

awk -F '[ ][ ]+' '{print $2}' file2.tmp      ### For runs of two or more spaces
awk -F '[\t]+' '{print $2}' file2.tmp        ### For runs of one or more tabs.
awk -F ',' '{print $2}' file2.tmp            ### For comma.
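Put together, the comma variant of these steps looks like the sketch below. It runs on a hypothetical two-line sample (sample.tmp, sample2.tmp) and writes a new file instead of using sed -i, whose syntax differs between GNU and BSD sed.

```shell
#!/bin/sh
# Hypothetical cleaned sample in the shape of file.tmp (one space between fields).
printf '"1" transcript_id"TCONS_00000008" "XLOC_000004"\n"2" transcript_id"TCONS_00000015" "XLOC_000005"\n' > sample.tmp

# Within one sed run the -e scripts are applied in order to each line:
# first every space becomes a comma, then a space is added after "_id".
sed -e 's/ /,/g' -e 's/_id/& /' sample.tmp > sample2.tmp

# Split on commas: the space added after "_id" stays inside column 2.
awk -F ',' '{ print $2 }' sample2.tmp
```

The order of the two -e scripts matters: swapping them would turn the newly inserted space into a comma as well.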

Or insert some other character instead of a space (a non-breaking space, perhaps?). Unicode has several other "space" characters; search Wikipedia for them.

sed -i -e 's/_id/&\xC2\xA0/1' file2.tmp  ### nbsp is 0xC2 0xA0 in UTF-8.
                                         ### change the bytes for other encodings.
awk '{print $2}' file2.tmp             ### Works as the space is a "nbsp".
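The \xC2\xA0 escape in that sed command is a GNU sed extension. A portable way to check the idea is printf's octal escapes, where \302\240 are the same two bytes. This sketch uses a made-up line and assumes a UTF-8 file; it relies on awk's default splitting separating on spaces, tabs and newlines, which in common implementations such as gawk and mawk does not include the non-breaking space.

```shell
#!/bin/sh
# Made-up line with tab-separated columns; \302\240 (octal for 0xC2 0xA0)
# is a UTF-8 non-breaking space inserted after "transcript_id".
line=$(printf '"1"\ttranscript_id\302\240"TCONS_00000008"\t"XLOC_000004"')

# Default field splitting does not break on the non-breaking space,
# so column 2 still holds transcript_id plus the TCONS id.
col2=$(printf '%s\n' "$line" | awk '{ print $2 }')
printf '%s\n' "$col2"

# The line still has 3 fields, not 4.
nf=$(printf '%s\n' "$line" | awk '{ print NF }')
printf 'fields: %s\n' "$nf"
```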

Insert a space between strings