How do I split this string?

Time: 2017-04-25 16:42:40

Tags: python sed

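I am trying to split a string before every two-digit integer that is surrounded by whitespace. Ultimately I want this to work in Python, but I have been experimenting with sed and cannot figure it out.

My test data looks like this: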

13 13 13 13 13 9:07.18 9:12.09 9:15.65
14 14 14 2:04.86 2:05.99 2:06.87 14 4:21.51 4:23.51 4:25.00 14 8:56.28 9:01.09 9:04.58
15 15 57.18 57.61 57.95 15 2:02.61 2:03.72 2:04.58 15 4:17.31 4:19.28 4:20.75 15 8:47.15 8:51.87 8:55.30
16 16 56.34 56.76 57.09 16 2:00.69 2:01.78 2:02.63 16 4:13.75 4:15.69 4:17.14 16 8:39.71 8:44.37 8:47.75
17 25.69 25.85 25.99 17 55.62 56.03 56.36 17 1:59.07 2:00.15 2:00.99 17 4:10.76 4:12.69 4:14.11 17 8:33.73 8:38.34 8:41.68
18 25.43 25.59 25.73 18 55.01 55.42 55.74 18 1:57.74 1:58.81 1:59.63 18 4:08.34 4:10.24 4:11.66 18 8:33.73 8:37.04
19 25.20 25.36 25.49 19 54.50 54.91 55.23 19 1:57.74 1:58.56 19 4:08.34 4:09.74 19 8:33.73

I would like it split like this (note the position of the commas ','):

13, 13, 13, 13, 13 9:07.18 9:12.09 9:15.65
14, 14, 14 2:04.86 2:05.99 2:06.87, 14 4:21.51 4:23.51 4:25.00, 14 8:56.28 9:01.09 9:04.58
15, 15 57.18 57.61 57.95, 15 2:02.61 2:03.72 2:04.58, 15 4:17.31 4:19.28 4:20.75, 15 8:47.15 8:51.87 8:55.30
16, 16 56.34 56.76 57.09, 16 2:00.69 2:01.78 2:02.63, 16 4:13.75 4:15.69 4:17.14, 16 8:39.71 8:44.37 8:47.75
17 25.69 25.85 25.99, 17 55.62 56.03 56.36, 17 1:59.07 2:00.15 2:00.99, 17 4:10.76 4:12.69 4:14.11, 17 8:33.73 8:38.34 8:41.68
18 25.43 25.59 25.73, 18 55.01 55.42 55.74, 18 1:57.74 1:58.81 1:59.63, 18 4:08.34 4:10.24 4:11.66, 18 8:33.73 8:37.04
19 25.20 25.36 25.49, 19 54.50 54.91 55.23, 19 1:57.74 1:58.56, 19 4:08.34 4:09.74, 19 8:33.73

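The data above is fairly regular in that the two-digit integers are in the range [13, 19], but in general I should expect the range [10, 99].

Can anyone suggest a way to perform this transformation? I have been working on a regex for a while, but I cannot cover all the cases.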

4 answers:

Answer 0 (score: 7)

A lookahead assertion, (?=...), solves this:

>>> import re
>>> a = """13 13 13 13 13 9:07.18 9:12.09 9:15.65
14 14 14 2:04.86 2:05.99 2:06.87 14 4:21.51 4:23.51 4:25.00 14 8:56.28 9:01.09 9:04.58
15 15 57.18 57.61 57.95 15 2:02.61 2:03.72 2:04.58 15 4:17.31 4:19.28 4:20.75 15 8:47.15 8:51.87 8:55.30
16 16 56.34 56.76 57.09 16 2:00.69 2:01.78 2:02.63 16 4:13.75 4:15.69 4:17.14 16 8:39.71 8:44.37 8:47.75
17 25.69 25.85 25.99 17 55.62 56.03 56.36 17 1:59.07 2:00.15 2:00.99 17 4:10.76 4:12.69 4:14.11 17 8:33.73 8:38.34 8:41.68
18 25.43 25.59 25.73 18 55.01 55.42 55.74 18 1:57.74 1:58.81 1:59.63 18 4:08.34 4:10.24 4:11.66 18 8:33.73 8:37.04
19 25.20 25.36 25.49 19 54.50 54.91 55.23 19 1:57.74 1:58.56 19 4:08.34 4:09.74 19 8:33.73"""

>>> print(re.sub(r"(\d{2}) (?=\d{2}( |$))", r"\g<1>, ", a))
13, 13, 13, 13, 13 9:07.18 9:12.09 9:15.65
14, 14, 14 2:04.86 2:05.99 2:06.87, 14 4:21.51 4:23.51 4:25.00, 14 8:56.28 9:01.09 9:04.58
15, 15 57.18 57.61 57.95, 15 2:02.61 2:03.72 2:04.58, 15 4:17.31 4:19.28 4:20.75, 15 8:47.15 8:51.87 8:55.30
16, 16 56.34 56.76 57.09, 16 2:00.69 2:01.78 2:02.63, 16 4:13.75 4:15.69 4:17.14, 16 8:39.71 8:44.37 8:47.75
17 25.69 25.85 25.99, 17 55.62 56.03 56.36, 17 1:59.07 2:00.15 2:00.99, 17 4:10.76 4:12.69 4:14.11, 17 8:33.73 8:38.34 8:41.68
18 25.43 25.59 25.73, 18 55.01 55.42 55.74, 18 1:57.74 1:58.81 1:59.63, 18 4:08.34 4:10.24 4:11.66, 18 8:33.73 8:37.04
19 25.20 25.36 25.49, 19 54.50 54.91 55.23, 19 1:57.74 1:58.56, 19 4:08.34 4:09.74, 19 8:33.73

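So the regex you need is (\d{2}) (?=\d{2}( |$)), which means:

  1. (\d{2}) followed by a space => capture two digits in group 1 and also match (and consume) the space after them.
  2. (?=\d{2}( |$)) => require that two digits followed by a space or the end of the string come next, but do not consume them. Without re.MULTILINE, $ matches only at the end of the whole string, not at the end of every line.
  3. The key point is that the second number is not consumed, so it is examined again as the start of the next match. Finally, the replacement \g<1>, followed by a space puts back the digits from group 1 and adds a comma and a space.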
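
To make point 3 concrete, here is a small sketch that compares the lookahead pattern with a variant that consumes the second number. The function names are made up for this illustration, and re.MULTILINE is an addition that is not in the answer: it lets $ match at the end of every line, which only matters if a line can end in a two-digit integer.

```python
import re

# Pattern from the answer, with a non-capturing group inside the lookahead.
# re.MULTILINE is an assumption added here so that $ also matches at line ends.
LOOKAHEAD = re.compile(r"(\d{2}) (?=\d{2}(?: |$))", re.MULTILINE)

# Variant for comparison only: the second number is part of the match.
CONSUMING = re.compile(r"(\d{2}) (\d{2})(?= |$)", re.MULTILINE)


def add_commas(text):
    """Insert ', ' between a number and a following standalone two-digit number."""
    return LOOKAHEAD.sub(r"\1, ", text)


def add_commas_consuming(text):
    """Same idea, but the consumed second number cannot start the next match."""
    return CONSUMING.sub(r"\1, \2", text)


line = "13 13 13 13 13 9:07.18"
print(add_commas(line))            # 13, 13, 13, 13, 13 9:07.18
print(add_commas_consuming(line))  # 13, 13 13, 13 13 9:07.18
```

The consuming variant skips every other boundary, because the number it has just matched is no longer available as the left-hand side of the next match.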

Answer 1 (score: 0)

Here is a sed version for fun, since you seem interested in a sed solution for the sake of understanding.

sed ":a;s/\([^,]\)\(\s[0-9]\{2\}\s\)/\1,\2/;ta"

sed -E ":a;s/([^,])(\s[0-9]{2}\s)/\1,\2/;ta"
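  • start a loop
    • search for
      • a character other than a comma (this matters for the later iterations of the loop)
      • a whitespace character, two digits and a whitespace character
    • replace with the non-comma character, a comma and the rest
  • loop again if something was replaced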

Output (exactly the requested output):

13, 13, 13, 13, 13 9:07.18 9:12.09 9:15.65
14, 14, 14 2:04.86 2:05.99 2:06.87, 14 4:21.51 4:23.51 4:25.00, 14 8:56.28 9:01.09 9:04.58
15, 15 57.18 57.61 57.95, 15 2:02.61 2:03.72 2:04.58, 15 4:17.31 4:19.28 4:20.75, 15 8:47.15 8:51.87 8:55.30
16, 16 56.34 56.76 57.09, 16 2:00.69 2:01.78 2:02.63, 16 4:13.75 4:15.69 4:17.14, 16 8:39.71 8:44.37 8:47.75
17 25.69 25.85 25.99, 17 55.62 56.03 56.36, 17 1:59.07 2:00.15 2:00.99, 17 4:10.76 4:12.69 4:14.11, 17 8:33.73 8:38.34 8:41.68
18 25.43 25.59 25.73, 18 55.01 55.42 55.74, 18 1:57.74 1:58.81 1:59.63, 18 4:08.34 4:10.24 4:11.66, 18 8:33.73 8:37.04
19 25.20 25.36 25.49, 19 54.50 54.91 55.23, 19 1:57.74 1:58.56, 19 4:08.34 4:09.74, 19 8:33.73
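
Since the question ultimately targets Python, the sed loop can be mirrored there as a rough sketch. The function name is hypothetical, and the function is meant to be applied to one line at a time, because \s in Python also matches a newline.

```python
import re

# Same pattern as the sed command: a non-comma character, then
# whitespace + two digits + whitespace.
STEP = re.compile(r"([^,])(\s[0-9]{2}\s)")


def sed_style_loop(line):
    """Replace one match per pass and repeat until nothing changes (like ':a ... ta')."""
    while True:
        line, replaced = STEP.subn(r"\1,\2", line, count=1)
        if not replaced:
            return line


print(sed_style_loop("14 14 14 2:04.86 2:05.99 2:06.87 14 4:21.51"))
# 14, 14, 14 2:04.86 2:05.99 2:06.87, 14 4:21.51
```

The loop is needed because the pattern consumes the whitespace after the two digits, so a single global pass cannot use that whitespace as the start of the next match. As in the sed version, a two-digit integer at the very end of a line is not matched, since the pattern requires trailing whitespace.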

Answer 2 (score: 0)

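Adding to VMRuiz's answer: this outputs a list for each line instead of one big string. I had to adapt the regex to use re.split instead of re.sub: the leading digits move into a lookbehind so they are not removed, and the group inside the lookahead is non-capturing so re.split does not return it. It should split at the same places.

for line in a.split('\n'):
    print(re.split(r'(?<=\d{2}) (?=\d{2}(?: |$))', line))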

Edit: this is definitely the same, but a bit awkward:

for line in re.sub(r'(\d{2}) (?=\d{2}( |$))', r'\g<1>,', a).split('\n'):
    print(line.split(','))
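
As a self-contained sketch of the re.split approach, the helper below (the name is hypothetical) returns one list per line. It assumes a lookbehind/lookahead pair with a non-capturing group, because re.split includes the text of any capturing group in its result.

```python
import re

# Split on a space that has two digits before it and a standalone
# two-digit number after it. Nothing but the space itself is consumed.
SPLIT_AT = re.compile(r"(?<=\d{2}) (?=\d{2}(?: |$))")


def split_lines(text):
    """Return a list of pieces for every line of text."""
    return [SPLIT_AT.split(line) for line in text.splitlines()]


sample = "13 13 13 9:07.18 9:12.09\n15 15 57.18 57.61 57.95 15 2:02.61"
for pieces in split_lines(sample):
    print(pieces)
# ['13', '13', '13 9:07.18 9:12.09']
# ['15', '15 57.18 57.61 57.95', '15 2:02.61']
```

Each line is passed to the pattern separately, so $ refers to the end of that line and no re.MULTILINE flag is needed.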

Answer 3 (score: 0)

If you want a non-regex Python solution, you can do this:

s = """\
13 13 13 13 13 9:07.18 9:12.09 9:15.65
14 14 14 2:04.86 2:05.99 2:06.87 14 4:21.51 4:23.51 4:25.00 14 8:56.28 9:01.09 9:04.58
15 15 57.18 57.61 57.95 15 2:02.61 2:03.72 2:04.58 15 4:17.31 4:19.28 4:20.75 15 8:47.15 8:51.87 8:55.30
16 16 56.34 56.76 57.09 16 2:00.69 2:01.78 2:02.63 16 4:13.75 4:15.69 4:17.14 16 8:39.71 8:44.37 8:47.75
17 25.69 25.85 25.99 17 55.62 56.03 56.36 17 1:59.07 2:00.15 2:00.99 17 4:10.76 4:12.69 4:14.11 17 8:33.73 8:38.34 8:41.68
18 25.43 25.59 25.73 18 55.01 55.42 55.74 18 1:57.74 1:58.81 1:59.63 18 4:08.34 4:10.24 4:11.66 18 8:33.73 8:37.04
19 25.20 25.36 25.49 19 54.50 54.91 55.23 19 1:57.74 1:58.56 19 4:08.34 4:09.74 19 8:33.73"""


res = ""
for line in s.splitlines():
    buf = line.split()
    # Append ", " to a token when the next token is an integer, otherwise " ".
    for i, e in enumerate(buf[1:], 1):
        buf[i-1] += ", " if e.isdigit() else " "
    res += ''.join(buf) + "\n"

>>> print(res)
13, 13, 13, 13, 13 9:07.18 9:12.09 9:15.65
14, 14, 14 2:04.86 2:05.99 2:06.87, 14 4:21.51 4:23.51 4:25.00, 14 8:56.28 9:01.09 9:04.58
15, 15 57.18 57.61 57.95, 15 2:02.61 2:03.72 2:04.58, 15 4:17.31 4:19.28 4:20.75, 15 8:47.15 8:51.87 8:55.30
16, 16 56.34 56.76 57.09, 16 2:00.69 2:01.78 2:02.63, 16 4:13.75 4:15.69 4:17.14, 16 8:39.71 8:44.37 8:47.75
17 25.69 25.85 25.99, 17 55.62 56.03 56.36, 17 1:59.07 2:00.15 2:00.99, 17 4:10.76 4:12.69 4:14.11, 17 8:33.73 8:38.34 8:41.68
18 25.43 25.59 25.73, 18 55.01 55.42 55.74, 18 1:57.74 1:58.81 1:59.63, 18 4:08.34 4:10.24 4:11.66, 18 8:33.73 8:37.04
19 25.20 25.36 25.49, 19 54.50 54.91 55.23, 19 1:57.74 1:58.56, 19 4:08.34 4:09.74, 19 8:33.73
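
The loop above treats any all-digit token as a separator. If only integers in the range [10, 99] mentioned in the question should trigger a comma, a stricter variant could look like the sketch below. Both function names are made up for this illustration.

```python
def is_two_digit_int(token):
    """True for '10' .. '99' only (ASCII digits, no leading zero)."""
    return (
        len(token) == 2
        and token[0] in "123456789"
        and token[1] in "0123456789"
    )


def join_with_commas(line):
    """Rejoin the tokens of a line, putting ', ' before each two-digit integer."""
    tokens = line.split()
    out = []
    for i, tok in enumerate(tokens):
        if i > 0:
            out.append(", " if is_two_digit_int(tok) else " ")
        out.append(tok)
    return "".join(out)


print(join_with_commas("19 25.20 25.36 25.49 19 54.50 54.91"))
# 19 25.20 25.36 25.49, 19 54.50 54.91
```

Because the line is split on arbitrary whitespace and rejoined, runs of several spaces or tabs in the input are normalised to a single separator.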

In awk you can do this:

awk '{n=split($0,a)
      for (i=2;i<=n;i++)
          printf "%s%s", a[i-1], a[i]~/^[[:digit:]]+$/ ?  ", " : " "
      print a[n]
    }' file
13, 13, 13, 13, 13 9:07.18 9:12.09 9:15.65
14, 14, 14 2:04.86 2:05.99 2:06.87, 14 4:21.51 4:23.51 4:25.00, 14 8:56.28 9:01.09 9:04.58
15, 15 57.18 57.61 57.95, 15 2:02.61 2:03.72 2:04.58, 15 4:17.31 4:19.28 4:20.75, 15 8:47.15 8:51.87 8:55.30
16, 16 56.34 56.76 57.09, 16 2:00.69 2:01.78 2:02.63, 16 4:13.75 4:15.69 4:17.14, 16 8:39.71 8:44.37 8:47.75
17 25.69 25.85 25.99, 17 55.62 56.03 56.36, 17 1:59.07 2:00.15 2:00.99, 17 4:10.76 4:12.69 4:14.11, 17 8:33.73 8:38.34 8:41.68
18 25.43 25.59 25.73, 18 55.01 55.42 55.74, 18 1:57.74 1:58.81 1:59.63, 18 4:08.34 4:10.24 4:11.66, 18 8:33.73 8:37.04
19 25.20 25.36 25.49, 19 54.50 54.91 55.23, 19 1:57.74 1:58.56, 19 4:08.34 4:09.74, 19 8:33.73