Regex question - replace all newlines that are not preceded by a tab with a space

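Time: 2010-08-09 11:21:30

Tags: ruby regex

I have to process a block of text that may contain spurious newlines in the middle of some fields. I want to remove these newlines (replacing them with a space) without removing the "valid" newlines, which are always preceded by a \t.

So, I want to replace with a space every newline that is not preceded by a tab. To complicate things a bit, if there is already a space on either side of the newline, I want to keep it rather than end up with two. In other words, this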

"one\ttwo\tbuckle my \nshoe\t\t\n"

would become

"one\ttwo\tbuckle my shoe\t\t\n"

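, i.e. with one space between 'my' and 'shoe', not two.

EDIT - some clarification: the unwanted newlines occur in the middle of a piece of regular text. If there is already whitespace between the words where the newline occurs, I want to keep it. Otherwise, I want to add a space. For example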

"one\ttwo\tbuckle my \nshoe\t\t\n"
=> "one\ttwo\tbuckle my shoe\t\t\n"

"one\ttwo\tbuckle my\nshoe\t\t\n"
=> "one\ttwo\tbuckle my shoe\t\t\n"

"one\ttwo\tbuckle my \n shoe\t\t\n"
=> "one\ttwo\tbuckle my shoe\t\t\n"

EDIT 2: I have come up with a clunky but working solution. I'm not very happy with it; the double gsubbing seems inelegant.

>> strings = ["one\ttwo\tbuckle my\nshoe\t\t\n", "one\ttwo\tbuckle my \nshoe\t\t\n", "one\ttwo\tbuckle my \n shoe\t\t\n"]
=> ["one\ttwo\tbuckle my\nshoe\t\t\n", "one\ttwo\tbuckle my \nshoe\t\t\n", "one\ttwo\tbuckle my \n shoe\t\t\n"]
>> strings.collect{|s| s.gsub(/[^\t]\n\s?/){|match| match.gsub(/\s*\n\s*/," ")} }
=> ["one\ttwo\tbuckle my shoe\t\t\n", "one\ttwo\tbuckle my shoe\t\t\n", "one\ttwo\tbuckle my shoe\t\t\n"]

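Given my now-expanded requirement about adding/preserving the space, this seems to work better than any of the suggestions below.

3 Answers:

Answer 0: (score: 2)

Non-lookbehind option

You can match: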

(\G|[^\t])\n

and replace the match with a backreference to group 1.

Here is a Ruby snippet (as seen on ideone.com):

from = "\none\ttwo\tbuckle my \nshoe\t\t\nx\n\n\t\n\n"
to   = "one\ttwo\tbuckle my shoe\t\t\nx\t\n"

mod  = from.gsub(/(\G|[^\t])\n/, '\1')

puts (mod == to) # true

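Essentially, we either match "something" that is not a \t, followed by \n, and replace the match with only the "something" part (effectively keeping it but removing the \n), or we continue from the end of the previous match using \G, which allows a \n at the beginning of the string or right after another removed \n.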
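To see the two branches of the alternation separately, here is a minimal sketch. The helper name `strip_stray_newlines` is made up for illustration; only the pattern comes from the answer.

```ruby
# Hypothetical helper wrapping the pattern from this answer.
def strip_stray_newlines(text)
  # Group 1 is either the empty match at \G or the non-tab character
  # before the newline; writing it back drops only the "\n".
  text.gsub(/(\G|[^\t])\n/, '\1')
end

p strip_stray_newlines("\nabc")       # => "abc"      (leading "\n", via \G)
p strip_stray_newlines("a\nb")        # => "ab"       ("a" kept, "\n" dropped)
p strip_stray_newlines("a\n\n\nb")    # => "ab"       (\G chains removed newlines)
p strip_stray_newlines("a\t\nb")      # => "a\t\nb"   (newline after a tab is kept)
```

Note that this pattern only deletes the stray newline; it does not insert a space where none existed.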

Lookbehind option

If the regex flavor supports lookbehind, you can also match:

(?<!\t)\n

and simply replace it with the empty string.
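In Ruby, the lookbehind variant could be sketched as below, using the negative lookbehind `(?<!\t)` to require that the newline is not preceded by a tab. The helper name is hypothetical, and the sketch assumes Ruby 1.9 or later, since earlier versions lack lookbehind support.

```ruby
# Hypothetical helper; needs a regex engine with lookbehind (Ruby 1.9+).
def strip_newlines_without_tab(text)
  # (?<!\t) asserts that the character before "\n" is not a tab,
  # without consuming it, so there is nothing to put back.
  text.gsub(/(?<!\t)\n/, '')
end

from = "\none\ttwo\tbuckle my \nshoe\t\t\nx\n\n\t\n\n"
p strip_newlines_without_tab(from)
# => "one\ttwo\tbuckle my shoe\t\t\nx\t\n"
```

On this input the result is the same string that the non-lookbehind snippet compares against.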

Answer 1: (score: 1)

Use a double negative ([^\S\t] means all whitespace except the TAB character)

def fix(str)
  return str.gsub(/([^\t]|^)[^\S\t]+/, '\1 ')
end

The following tests

#! /usr/bin/ruby

require "test/unit"
require "test/unit/ui/console/testrunner"

class MyTestCases < Test::Unit::TestCase
  def test_after_space
    assert_equal fix("one\ttwo\tbuckle my \nshoe\t\t\n"),
                     "one\ttwo\tbuckle my shoe\t\t\n"
  end

  def test_no_whitespace_neighbors
    assert_equal fix("one\ttwo\tbuckle my\nshoe\t\t\n"),
                     "one\ttwo\tbuckle my shoe\t\t\n"
  end

  def test_whitespace_surrounded
    assert_equal fix("one\ttwo\tbuckle my \n shoe\t\t\n"),
                     "one\ttwo\tbuckle my shoe\t\t\n"
  end

  def test_leading_newline
    assert_equal fix("\none\ttwo"),
                     " one\ttwo"
  end
end

Test::Unit::UI::Console::TestRunner.run(MyTestCases)

all pass:

Loaded suite MyTestCases
Started
....
Finished in 0.000412 seconds.

4 tests, 4 assertions, 0 failures, 0 errors
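The double-negative character class used in `fix` can also be checked on its own. The following is a small sketch; the constant and method names are made up for illustration.

```ruby
# [^\S\t] reads as "not (non-whitespace or tab)", i.e. whitespace minus tab.
NON_TAB_WHITESPACE = /[^\S\t]/

# Hypothetical predicate: true when the character is whitespace other than a tab.
def non_tab_whitespace?(char)
  !!(char =~ NON_TAB_WHITESPACE)
end

[" ", "\n", "\r", "\t", "a"].each do |char|
  puts "#{char.inspect}: #{non_tab_whitespace?(char)}"
end
# " ": true
# "\n": true
# "\r": true
# "\t": false
# "a": false
```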

Answer 2: (score: 0)

str = str.gsub(/\s*(?<!\t)\n\s*/, " ")
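As a quick check, the one-liner can be run against the three example strings from the question. This is only a sketch: the wrapper name and constants are hypothetical, and the negative lookbehind assumes Ruby 1.9 or later.

```ruby
# Hypothetical wrapper around the one-liner above.
def join_broken_lines(str)
  # Swallow whitespace around a newline that is not directly preceded
  # by a tab, and leave a single space in its place.
  str.gsub(/\s*(?<!\t)\n\s*/, " ")
end

CASES = [
  "one\ttwo\tbuckle my\nshoe\t\t\n",
  "one\ttwo\tbuckle my \nshoe\t\t\n",
  "one\ttwo\tbuckle my \n shoe\t\t\n"
]
EXPECTED = "one\ttwo\tbuckle my shoe\t\t\n"

CASES.each { |s| puts join_broken_lines(s) == EXPECTED }  # prints true three times
```

One caveat worth keeping in mind: because `\s` also matches tabs, the trailing `\s*` would swallow a tab that directly follows a stray newline, which may or may not be acceptable for a given input.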