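I have to process a block of text that may contain some spurious newlines in the middle of some of its fields. I want to remove these newlines (replacing them with spaces) without removing the "valid" newlines, which are always preceded by a \t.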
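So I want to replace every newline that is not preceded by a tab with a space. To complicate things a little, if there is already a space on either side of the newline, I want to keep that space rather than add a second one. In other words, this
"one\ttwo\tbuckle my \nshoe\t\t\n"
would become
"one\ttwo\tbuckle my shoe\t\t\n"
i.e. with just one space between 'my' and 'shoe', not two.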
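Edit - some clarification: the unwanted newlines occur in the middle of ordinary text. If there is already a space between the words where the newline occurs, I want to keep it; otherwise, I want to add one. For example: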
"one\ttwo\tbuckle my \nshoe\t\t\n"
=> "one\ttwo\tbuckle my shoe\t\t\n"
"one\ttwo\tbuckle my\nshoe\t\t\n"
=> "one\ttwo\tbuckle my shoe\t\t\n"
"one\ttwo\tbuckle my \n shoe\t\t\n"
=> "one\ttwo\tbuckle my shoe\t\t\n"
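Reading the clarified rule line by line may make it easier to see: every line whose newline is not preceded by a tab is joined to the next line with exactly one space. The following is a minimal sketch that does not come from the thread. It assumes spurious breaks are surrounded only by plain spaces (not other whitespace), and `join_broken_lines` is a hypothetical name.

```ruby
# Hypothetical helper: walks the text line by line and joins any line whose
# newline is NOT preceded by a tab (a "spurious" break) onto the next line.
def join_broken_lines(text)
  result = +""
  continuing = false # true if the previous line ended in a spurious break

  text.each_line do |line|
    # Drop plain spaces at the start of a continuation line (tabs are kept).
    line = line.sub(/\A +/, "") if continuing

    if line.end_with?("\n") && !line.end_with?("\t\n")
      # Spurious break: remove the newline and trailing spaces, add one space.
      result << line.chomp.sub(/ +\z/, "") << " "
      continuing = true
    else
      # Valid "\t\n" terminator, or a final line without a newline.
      result << line
      continuing = false
    end
  end

  result
end

[
  "one\ttwo\tbuckle my \nshoe\t\t\n",
  "one\ttwo\tbuckle my\nshoe\t\t\n",
  "one\ttwo\tbuckle my \n shoe\t\t\n"
].each { |s| p join_broken_lines(s) }
# each prints "one\ttwo\tbuckle my shoe\t\t\n"
```

Working on whole lines avoids lookbehind entirely, at the cost of a little bookkeeping (the `continuing` flag) to strip the leading spaces of a continuation line.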
Edit 2: I came up with a clunky but working solution. I'm not very happy with it; the double gsub doesn't seem very elegant.
>> strings = ["one\ttwo\tbuckle my\nshoe\t\t\n", "one\ttwo\tbuckle my \nshoe\t\t\n", "one\ttwo\tbuckle my \n shoe\t\t\n"]
=> ["one\ttwo\tbuckle my\nshoe\t\t\n", "one\ttwo\tbuckle my \nshoe\t\t\n", "one\ttwo\tbuckle my \n shoe\t\t\n"]
>> strings.collect{|s| s.gsub(/[^\t]\n\s?/){|match| match.gsub(/\s*\n\s*/," ")} }
=> ["one\ttwo\tbuckle my shoe\t\t\n", "one\ttwo\tbuckle my shoe\t\t\n", "one\ttwo\tbuckle my shoe\t\t\n"]
Given my extended requirement of adding/keeping spaces, this seems to work better than any of the suggestions below.
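One way to fold the nested gsub into a single pass is to capture the character in front of the break and rebuild the replacement from a backreference. This is a sketch under the assumption that only plain spaces (not tabs or other whitespace) surround a spurious newline; `SPURIOUS_BREAK` and `collapse_breaks` are hypothetical names.

```ruby
# Capture the last character before the break that is neither a tab nor a
# space, swallow the plain spaces on both sides of the newline, and put back
# the captured character followed by a single space.
SPURIOUS_BREAK = /([^\t ]) *\n */

def collapse_breaks(text)
  text.gsub(SPURIOUS_BREAK, '\1 ')
end

strings = ["one\ttwo\tbuckle my\nshoe\t\t\n",
           "one\ttwo\tbuckle my \nshoe\t\t\n",
           "one\ttwo\tbuckle my \n shoe\t\t\n"]
p strings.map { |s| collapse_breaks(s) }
# all three become "one\ttwo\tbuckle my shoe\t\t\n"
```

Because the captured character can be neither a tab nor a space, any run of spaces on either side of the break collapses into one, whereas the `[^\t]\n\s?` pattern absorbs at most one whitespace character on each side of the newline.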
Answer 0 (score: 2)
You can match:
(\G|[^\t])\n
and replace it with a backreference to group 1.
Here is a Ruby snippet (as seen on ideone.com):
from = "\none\ttwo\tbuckle my \nshoe\t\t\nx\n\n\t\n\n"
to = "one\ttwo\tbuckle my shoe\t\t\nx\t\n"
mod = from.gsub(/(\G|[^\t])\n/, '\1')
puts mod == to # prints "true"
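Basically, we either match "something" that is not a \t followed by a \n and replace the match with just the "something" part (effectively keeping it but deleting the \n), or we use \G to continue from where the previous match ended, which lets a \n be removed at the very beginning of the string or right after another \n that was just deleted.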
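To see what the `\G` alternative contributes, the following compares the pattern with and without it on a made-up input. In Ruby's `gsub`, `\G` matches at the position where the current search starts: the beginning of the string for the first match, then the end of the previous match.

```ruby
s = "\n\nab\t\ncd\n\nef\t\n" # made-up sample input

with_g    = s.gsub(/(\G|[^\t])\n/, '\1') # \G lets a match start where the last one ended
without_g = s.gsub(/([^\t])\n/, '\1')    # requires a real character before every \n

p with_g    # => "ab\t\ncdef\t\n"
p without_g # => "\nab\t\ncd\nef\t\n"
```

Without `\G`, the leading `\n\n` loses only one newline (the first `\n` is consumed as the "something" in front of the second), and in `cd\n\nef` the second `\n` survives because the character before it was already consumed by the previous match.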
If your flavor supports lookbehind, you can also match:
(?<!\t)\n
and simply replace it with the empty string.
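Assuming the lookbehind form is `(?<!\t)\n`, a newline not immediately preceded by a tab, it can be checked against the `from`/`to` pair from the ideone snippet above. This is a sketch; Ruby's regex engine supports negative lookbehind, so no `\G` is needed.

```ruby
from = "\none\ttwo\tbuckle my \nshoe\t\t\nx\n\n\t\n\n"
to   = "one\ttwo\tbuckle my shoe\t\t\nx\t\n"

# (?<!\t) is a negative lookbehind: the \n matches only if the character
# before it is not a tab (or if there is no character before it at all).
mod = from.gsub(/(?<!\t)\n/, "")
puts mod == to # prints "true"
```

The lookbehind also succeeds at the very start of the string, where there is no preceding character, which is why the leading `\n` is removed without any `\G` trick.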
Answer 1 (score: 1)
Use a double negative ([^\S\t] means any whitespace character except TAB):
def fix(str)
  str.gsub(/([^\t]|^)[^\S\t]+/, '\1 ')
end
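To make the "double negative" concrete: `[^\S\t]` is the complement of "non-whitespace or tab", which leaves every whitespace character except the tab. The sketch below checks this on a few sample characters and compares it with Ruby's character-class intersection syntax `[\s&&[^\t]]`, an equivalent spelling that the answer does not use; the constant names are hypothetical.

```ruby
# [^\S\t] reads as "not (non-whitespace or tab)": whitespace other than tab.
DOUBLE_NEGATIVE = /[^\S\t]/
# Equivalent set written with Ruby's character-class intersection (&&).
WHITESPACE_EXCEPT_TAB = /[\s&&[^\t]]/

samples = [" ", "\n", "\r", "\t", "a"]
results = samples.map do |c|
  [c, DOUBLE_NEGATIVE.match?(c), WHITESPACE_EXCEPT_TAB.match?(c)]
end

results.each { |c, neg, inter| puts "#{c.inspect}: #{neg} #{inter}" }
# " ": true true
# "\n": true true
# "\r": true true
# "\t": false false
# "a": false false
```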
The following tests:
#! /usr/bin/ruby
require "test/unit"
require "test/unit/ui/console/testrunner"
class MyTestCases < Test::Unit::TestCase
  # assert_equal takes (expected, actual)
  def test_after_space
    assert_equal "one\ttwo\tbuckle my shoe\t\t\n",
                 fix("one\ttwo\tbuckle my \nshoe\t\t\n")
  end
  def test_no_whitespace_neighbors
    assert_equal "one\ttwo\tbuckle my shoe\t\t\n",
                 fix("one\ttwo\tbuckle my\nshoe\t\t\n")
  end
  def test_whitespace_surrounded
    assert_equal "one\ttwo\tbuckle my shoe\t\t\n",
                 fix("one\ttwo\tbuckle my \n shoe\t\t\n")
  end
  def test_leading_newline
    assert_equal " one\ttwo",
                 fix("\none\ttwo")
  end
end
Test::Unit::UI::Console::TestRunner.run(MyTestCases)
All pass:
Loaded suite MyTestCases
Started
....
Finished in 0.000412 seconds.

4 tests, 4 assertions, 0 failures, 0 errors
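Because `[^\S\t]+` matches any run of non-tab whitespace, not only runs that contain a newline, the substitution also reduces ordinary runs of spaces elsewhere in a field to a single space. The check below repeats the answer's `fix` so that it runs on its own; the sample input `"a   b\t\tc"` is made up.

```ruby
# Answer 1's method, repeated here so the snippet is self-contained.
def fix(str)
  str.gsub(/([^\t]|^)[^\S\t]+/, '\1 ')
end

p fix("one\ttwo\tbuckle my \n shoe\t\t\n") # => "one\ttwo\tbuckle my shoe\t\t\n"
p fix("a   b\t\tc")                        # => "a b\t\tc"
```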
Answer 2 (score: 0)
str = str.gsub(/\s*(?<!\t)\n\s*/, " ")
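The one-liner above can be written in Ruby's extended (`/x`) mode so that each part carries a comment. This is a restatement for readability with the same behavior; `SPURIOUS_NEWLINE` is a hypothetical constant name, and the inputs are the three examples from the question.

```ruby
# Answer 2's regex in extended mode: whitespace in the pattern is ignored
# and everything after # on a line is a comment.
SPURIOUS_NEWLINE = /
  \s*       # whitespace before the break; this also matches tabs
  (?<!\t)   # the newline must not be directly preceded by a tab
  \n        # the spurious newline itself
  \s*       # whitespace after the break
/x

[
  "one\ttwo\tbuckle my \nshoe\t\t\n",
  "one\ttwo\tbuckle my\nshoe\t\t\n",
  "one\ttwo\tbuckle my \n shoe\t\t\n"
].each { |s| p s.gsub(SPURIOUS_NEWLINE, " ") }
# each prints "one\ttwo\tbuckle my shoe\t\t\n"
```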