Question

I have a string like this:

"I\thave\ta\t\tstring"

To split it, I used this approach:

text = [splits for splits in row.split("\t") if splits != ""]

This removes every tab in the string, but I want it to remove only the first tab after each word, so that the result looks like this:

"Ihavea\tstring"

Is there a way to do this?

Answer 1

Using re.split with a negative lookbehind assertion should do it:

import re

s = ''.join(re.split(r'(?<!\t)\t', row))
print(s)
# 'Ihavea\tstring'

The assertion (?<!\t) prevents a split on any \t that is preceded by another \t.

If you don't actually need the items from the split, you can use re.sub instead:

s = re.sub(r'(?<!\t)\t', '', row)
print(s)
# 'Ihavea\tstring'
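
To make the effect of the lookbehind more visible, here is a small sketch that prints the intermediate pieces and tries a longer run of tabs. The second input and the variable names are illustrative assumptions, not part of the question.

```python
import re

# A tab that is NOT preceded by another tab, i.e. the first tab of each run.
pattern = re.compile(r'(?<!\t)\t')

row = "I\thave\ta\t\tstring"
pieces = pattern.split(row)
print(pieces)  # ['I', 'have', 'a', '\tstring']

# Hypothetical input with a run of three tabs: only the first one is removed.
longer = "a\t\t\tb"
print(repr(pattern.sub('', longer)))  # 'a\t\tb'
```

The second tab of a run stays attached to the following piece, which is why joining the pieces gives back the desired string.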

Answer 2

If you want to avoid importing the re module, a list comprehension is another option:

row = "I\thave\ta\t\tstring"
text = [splits if splits else "\t" for splits in row.split("\t")]
"".join(text)
#'Ihavea\tstring'

An empty string is falsy in a boolean context, and split produces an empty list element between each pair of consecutive split characters (here "\t").
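
As a rough check of that explanation, the sketch below shows the list that split produces and compares this approach with the regex from Answer 1 on an input with leading and trailing tabs. The function names and the edge-case string are assumptions made for illustration.

```python
import re

def drop_first_tab_split(row):
    # Empty pieces mark the extra tabs in a run, so put one tab back for each.
    return "".join(piece if piece else "\t" for piece in row.split("\t"))

def drop_first_tab_regex(row):
    # Remove every tab that is not preceded by another tab.
    return re.sub(r'(?<!\t)\t', '', row)

row = "I\thave\ta\t\tstring"
print(row.split("\t"))  # ['I', 'have', 'a', '', 'string']
print(drop_first_tab_split(row) == drop_first_tab_regex(row))  # True

# Hypothetical edge case: tabs at the very start and end of the string.
edge = "\ta\t\tb\t"
print(repr(drop_first_tab_split(edge)))  # '\ta\tb\t'
print(repr(drop_first_tab_regex(edge)))  # 'a\tb'
```

The two approaches agree on the string from the question, but they can differ when the string starts or ends with a tab, because split also yields an empty piece there.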

Answer 3

For simplicity, you could use re.split:

from re import split
text = "I\thave\ta\t\tstring"
split_string = split(r'\t+', text)  # gives ['I', 'have', 'a', 'string']

The regex r'\t+' simply treats each run of consecutive tabs as a single separator.
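
For comparison with the output requested in the question, this short sketch joins the pieces back together. The variable names are illustrative.

```python
import re

text = "I\thave\ta\t\tstring"

# Each run of one or more tabs acts as one separator and is discarded.
words = re.split(r'\t+', text)
print(words)  # ['I', 'have', 'a', 'string']

joined = ''.join(words)
print(repr(joined))  # 'Ihaveastring'
```

Because the whole run is consumed, no tab survives in the joined result, so this fits when a list of words is what you need rather than the string 'Ihavea\tstring'.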