Split a string into words before the nth occurrence of a hashtag in Python

Time: 2016-03-03 10:37:29

Tags: python regex string

I am using the following code in Python to split a string into words:

keywords=re.sub(r'[][)(!,;]', ' ', str(row[0])).split()

Imagine the input is:

"Hello #world I am in #London and it is #sunny today"

I need to split it into words only up to the second hashtag; the rest does not need to be split. That means the output should be:

['Hello','#world','I','am','in'] 

Is there a way to split a string into keywords like this in Python?

4 Answers:

Answer 0: (score: 3)

The split method accepts a separator string to split on; without one, it splits on whitespace.

string_to_split = "Hello #world I am in #London and it is #sunny today"
# Split on all occurrences of #
temp = string_to_split.split("#")
# Join the first two entries with a '#' and remove any trailing whitespace
temp_two = '#'.join(temp[:2]).strip()
# split on spaces
final = temp_two.split(' ')

Running it in a terminal:

>>> string_to_split = "Hello #world I am in #London and it is #sunny today"
>>> temp = string_to_split.split("#")
>>> temp_two = '#'.join(temp[:2]).strip()
>>> final = temp_two.split(' ')
>>> final
['Hello', '#world', 'I', 'am', 'in']

Edit: fixed [2:] to [:2]; I always mix those up.

Edit: fixed the extra whitespace issue.
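Since the title asks about the nth occurrence, the same split-and-join idea can be generalized. The following is only a sketch: the function name `words_before_nth` and its parameters `marker` and `n` are hypothetical and not part of the original answer. It uses `split()` without arguments for the final step, so repeated spaces do not produce empty strings.

```python
def words_before_nth(text, marker="#", n=2):
    """Return the words that appear before the n-th marker in text.

    Hypothetical helper for illustration. If text contains fewer than
    n markers, the whole string is split into words.
    """
    # Split on every marker, then glue the first n pieces back together.
    parts = text.split(marker)
    head = marker.join(parts[:n])
    # split() with no argument splits on any run of whitespace.
    return head.split()


sample = "Hello #world I am in #London and it is #sunny today"
print(words_before_nth(sample))
```

With `n=2` this should print the same list as the terminal session above.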

Answer 1: (score: 3)

str.find takes a start position, so once you have found the first occurrence, start looking for the second one at index + 1, then split that substring:

s = "Hello #world I am in #London and it is #sunny today"
i =  s.find("#", s.find("#") + 1)
print(s[:i].split())
# Output: ['Hello', '#world', 'I', 'am', 'in']

You can also do the same with str.index:

s = "Hello #world I am in #London and it is #sunny today"
i =  s.index("#", s.index("#") + 1)
print(s[:i].split())

The difference is that index raises a ValueError if the substring does not exist, whereas find returns -1.
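One thing to watch with find: when there is no second "#", it returns -1, and `s[:-1]` silently drops the last character instead of failing. A possible way to handle that and to support any n is sketched below. The helper name `find_nth` is an assumption made for illustration and is not a built-in string method.

```python
def find_nth(text, sub, n):
    """Return the index of the n-th occurrence of sub in text (n >= 1), or -1.

    Hypothetical helper: it calls str.find repeatedly, each time starting
    one position after the previous hit.
    """
    pos = -1
    for _ in range(n):
        pos = text.find(sub, pos + 1)
        if pos == -1:
            return -1  # fewer than n occurrences
    return pos


s = "Hello #world I am in #London and it is #sunny today"
i = find_nth(s, "#", 2)
# Fall back to splitting the whole string when there is no n-th marker.
words = s.split() if i == -1 else s[:i].split()
print(words)
```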

Answer 2: (score: 2)

In an interactive Python session:

>>> s = "Hello #world I am in #London and it is #sunny today"
>>> hash_indices = [i for i, element in enumerate(s) if element == '#']
>>> hash_indices
[6, 21, 39]
>>> s[0:hash_indices[1]].split()
['Hello', '#world', 'I', 'am', 'in']
>>> s[hash_indices[1]:]
'#London and it is #sunny today'
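The session above can be wrapped into a small function that returns both the split head and the untouched remainder. This is a sketch under assumptions: the name `split_head_and_rest` is hypothetical, and it only works for a single-character marker because it compares characters one at a time.

```python
def split_head_and_rest(text, marker="#", n=2):
    """Return (words before the n-th marker, unsplit remainder).

    Hypothetical helper; marker must be a single character.
    """
    positions = [i for i, ch in enumerate(text) if ch == marker]
    if len(positions) < n:
        # No n-th marker: everything is split and nothing remains.
        return text.split(), ""
    cut = positions[n - 1]
    return text[:cut].split(), text[cut:]


head, rest = split_head_and_rest(
    "Hello #world I am in #London and it is #sunny today"
)
print(head)
print(rest)
```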

Answer 3: (score: 1)

Regex and split

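import re

source = "Hello #world I am in #London and it is #sunny today"
# Match everything up to and including the second '#'
reg_out = re.search('[^#]*#[^#]*#', source)
split_out = reg_out.group().split()
# Drop the trailing '#' token left over from the match
print(split_out[:-1])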

Output: ['Hello', '#world', 'I', 'am', 'in']
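The regex approach can also be parameterized on n. The sketch below is one possible variant, not the answerer's code: the function name `words_before_nth_hash` is hypothetical, and the pattern stops just before the n-th "#" so no trailing token has to be sliced off. It also falls back to the whole string when the pattern does not match, whereas calling `.group()` on a failed `re.search` would raise an AttributeError.

```python
import re


def words_before_nth_hash(text, n=2):
    """Return the words before the n-th '#' in text (n >= 1).

    Hypothetical helper. The pattern consumes n - 1 hashtags plus the
    text after them, stopping just before the n-th '#'.
    """
    pattern = r"(?:[^#]*#){%d}[^#]*" % (n - 1)
    match = re.match(pattern, text)
    # No match means fewer than n - 1 hashtags, so use the whole string.
    head = match.group() if match else text
    return head.split()


source = "Hello #world I am in #London and it is #sunny today"
print(words_before_nth_hash(source))
```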