Question

I am using the following code in Python to split a string into words:

keywords=re.sub(r'[][)(!,;]', ' ', str(row[0])).split()

Suppose the input is:

"Hello #world I am in #London and it is #sunny today"

I need to split it into words only up to the second hashtag and leave the rest out, which means the output should be:

['Hello','#world','I','am','in']

Is there a way in Python to split a string into keywords like this?

Answer 1

The split method accepts a separator to split on; if none is given, it splits on whitespace.

string_to_split = "Hello #world I am in #London and it is #sunny today"
# Split on all occurrences of #
temp = string_to_split.split("#")
# Join the first two entries with a '#' and remove any trailing whitespace
temp_two = '#'.join(temp[:2]).strip()
# split on spaces
final = temp_two.split(' ')

Running it in a terminal:

>>> string_to_split = "Hello #world I am in #London and it is #sunny today"
>>> temp = string_to_split.split("#")
>>> temp_two = '#'.join(temp[:2]).strip()
>>> final = temp_two.split(' ')
>>> final
['Hello', '#world', 'I', 'am', 'in']

Edit: fixed [2:] to [:2]; I always mix those up.

Edit: fixed the extra whitespace issue.
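The same split-and-join idea can be generalized to the nth hashtag. The following is only a sketch: the function name words_before_nth_marker and its parameters are made up for illustration, and it assumes that a string with fewer than n markers should simply be split in full.

```python
def words_before_nth_marker(text, n=2, marker="#"):
    """Return the words that appear before the n-th occurrence of marker."""
    # Split at most n times, so parts[n] (if present) holds everything
    # after the n-th marker.
    parts = text.split(marker, n)
    if len(parts) <= n:
        # Assumption: fewer than n markers means nothing is cut off.
        return text.split()
    # Re-join the pieces before the n-th marker, then split on whitespace.
    return marker.join(parts[:n]).split()


sample = "Hello #world I am in #London and it is #sunny today"
print(words_before_nth_marker(sample))
```

Calling split() without arguments also discards the trailing whitespace, so no separate strip() step is needed here.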

Answer 2

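str.find takes a start position, so once you have found the first occurrence, use its index + 1 as the starting point to look for the second one, then split that substring: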

s = "Hello #world I am in #London and it is #sunny today"
i =  s.find("#", s.find("#") + 1)
print(s[:i].split())
['Hello', '#world', 'I', 'am', 'in']

You can also do the same thing with index:

s = "Hello #world I am in #London and it is #sunny today"
i =  s.index("#", s.index("#") + 1)
print(s[:i].split())

The difference is that index raises a ValueError if the substring does not exist, whereas find returns -1.
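Because find returns -1 when nothing is found, slicing with that result would silently drop the last character instead of failing. A possible way to guard against this is sketched below; the helper name split_before_second_hash is hypothetical, and returning all words when there is no second "#" is an assumption about the desired behavior.

```python
def split_before_second_hash(s):
    """Split s into words, stopping before the second '#' if there is one."""
    first = s.find("#")
    # Only look for a second '#' if a first one exists.
    second = s.find("#", first + 1) if first != -1 else -1
    if second == -1:
        # Assumption: with fewer than two '#' characters, keep every word.
        return s.split()
    return s[:second].split()


print(split_before_second_hash("Hello #world I am in #London and it is #sunny today"))
print(split_before_second_hash("no hashtags here"))
```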

Answer 3

In an interactive Python session:

>>> s = "Hello #world I am in #London and it is #sunny today"
>>> hash_indices = [i for i, element in enumerate(s) if element == '#']
>>> hash_indices
[6, 21, 39]
>>> s[0:hash_indices[1]].split()
['Hello', '#world', 'I', 'am', 'in']
>>> s[hash_indices[1]:]
'#London and it is #sunny today'
>>>

Answer 4

Regex and split

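import re

source = "Hello #world I am in #London and it is #sunny today"
# Match everything up to and including the second '#'
reg_out = re.search(r'[^#]*#[^#]*#', source)
split_out = reg_out.group().split()
# Drop the trailing '#' token that the match ends with
print(split_out[:-1])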

Output: ['Hello', '#world', 'I', 'am', 'in']
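The regex approach can also be parameterized for the nth hashtag. This is a rough sketch rather than part of the original answer: the function name words_before_nth_hash_re is invented for illustration, and the pattern stops just before the nth "#", so no trailing token has to be removed afterwards.

```python
import re


def words_before_nth_hash_re(text, n=2):
    """Return the words before the n-th '#', using a regular expression."""
    # Consume n-1 complete "...#" groups, then everything up to the next '#'.
    pattern = r"(?:[^#]*#){%d}[^#]*" % (n - 1)
    match = re.match(pattern, text)
    if match is None:
        # Assumption: fewer than n-1 hashtags means all words are kept.
        return text.split()
    return match.group().split()


source = "Hello #world I am in #London and it is #sunny today"
print(words_before_nth_hash_re(source))
```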

Split a string into words before the nth occurrence of a hashtag in Python
