I am using the following code in Python to split a string into words:
keywords=re.sub(r'[][)(!,;]', ' ', str(row[0])).split()
Suppose the input is:
"Hello #world I am in #London and it is #sunny today"
I need to split only the part before the second hashtag into words and leave the rest unsplit, which means the output should be:
['Hello','#world','I','am','in']
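Is there a way in Python to split the string into keywords like this?
Answer 0 (score: 3)
The split method accepts a separator string to split on; with no argument it splits on whitespace.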
string_to_split = "Hello #world I am in #London and it is #sunny today"
# Split on all occurrences of #
temp = string_to_split.split("#")
# Join the first two entries with a '#' and remove any trailing whitespace
temp_two = '#'.join(temp[:2]).strip()
# split on spaces
final = temp_two.split(' ')
Running it in a terminal:
>>> string_to_split = "Hello #world I am in #London and it is #sunny today"
>>> temp = string_to_split.split("#")
>>> temp_two = '#'.join(temp[:2]).strip()
>>> final = temp_two.split(' ')
>>> final
['Hello', '#world', 'I', 'am', 'in']
Edit: fixed [2:] to [:2]; I always mix those up.
Edit: fixed the extra whitespace issue.
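The same idea can be sketched as a small function. This is only an illustration: the name `words_before_second_hash` is hypothetical, `maxsplit=2` is used so that splitting stops after the second `#`, and the final `split()` is called without an argument so that repeated spaces do not produce empty strings.

```python
def words_before_second_hash(text):
    """Return the words that precede the second '#' in text."""
    # maxsplit=2 stops splitting after the second '#'
    parts = text.split("#", 2)
    # Rejoin the first two pieces, which restores the first '#'
    head = "#".join(parts[:2])
    # split() with no argument collapses runs of whitespace
    return head.split()


print(words_before_second_hash("Hello #world I am in #London and it is #sunny today"))
# ['Hello', '#world', 'I', 'am', 'in']
```

If the string contains fewer than two `#` characters, this sketch simply returns all of its words.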
Answer 1 (score: 3)
str.find takes a start position, so once you have found the first "#", start searching for the second one at index + 1, then split that substring:
s = "Hello #world I am in #London and it is #sunny today"
i = s.find("#", s.find("#") + 1)
print(s[:i].split())
['Hello', '#world', 'I', 'am', 'in']
You can also do the same with str.index:
s = "Hello #world I am in #London and it is #sunny today"
i = s.index("#", s.index("#") + 1)
print(s[:i].split())
The difference is that index raises a ValueError if the substring is not found, whereas find returns -1.
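That difference matters when the input has fewer than two hashtags: with find, a result of -1 makes `s[:i]` silently drop the last character. The following is a minimal sketch of one way to guard against that. The function name `split_before_second_hash` is hypothetical, and returning all words when there is no second `#` is an assumption about the desired behaviour.

```python
def split_before_second_hash(s):
    """Split s into words, stopping before the second '#' if there is one."""
    first = s.find("#")
    # find returns -1 when nothing is found; only search again after a hit
    second = s.find("#", first + 1) if first != -1 else -1
    if second == -1:
        # Fewer than two '#': s[:-1] would cut off the last character,
        # so split the whole string instead (assumed behaviour)
        return s.split()
    return s[:second].split()


print(split_before_second_hash("Hello #world I am in #London and it is #sunny today"))
# ['Hello', '#world', 'I', 'am', 'in']

# index signals the missing substring with an exception instead of -1
try:
    "no hashtags here".index("#")
except ValueError:
    print("index raised ValueError")
```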
Answer 2 (score: 2)
In the interactive Python interpreter:
>>> s = "Hello #world I am in #London and it is #sunny today"
>>> hash_indices = [i for i, element in enumerate(s) if element == '#']
>>> hash_indices
[6, 21, 39]
>>> s[0:hash_indices[1]].split()
['Hello', '#world', 'I', 'am', 'in']
>>> s[hash_indices[1]:]
'#London and it is #sunny today'
Answer 3 (score: 1)
Regular expression and split:
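import re

source = "Hello #world I am in #London and it is #sunny today"
# Match everything up to and including the second '#'
reg_out = re.search('[^#]*#[^#]*#', source)
split_out = reg_out.group().split()
# The last element is the lone second '#', so drop it
print(split_out[:-1])
Output: ['Hello', '#world', 'I', 'am', 'in']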
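A variation on the regex approach, offered only as a sketch: if the pattern stops just before the second `#` rather than including it, the trailing element does not need to be sliced off. The function name `words_before_second_hash_re` is hypothetical, and falling back to splitting the whole string when there is no `#` is an assumption.

```python
import re


def words_before_second_hash_re(source):
    """Return the words before the second '#', using a regular expression."""
    # Text up to and including the first '#', then everything that is not '#'
    match = re.match(r"[^#]*#[^#]*", source)
    if match is None:
        # No '#' at all: split the whole string (assumed behaviour)
        return source.split()
    return match.group().split()


print(words_before_second_hash_re("Hello #world I am in #London and it is #sunny today"))
# ['Hello', '#world', 'I', 'am', 'in']
```

Checking for `None` also avoids the AttributeError that calling `.group()` on a failed match would raise.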