我在修改后加入预分割字符串时遇到问题,同时保留了以前的结构。
说我有这样的字符串:
string = """
This is a nice piece of string isn't it?
I assume it is so. I have to keep typing
to use up the space. La-di-da-di-da.
This is a spaced out sentence
Bonjour.
"""
我必须对该字符串进行一些测试..在这些单词中找到特定的单词和字符等...然后相应地替换它们。所以要做到这一点,我必须使用
分解它string.split()
这个问题是,拆分也会消除\ n和额外的空格,立即破坏前一个结构的完整性
是否有一些额外的方法可以让我完成这个或者我应该寻找替代路线?
谢谢。
答案 0 :(得分:1)
split
方法使用可选参数来指定分隔符。如果您只想使用空格(' '
)字符拆分单词,则可以将其作为参数传递:
>>> string = """
...
... This is a nice piece of string isn't it?
... I assume it is so. I have to keep typing
... to use up the space. La-di-da-di-da.
...
... Bonjour.
... """
>>>
>>> string.split()
['This', 'is', 'a', 'nice', 'piece', 'of', 'string', "isn't", 'it?', 'I', 'assume', 'it', 'is', 'so.', 'I', 'have', 'to', 'keep', 'typing', 'to', 'use', 'up', 'the', 'space.', 'La-di-da-di-da.', 'Bonjour.']
>>> string.split(' ')
['\n\nThis', 'is', 'a', 'nice', 'piece', 'of', 'string', "isn't", 'it?\nI', 'assume', 'it', 'is', 'so.', 'I', 'have', 'to', 'keep', 'typing\nto', 'use', 'up', 'the', 'space.', 'La-di-da-di-da.\n\nBonjour.\n']
>>>
答案 1 :(得分:1)
split方法默认会根据所有空格分割字符串。如果你想分开谎言,你可以先用新行分割你的字符串,然后用空格分割这些行:
>>> [line.split() for line in string.strip().split('\n')]
[['This', 'is', 'a', 'nice', 'piece', 'of', 'string', "isn't", 'it?'], ['I', 'assume', 'it', 'is', 'so.', 'I', 'have', 'to', 'keep', 'typing'], ['to', 'use', 'up', 'the', 'space.', 'La-di-da-di-da.'], [], ['Bonjour.']]
答案 2 :(得分:1)
只需用分隔符拆分:
>>> string.split(' ')
['\n\nThis', 'is', 'a', 'nice', 'piece', 'of', 'string', "isn't", 'it?\nI', 'assume', 'it', 'is', 'so.', 'I', 'have', 'to', 'keep', 'typing\nto', 'use', 'up', 'the', 'space.', 'La-di-da-di-da.\n\nThis', '', '', 'is', '', '', '', 'a', '', '', '', 'spaced', '', '', 'out', '', '', 'sentence\n\nBonjour.\n']
并将其取回:
>>> ' '.join(a)
This is a nice piece of string isn't it?
I assume it is so. I have to keep typing
to use up the space. La-di-da-di-da.
This is a spaced out sentence
Bonjour.
答案 3 :(得分:0)
只做string.split(' ')
(注意split方法的空格参数)。
这会将你宝贵的新行保留在生成数组的字符串中......
答案 4 :(得分:0)
您可以将空格保存在另一个列表中,然后在修改单词列表后将它们连接在一起。
In [1]: from nltk.tokenize import RegexpTokenizer
In [2]: spacestokenizer = RegexpTokenizer(r'\s+', gaps=False)
In [3]: wordtokenizer = RegexpTokenizer(r'\s+', gaps=True)
In [4]: string = """
...:
...: This is a nice piece of string isn't it?
...: I assume it is so. I have to keep typing
...: to use up the space. La-di-da-di-da.
...:
...: This is a spaced out sentence
...:
...: Bonjour.
...: """
In [5]: spaces = spacestokenizer.tokenize(string)
In [6]: words = wordtokenizer.tokenize(string)
In [7]: print ''.join([s+w for s, w in zip(spaces, words)])
This is a nice piece of string isn't it?
I assume it is so. I have to keep typing
to use up the space. La-di-da-di-da.
This is a spaced out sentence
Bonjour.