Question

I'm having trouble joining a previously split string back together after modifying it, while preserving its original structure.

Say I have a string like this:

string = """

This is a nice piece of string isn't it?
I assume it is so. I have to keep typing
to use up the space. La-di-da-di-da.

This   is    a    spaced   out   sentence

Bonjour.
"""

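I have to run some tests on that string, such as finding specific words and characters within those words, and then replace them accordingly. To do that, I have to break it up using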

string.split()

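The problem is that split also removes the \n characters and the extra spaces, immediately destroying the integrity of the original structure.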

Is there some additional method that would let me accomplish this, or should I look for an alternative route?

Thank you.

Answer 1

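The split method takes an optional argument that specifies the delimiter. If you only want to split words on the space (' ') character, you can pass it as the argument: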

>>> string = """
...
... This is a nice piece of string isn't it?
... I assume it is so. I have to keep typing
... to use up the space. La-di-da-di-da.
...
... Bonjour.
... """
>>>
>>> string.split()
['This', 'is', 'a', 'nice', 'piece', 'of', 'string', "isn't", 'it?', 'I', 'assume', 'it', 'is', 'so.', 'I', 'have', 'to', 'keep', 'typing', 'to', 'use', 'up', 'the', 'space.', 'La-di-da-di-da.', 'Bonjour.']
>>> string.split(' ')
['\n\nThis', 'is', 'a', 'nice', 'piece', 'of', 'string', "isn't", 'it?\nI', 'assume', 'it', 'is', 'so.', 'I', 'have', 'to', 'keep', 'typing\nto', 'use', 'up', 'the', 'space.', 'La-di-da-di-da.\n\nBonjour.\n']
>>>

Answer 2

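By default, the split method splits the string on all whitespace. If you want to keep the lines separate, you can first split the string on newlines and then split those lines on whitespace: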

>>> [line.split() for line in string.strip().split('\n')]
[['This', 'is', 'a', 'nice', 'piece', 'of', 'string', "isn't", 'it?'], ['I', 'assume', 'it', 'is', 'so.', 'I', 'have', 'to', 'keep', 'typing'], ['to', 'use', 'up', 'the', 'space.', 'La-di-da-di-da.'], [], ['Bonjour.']]
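
To get back to a single string after editing, the per-line word lists can be rejoined with spaces and the lines with newlines. The following is a minimal sketch of that idea; the helper name `replace_word_per_line` and the sample text are hypothetical, and note that this approach keeps line breaks but collapses runs of spaces into a single space.

```python
text = """
This is a nice piece of string isn't it?
I assume it is so.

Bonjour.
"""


def replace_word_per_line(text, old, new):
    """Replace tokens equal to `old` with `new`, keeping the line breaks."""
    rebuilt_lines = []
    for line in text.split('\n'):
        # split() with no argument drops extra spaces inside the line
        words = [new if word == old else word for word in line.split()]
        rebuilt_lines.append(' '.join(words))
    return '\n'.join(rebuilt_lines)


result = replace_word_per_line(text, 'nice', 'fine')
print(result)
```

Empty lines survive because splitting an empty line gives an empty list, which joins back to an empty string.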

Answer 3

Just split with an explicit delimiter:

>>> a = string.split(' ')
>>> a
['\n\nThis', 'is', 'a', 'nice', 'piece', 'of', 'string', "isn't", 'it?\nI', 'assume', 'it', 'is', 'so.', 'I', 'have', 'to', 'keep', 'typing\nto', 'use', 'up', 'the', 'space.', 'La-di-da-di-da.\n\nThis', '', '', 'is', '', '', '', 'a', '', '', '', 'spaced', '', '', 'out', '', '', 'sentence\n\nBonjour.\n']

And to get it back:

>>> print(' '.join(a))
This is a nice piece of string isn't it?
I assume it is so. I have to keep typing
to use up the space. La-di-da-di-da.

This   is    a    spaced   out   sentence

Bonjour.
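
The round trip works because runs of spaces appear as empty strings in the split result, so joining on a single space restores them. Here is a minimal sketch with a replacement in between, assuming the word to replace is bounded by spaces on both sides; the sample text and the replacement word are made up for illustration.

```python
text = "\nThis   is    a    spaced   out   sentence\n\nBonjour.\n"

# Splitting on a single space keeps newlines inside the tokens and
# represents each extra space as an empty string.
tokens = text.split(' ')

# Joining on the same delimiter reproduces the original exactly.
roundtrip = ' '.join(tokens)

# Replace one word, then join again; the spacing is untouched.
replaced = ['widely' if token == 'spaced' else token for token in tokens]
result = ' '.join(replaced)
print(result)
```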

Answer 4

Just do string.split(' ') (note the space argument passed to the split method).

This keeps your precious newlines inside the strings of the resulting list...
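
One consequence worth keeping in mind: because the newlines stay inside the tokens, a word next to a line break is glued to its neighbour (for example 'typing\nto'), so a plain equality test misses it. The sketch below shows one possible way around that; the helper `replace_in_token` is a hypothetical name, not part of any library.

```python
text = "I have to keep typing\nto use up the space."
tokens = text.split(' ')

# Only the standalone 'to' is counted; the second one is inside 'typing\nto'.
exact_matches = tokens.count('to')


def replace_in_token(token, old, new):
    """Replace `old` inside a token that may contain newlines."""
    parts = token.split('\n')
    return '\n'.join(new if part == old else part for part in parts)


result = ' '.join(replace_in_token(token, 'to', 'TO') for token in tokens)
print(result)
```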

Answer 5

You can save the whitespace in a separate list, and then join the two lists back together after modifying the word list.

In [1]: from nltk.tokenize import RegexpTokenizer
In [2]: spacestokenizer = RegexpTokenizer(r'\s+', gaps=False)

In [3]: wordtokenizer = RegexpTokenizer(r'\s+', gaps=True)

In [4]: string = """
   ...: 
   ...: This is a nice piece of string isn't it?
   ...: I assume it is so. I have to keep typing
   ...: to use up the space. La-di-da-di-da.
   ...: 
   ...: This   is    a    spaced   out   sentence
   ...: 
   ...: Bonjour.
   ...: """

In [5]: spaces = spacestokenizer.tokenize(string)

In [6]: words = wordtokenizer.tokenize(string)

In [7]: print(''.join([s + w for s, w in zip(spaces, words)]))


This is a nice piece of string isn't it?
I assume it is so. I have to keep typing
to use up the space. La-di-da-di-da.

This   is    a    spaced   out   sentence

Bonjour.
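
The same idea of keeping the whitespace alongside the words can also be sketched with only the standard library, swapping NLTK's RegexpTokenizer for `re.split` with a capturing group, which returns the separators together with the words. The helper name `replace_words` and the replacement mapping are assumptions made for illustration.

```python
import re

text = """

This   is    a    spaced   out   sentence

Bonjour.
"""


def replace_words(text, mapping):
    """Replace whole tokens found in `mapping`, leaving whitespace as is."""
    # The capturing group makes re.split keep each whitespace run
    # as its own item in the result.
    pieces = re.split(r'(\s+)', text)
    # Whitespace pieces are not keys of the mapping, so they pass through.
    return ''.join(mapping.get(piece, piece) for piece in pieces)


result = replace_words(text, {'spaced': 'stretched', 'Bonjour.': 'Hello.'})
print(result)
```

Unlike the zip-based version, this does not depend on the string starting with whitespace, since words and separators stay in one list in their original order.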

How can I join a list back together while preserving the previous structure?
