Question

I have the following code:

for line in contentText:
    print type(line),   # -> output: <type 'unicode'>
    word = line.strip().split()
    print word,         # -> word is a list
    print type(word),   # -> output: <type 'list'>

When I run line.strip().split(), the output shows every character separately instead of every word.

For example, if my line is "Read Word from Unicode line instead of Char", the output is:

R e a d

w o r d

a ... and so on.

I want to get it as 'Read', 'Word', and so on, word by word rather than character by character, for further processing.

How can I do this?

Also, how can I remove the whitespace for further processing?

Answer 1

Iterating over a string yields single-character strings:

>>> text = 'Read word'
>>> for x in text:
...     print x
... 
R
e
a
d

w
o
r
d
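The question's loop hits the same case. Assuming contentText is one unicode string holding the whole text rather than a list of lines (the question does not show where it comes from), "for line in contentText" visits one character at a time, so line.strip().split() returns lists like ['R'], or [] for a space. A minimal Python 3 sketch with a hypothetical sample string; the same string methods exist on Python 2 unicode objects:

```python
# Hypothetical input: the whole text in one string, as contentText is assumed to be.
content_text = "Read word from unicode line\ninstead of char"

# Iterating the string directly yields single characters, not lines.
per_char = [line.strip().split() for line in content_text]
print(per_char[:5])  # [['R'], ['e'], ['a'], ['d'], []]

# splitlines() breaks the text into lines first; split() then yields words.
per_line = [line.split() for line in content_text.splitlines()]
print(per_line)  # [['Read', 'word', 'from', 'unicode', 'line'], ['instead', 'of', 'char']]
```

If contentText is actually an open file object, iterating over it already yields lines, and the splitlines() call is unnecessary.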

Split the string first to get a list of words, then iterate over that list:

>>> text.split()  # with no arguments, str.split splits on whitespace and discards it
['Read', 'word']

>>> for x in text.split():
...     print x
... 
Read
word
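On the whitespace question: called with no argument, str.split() splits on runs of any whitespace (spaces, tabs, newlines) and discards leading and trailing whitespace, so a separate strip() is not needed. An explicit separator such as ' ' behaves differently. A short Python 3 illustration; the sample string is hypothetical:

```python
sample = "  Read\tword \n from   unicode  "

# No argument: split on runs of any whitespace; no empty strings are produced.
words = sample.split()
print(words)  # ['Read', 'word', 'from', 'unicode']

# Explicit ' ' separator: each single space is a boundary, so empty strings
# appear and the tab and newline stay inside the pieces.
pieces = sample.split(' ')
print(pieces)  # ['', '', 'Read\tword', '\n', 'from', '', '', 'unicode', '', '']

# To drop all whitespace entirely, rejoin the words with an empty string.
print(''.join(words))  # Readwordfromunicode
```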


1 个答案: