I have a data frame:
train_review = train['review']
train_review
It looks like this:
0 With all this stuff going down at the moment w...
1 \The Classic War of the Worlds\" by Timothy Hi...
2 The film starts with a manager (Nicholas Bell)...
3 It must be assumed that those who praised this...
4 Superbly trashy and wondrously unpretentious 8...
I append the tokens to a string:
train_review = train['review']
train_token = ''
for i in train['review']:
    train_token += i
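I want to tokenize the reviews using spaCy. This is what I tried, but I get the following error:
Argument 'string' has incorrect type (expected str, got spacy.tokens.doc.Doc)
How can I fix this? Thanks in advance!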
Answer 0 (score: 2)
In your for loop, you are taking spaCy Doc objects (spacy.tokens.doc.Doc) from the data frame and appending them to a string, so you should cast each one to str.
Like this:
for i in train['review']:
    train_token += str(i)
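As a rough illustration of the cast described in this answer, here is a minimal sketch that runs without spaCy installed. `FakeDoc` is a hypothetical stand-in for `spacy.tokens.doc.Doc`, written on the assumption that calling `str()` on a real Doc returns its text; the sample list `reviews` is also made up for the example. It uses `str.join` rather than repeated `+=`, and inserts a space between reviews, which the loop in the question does not do.

```python
class FakeDoc:
    """Hypothetical stand-in for spacy.tokens.doc.Doc (assumption: str(doc) yields the text)."""

    def __init__(self, text):
        self.text = text

    def __str__(self):
        return self.text


# Made-up sample column mixing Doc-like objects and plain strings.
reviews = [
    FakeDoc("Superbly trashy."),
    "The film starts with a manager.",
    FakeDoc("It must be assumed."),
]

# Cast every item to str before concatenating, so non-str objects never
# reach the string concatenation.
train_token = " ".join(str(item) for item in reviews)
print(train_token)
```

If the reviews should be glued together exactly as in the question's loop, the separator can be changed from `" "` to `""`.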