Question

I have a dataset imported as a DataFrame "new_data_words". It has a column "page_name" containing messy web page names, e.g. "%D8%AA%D8%B5%D9%86%D9%8A%D9%81:%D8%A2%D9%84%D9...", "%D9%85%D9%84%D9%81:IT-Airforce-OR2.png", or just "1950". I want to create a new column 'word_count' holding the number of words in the page name (words are separated by '_').

Here is my code:

To split into words:

b = list(new_data_words['page_name'].str.split('_'))
new_data_words['words'] = b

I checked that b is of type list, and len(b) is 6035980. A sample value:

In [1]: new_data_words.loc[0,'words']
Out[2]: ['%D8%AA%D8%B5%D9%86%D9%8A%D9%81:%D8%A2%D9%84%D9%87%D8%A9',
         '%D8%A8%D9%84%D8%A7%D8%AF',
         '%D8%A7%D9%84%D8%B1%D8%A7%D9%81%D8%AF%D9%8A%D9%86']

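I then tried to create another column, "word_count", that counts the elements of the list in each row of the "words" column. (I have to use a loop to touch the elements of the list in each row.)

But I get an error: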

x = []
i = []
c = 0
for i in b:    # i is a list whose elements are strings (I checked)
    c=c+1
    x.append(len(i))

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-12-c0cf0cfbc458> in <module>()
      6         #y = str(y)
      7     c=c+1
----> 8     x.append(len(i))

TypeError: object of type 'float' has no len()

I don't know why it is a float...

However, if I just add a print, it works:

x = []
i = []
c = 0
for i in b:
    c=c+1
    print(len(i))
    x.append(len(i))

3
2
3
2
3
1
8
...

But c = len(x) = 68516, far fewer than 6 million.

I tried to force the elements to be strings again, and got a different error:

x = []
for i in b:
    for y in i:
        y = str(y)
    x.append(len(i))


TypeError                                 Traceback (most recent call last)
<ipython-input-164-c86f5f48b80c> in <module>()
      1 x = []
      2 for i in b:
----> 3     for y in i:
      4         y = str(y)
      5     x.append(len(i))
TypeError: 'float' object is not iterable

I thought i was a list and therefore iterable...

Again, if I don't append but only print, it works:

x = []
for i in b:
    for y in i:
        y = str(y)
    print (len(i))

Another example. This works:

a = []
for i in range(10000):
    a.append(len(new_data_words.loc[i,"words"]))

After changing to a dynamic range, it does not work:

a = []
for i in range(len(b)):
    a.append(len(new_data_words.loc[i,"words"]))


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-20-f9d0af3c448f> in <module>()
      1 a = []
      2 for i in range(len(b)):
----> 3     a.append(len(new_data_words.loc[i,"words"]))

TypeError: object of type 'float' has no len()

This doesn't work either...

a = []
for i in range(6035980):
    a.append(len(new_data_words.loc[i,"words"]))

It seems there is some anomaly in the list, but I don't know what it is or how to find it.

Can anyone help?

Answer 1

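You are mistaken. The errors you are seeing make it 100% clear that b is an iterable containing at least one float (whether the other elements are str, I won't speculate).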
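As a minimal sketch of that claim, both messages from the question can be reproduced with a plain float standing in for the bad element. The sample list `b_sample` and the helper `describe_failure` are made up for illustration and are not part of the original code.

```python
# Hypothetical stand-in for b: two split results and one float (NaN).
b_sample = [["a", "b", "c"], float("nan"), ["1950"]]


def describe_failure(value):
    """Return the TypeError messages raised by len() and by iteration, if any."""
    messages = []
    try:
        len(value)
    except TypeError as exc:
        messages.append(str(exc))
    try:
        iter(value)
    except TypeError as exc:
        messages.append(str(exc))
    return messages


for item in b_sample:
    # Lists produce no messages; the float produces both errors from the question.
    print(type(item).__name__, describe_failure(item))
```

The loop in the question stops at the first such element, which would explain why only part of the list was processed before the error appeared.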

Try doing:

for i in b:
    print(type(i), i)

You will find at least one float. Alternatively, this prints only the non-iterable elements of b:

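from collections.abc import Iterable  # collections.Iterable was removed in Python 3.10

for i in b:
    # Print only the elements that cannot be iterated (e.g. floats)
    if not isinstance(i, Iterable):
        print(type(i), i)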
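Once the offending entries are known, one option is to record where they sit and decide what count they should get. The sketch below assumes the floats are NaN left behind by missing values in page_name (pandas string methods such as .str.split pass missing values through as NaN), which the original post does not confirm. The names `b_sample`, `find_non_lists` and `count_words` are illustrative, and treating a missing name as 0 words is an arbitrary choice.

```python
# Hypothetical stand-in for b: lists from str.split plus one NaN.
b_sample = [["a", "b", "c"], ["x", "y"], float("nan"), ["1950"]]


def find_non_lists(rows):
    """Return (index, value) pairs for entries that are not lists."""
    return [(idx, row) for idx, row in enumerate(rows) if not isinstance(row, list)]


def count_words(rows, missing=0):
    """Count the words in each row, using `missing` for non-list entries."""
    return [len(row) if isinstance(row, list) else missing for row in rows]


bad = find_non_lists(b_sample)
print([idx for idx, _ in bad])   # positions of the entries that are not lists
print(count_words(b_sample))     # one count per row, same length as the input
```

If the NaN assumption holds, `new_data_words['page_name'].isna()` should flag the same rows directly in the DataFrame.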

TypeError: object of type 'float' has no len() & TypeError: 'float' object is not iterable
