Question

I have a pandas DataFrame df:

import numpy as np
import pandas as pd

df = pd.DataFrame({"ID": [2,3,4,5,6,7,8,9,10],
      "type" :["A", "B", "B", "A", "A", "B", "A", "A", "A"],
      "F_ID" :["0", "[7 8 9]", "[10]", "0", "[2]", "0", "0", "0", "0"]})

# convert the string representations of list structures to actual lists
F_ID_as_series_of_lists = df["F_ID"].str.replace("[","").str.replace("]","").str.split(" ")

#type(F_ID_as_series_of_lists) is pd.Series, make it a list for pd.DataFrame.from_records
F_ID_as_records = list(F_ID_as_series_of_lists)

f_id_df = pd.DataFrame.from_records(list(F_ID_as_records)).fillna(np.nan)

This line raises an error:

f_id_df = pd.DataFrame.from_records(list(F_ID_as_records)).fillna(np.nan)

The error is: TypeError: object of type 'float' has no len()

How can I fix this?

Answer 1

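The problem is evidently that some values are None or NaN. If you build the new DataFrame with str.split and the parameter expand=True, those missing values are handled correctly.

It is also possible to use str.strip instead of replace: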

df = pd.DataFrame({"ID": [2,3,4,5,6,7,8,9,10],
      "type" :["A", "B", "B", "A", "A", "B", "A", "A", "A"],
      "F_ID" :[None, "[7 8 9]", "[10]", np.nan, "[2]", "0", "0", "0", "0"]})

print (df)
   ID type     F_ID
0   2    A     None
1   3    B  [7 8 9]
2   4    B     [10]
3   5    A      NaN
4   6    A      [2]
5   7    B        0
6   8    A        0
7   9    A        0
8  10    A        0

f_id_df = df["F_ID"].str.strip("[]").str.split(expand=True)
print (f_id_df)
      0     1     2
0  None  None  None
1     7     8     9
2    10  None  None
3   NaN   NaN   NaN
4     2  None  None
5     0  None  None
6     0  None  None
7     0  None  None
8     0  None  None

Finally, if you need the values as numbers, convert them:

f_id_df = df["F_ID"].str.strip("[]").str.split(expand=True).astype(float)
print (f_id_df)
      0    1    2
0   NaN  NaN  NaN
1   7.0  8.0  9.0
2  10.0  NaN  NaN
3   NaN  NaN  NaN
4   2.0  NaN  NaN
5   0.0  NaN  NaN
6   0.0  NaN  NaN
7   0.0  NaN  NaN
8   0.0  NaN  NaN
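
If some cells might hold tokens that are not numbers, astype(float) would raise. A minimal sketch of a more forgiving conversion, assuming you are happy for unparseable tokens to become NaN; the "[x]" entry is a made-up value added only for illustration:

```python
import numpy as np
import pandas as pd

# Same shape of data as above, plus one hypothetical non-numeric token ("[x]").
s = pd.Series([None, "[7 8 9]", "[10]", np.nan, "[x]", "0"])

# Strip the brackets, then split on whitespace into separate columns.
parts = s.str.strip("[]").str.split(expand=True)

# pd.to_numeric works on one column at a time; errors="coerce" turns
# anything that cannot be parsed (including missing values) into NaN.
numeric = parts.apply(pd.to_numeric, errors="coerce")
print(numeric)
```

Whether silently coercing bad tokens is acceptable depends on your data; astype(float) is the stricter choice because it fails loudly.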

Answer 2

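There is another way, using a list comprehension and what the TypeError itself tells us.

Suppose you have a pandas Series of string dtype and you want to split that column into two parts on the '/' symbol, but not every row is populated.

df = pd.DataFrame({'TEXT_COLUMN' : ['12/4', '54/19', np.nan, '89/33']})

We want to split that column into two separate columns, but we know pandas will trip over the missing value when we put the result back into a DataFrame, so we put it in a list first: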

split_list = list(df.TEXT_COLUMN.str.split('/'))

split_list returns the following, which shows why we get the float error when trying to parse it:

>> [['12', '4'], ['54', '19'], nan, ['89', '33']]
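
The link between that stray nan and the error message can be checked directly. The following is a small sketch, using a standalone Series named s (an illustrative name, not from the question), that inspects the element types and triggers the same TypeError by calling len() on the missing entry:

```python
import numpy as np
import pandas as pd

s = pd.Series(["12/4", "54/19", np.nan, "89/33"])
split_list = list(s.str.split("/"))

# The missing entry stays a float NaN instead of becoming a list.
kinds = [type(x).__name__ for x in split_list]
print(kinds)

# Anything that needs the length of each row will fail on that float.
try:
    len(split_list[2])
    message = None
except TypeError as exc:
    message = str(exc)
print(message)
```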

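Now that we have the list, we want to pass it through a comprehension that fixes the null problem. We can do that with a conditional type check inside the comprehension:

# NaN is a plain Python float, so isinstance(x, float) catches it
better_split_list = [x if not isinstance(x, float) else [None, None] for x in split_list]

better_split_list returns: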

>> [['12','4'],['54','19'], [None,None], ['89','33']]

That puts us in a good position to load the list of lists into its own pandas DataFrame and get the columns split apart more reliably:

pd.DataFrame(better_split_list, columns = ['VALUE_1','VALUE_2'])
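
The steps above can be folded into one function. This is only a sketch: split_to_frame is a hypothetical helper name, not a pandas API, and it assumes you want short rows padded with None and long rows truncated to the number of column names given.

```python
import numpy as np
import pandas as pd

def split_to_frame(series, sep, names):
    """Hypothetical helper: split each string on `sep`, padding missing rows."""
    width = len(names)
    rows = []
    for item in series.str.split(sep):
        if isinstance(item, list):
            # Pad or truncate so every row has exactly `width` entries.
            rows.append((item + [None] * width)[:width])
        else:
            # Missing values (NaN/None) become a row of Nones.
            rows.append([None] * width)
    return pd.DataFrame(rows, columns=names, index=series.index)

df = pd.DataFrame({"TEXT_COLUMN": ["12/4", "54/19", np.nan, "89/33"]})
result = split_to_frame(df["TEXT_COLUMN"], "/", ["VALUE_1", "VALUE_2"])
print(result)
```

Checking for list rather than float means None and any other non-list placeholder are handled the same way as NaN.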
