I have a pandas DataFrame df:
import numpy as np
import pandas as pd
df = pd.DataFrame({"ID": [2,3,4,5,6,7,8,9,10],
"type" :["A", "B", "B", "A", "A", "B", "A", "A", "A"],
"F_ID" :["0", "[7 8 9]", "[10]", "0", "[2]", "0", "0", "0", "0"]})
# convert the string representations of list structures to actual lists
F_ID_as_series_of_lists = df["F_ID"].str.replace("[","").str.replace("]","").str.split(" ")
#type(F_ID_as_series_of_lists) is pd.Series, make it a list for pd.DataFrame.from_records
F_ID_as_records = list(F_ID_as_series_of_lists)
f_id_df = pd.DataFrame.from_records(list(F_ID_as_records)).fillna(np.nan)
This line raises an error:
f_id_df = pd.DataFrame.from_records(list(F_ID_as_records)).fillna(np.nan)
The error is: TypeError: object of type 'float' has no len()
How can I fix this?
Answer 0 (score: 0)
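The problem is evidently some None or NaN values, but if you use str.split with the parameter expand=True to build the new DataFrame, those values are handled correctly.
You can also use str.strip instead of replace: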
df = pd.DataFrame({"ID": [2,3,4,5,6,7,8,9,10],
"type" :["A", "B", "B", "A", "A", "B", "A", "A", "A"],
"F_ID" :[None, "[7 8 9]", "[10]", np.nan, "[2]", "0", "0", "0", "0"]})
print (df)
   ID type     F_ID
0   2    A     None
1   3    B  [7 8 9]
2   4    B     [10]
3   5    A      NaN
4   6    A      [2]
5   7    B        0
6   8    A        0
7   9    A        0
8  10    A        0
f_id_df = df["F_ID"].str.strip("[]").str.split(expand=True)
print (f_id_df)
      0     1     2
0  None  None  None
1     7     8     9
2    10  None  None
3   NaN   NaN   NaN
4     2  None  None
5     0  None  None
6     0  None  None
7     0  None  None
8     0  None  None
Finally, if needed, convert the values to numbers:
f_id_df = df["F_ID"].str.strip("[]").str.split(expand=True).astype(float)
print (f_id_df)
      0    1    2
0   NaN  NaN  NaN
1   7.0  8.0  9.0
2  10.0  NaN  NaN
3   NaN  NaN  NaN
4   2.0  NaN  NaN
5   0.0  NaN  NaN
6   0.0  NaN  NaN
7   0.0  NaN  NaN
8   0.0  NaN  NaN
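If the float conversion is unwanted because the IDs are integers, one possible variation is to go through pd.to_numeric and then cast to the nullable "Int64" dtype. This is only a sketch under the assumption that every non-missing piece is a whole number; the name f_id_int and the shortened sample data are illustrative, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Shortened version of the sample data, including both None and NaN.
df = pd.DataFrame({"F_ID": [None, "[7 8 9]", "[10]", np.nan, "[2]", "0"]})

# Strip the brackets, then split on whitespace into separate columns.
parts = df["F_ID"].str.strip("[]").str.split(expand=True)

# Convert each column to numbers (missing pieces become NaN), then cast to
# the nullable integer dtype so present values stay integers.
f_id_int = parts.apply(pd.to_numeric, errors="coerce").astype("Int64")
print(f_id_int)
```

With this dtype the missing cells are shown as <NA> instead of NaN, and the values are not turned into floats.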
Answer 1 (score: 0)
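There is another approach that uses a list comprehension and what we learned from the TypeError itself.
Suppose you have a pandas Series of string dtype and you want to split that column into two parts at the '/' symbol, but not every row is populated.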
df = pd.DataFrame({'TEXT_COLUMN' : ['12/4', '54/19', np.nan, '89/33']})
We want to split that column into two separate columns, but we know pandas will choke when the result is put back into a DataFrame, so we put it in a list first:
split_list = list(df.TEXT_COLUMN.str.split('/'))
split_list
This returns the following, where we can see why the float error appears when the list is parsed:
>> [['12', '4'], ['54', '19'], nan, ['89', '33']]
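A small check can make the cause concrete. The sketch below assumes the same kind of data, stored as an object-dtype Series; the names s, missing and message are illustrative. The missing entry survives the split as a plain float NaN, and calling len() on it produces the message from the question.

```python
import numpy as np
import pandas as pd

# Object dtype is forced here so the missing value is a plain float NaN.
s = pd.Series(['12/4', '54/19', np.nan, '89/33'], dtype=object)
split_list = list(s.str.split('/'))

# The third entry was never split: it is still the float NaN.
missing = split_list[2]
print(type(missing))

# Anything that needs the length of each record will fail on that entry.
try:
    len(missing)
except TypeError as exc:
    message = str(exc)
    print(message)
```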
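Now that we have that list, we want to pass it through a comprehension that fixes the null-value problem. We can do this with a conditional expression inside the comprehension (np.float has been removed from recent NumPy releases, so the check uses the built-in float):
better_split_list = [x if not isinstance(x, float) else [None, None] for x in split_list]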
better_split_list
Returns:
>> [['12','4'],['54','19'], [None,None], ['89','33']]
This puts us in a good position to place the list of lists into its own pandas DataFrame and separate the columns in a more reliable way:
pd.DataFrame(better_split_list, columns = ['VALUE_1','VALUE_2'])
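Put together, the steps of this answer might look like the following sketch. It assumes the missing entries are float NaN, as in the sample data; the names values and result, and the final join back onto df, are illustrative additions rather than part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'TEXT_COLUMN': ['12/4', '54/19', np.nan, '89/33']})

# Split each string; rows with missing text stay as a float NaN.
split_list = list(df['TEXT_COLUMN'].str.split('/'))

# Replace each float NaN placeholder with a two-element list of None
# so that every record has the same length.
better_split_list = [x if not isinstance(x, float) else [None, None]
                     for x in split_list]

# Build the two new columns, reusing the original index so the rows line up.
values = pd.DataFrame(better_split_list,
                      columns=['VALUE_1', 'VALUE_2'],
                      index=df.index)

# Attach the new columns to the original frame (illustrative extra step).
result = df.join(values)
print(result)
```

Passing index=df.index matters only when df does not have the default 0..n-1 index; it keeps the join aligned in that case.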