I have a SparseDataFrame and I want to change some of its values. However, whenever I use .loc, .iloc or set_value, I always get:
TypeError: SparseArray does not support item assignment via setitem
How do I work with a SparseArray?
https://stackoverflow.com/a/49030495/9157212 suggests calling df.to_dense(), performing the assignment, and then calling df.to_sparse(). Is there a way to do this directly on the SparseDataFrame / SparseArray?
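For reference, the error can be reproduced with a minimal sketch like the one below. It assumes a pandas version where SparseArray is available as pd.arrays.SparseArray; the array contents are made up for illustration, and the exact message text may differ between versions.

```python
import numpy as np
import pandas as pd

# Hypothetical three-element sparse array, mostly NaN.
arr = pd.arrays.SparseArray([np.nan, 0.42, np.nan])

try:
    # Direct item assignment on the sparse array itself.
    arr[0] = 1.0
    error_message = None
except TypeError as exc:
    # Keep the message so it can be inspected afterwards.
    error_message = str(exc)

print(error_message)
```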
Answer 0 (score: 3)
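It is frustrating that .loc[] cannot insert directly into the sparse format. I'm afraid I only have a workaround.
Since the question was originally posted (as of version 0.25), pandas has deprecated SparseDataFrame. Instead, it provides a data type (SparseDtype) that can be applied to individual Series within a DataFrame. In other words, it is no longer "all or nothing". You can convert only the columns you need to modify to dense, perform the assignment there, and then convert those columns back to sparse.
This obviously uses far less memory than converting the entire DataFrame to dense.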
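To make the per-column point concrete, here is a rough sketch that densifies a single column while the other stays sparse, and compares the memory of that column before and after. The column names, the size n and the values are assumptions chosen only for illustration.

```python
import numpy as np
import pandas as pd

n = 10_000  # hypothetical size, chosen only for illustration
values = np.full(n, np.nan)
values[0] = 0.42

df = pd.DataFrame({
    "I": pd.arrays.SparseArray(np.full(n, np.nan)),
    "J": pd.arrays.SparseArray(values),
})

# Memory of column "I" while it is still sparse (index excluded).
sparse_bytes = df["I"].memory_usage(index=False)

# Densify only column "I"; column "J" keeps its SparseDtype.
df["I"] = df["I"].sparse.to_dense()
dense_bytes = df["I"].memory_usage(index=False)

print(df.dtypes)
print(sparse_bytes, dense_bytes)
```

Only the column being modified pays the dense cost, and only for as long as it stays dense.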
Here is a very simple function to illustrate what I mean:
import pandas as pd


def sp_loc(df, index, columns, val):
    """ Insert data in a DataFrame with SparseDtype format

    Only applicable for pandas version 0.25 or later

    Args
    ----
    df : DataFrame with series formatted with pd.SparseDtype
    index: str, or list, or slice object
        Same as one would use as first argument of .loc[]
    columns: str, list, or slice
        Same as one would normally use as second argument of .loc[]
    val: insert values

    Returns
    -------
    df: DataFrame
        Modified DataFrame (the input is modified in place and returned)
    """
    # Save the original sparse format for reuse later
    spdtypes = df.dtypes[columns]

    # Convert concerned Series to dense format
    df[columns] = df[columns].sparse.to_dense()

    # Do a normal insertion with .loc[]
    df.loc[index, columns] = val

    # Back to the original sparse format
    df[columns] = df[columns].astype(spdtypes)

    return df
A simple usage example:
# DEFINE A SPARSE DATAFRAME
df1 = pd.DataFrame(index=['a', 'b', 'c'], columns=['I', 'J'])
df1.loc['a', 'J'] = 0.42
df1 = df1.astype(pd.SparseDtype(float))
# | I | J
# ----+-----+--------
# a | nan | 0.42
# b | nan | nan
# c | nan | nan
df1.dtypes
#I Sparse[float64, nan]
#J Sparse[float64, nan]
df1.sparse.density
# 0.16666666666666666
# INSERTION
df1 = sp_loc(df1, ['a','b'], 'I', [-1, 1])
# | I | J
# ----+-----+--------
# a | -1 | 0.42
# b | 1 | nan
# c | nan | nan
df1.sparse.density
# 0.5