My df matrix looks like this:
             rating
id         10153337 10183250 10220967 ... 99808270 99816554 99821259
user_id                               ...
10003869        NaN      8.0      NaN ...      NaN      NaN      NaN
10022889        NaN      NaN      3.0 ...      NaN      1.0      NaN
I can't get the column I need, because selecting it raises an "indices are out-of-bounds" error:
specificID = ratings_matrix[[99816554]]
...
raise IndexError("indices are out-of-bounds")
IndexError: indices are out-of-bounds
Why doesn't it look the column up by the label value I supplied?
Some runnable code:
import io
import pandas as pd

# Recent pandas versions expect a file-like object here, hence io.StringIO
ratings = pd.read_json(
    io.StringIO(''.join(
        ['{"columns":["id","rating","user_id"],"index":[0,1,2],"data":[[',
         '67728134,4,10003869],[57495823,9,10060085],[99816554,1,10022889]]}']
    )), orient='split')
ratings
ratings.dtypes
ratings_matrix = ratings.pivot_table(index=['user_id'], columns=['id'], values=['rating'])
ratings_matrix.columns.map(type)
ratings_matrix[[67728134]]  # here! looks up column positions rather than labels
Answer 0 (score: 4)
Note that when you create the pivot table, you pass a list to the values parameter:
ratings_matrix = ratings.pivot_table(
    index=['user_id'], columns=['id'], values=['rating'])  # <-- values is a list here
That tells pandas to create a pd.MultiIndex for the columns, which is why your result has a rating level above the id columns.
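A quick way to see the difference is to compare the number of column levels produced by the two spellings of values. This is a minimal sketch: it builds the same three rows with pd.DataFrame instead of read_json, and the names nested and flat are only illustrative.

```python
import pandas as pd

# Same three rows as in the question, built directly for brevity
ratings = pd.DataFrame({
    "id": [67728134, 57495823, 99816554],
    "rating": [4, 9, 1],
    "user_id": [10003869, 10060085, 10022889],
})

# values as a list: the columns become a two-level MultiIndex
nested = ratings.pivot_table(index=["user_id"], columns=["id"], values=["rating"])

# values as a string: the columns stay a single-level index of ids
flat = ratings.pivot_table(index=["user_id"], columns=["id"], values="rating")

print(nested.columns.nlevels)  # number of column levels with a list
print(flat.columns.nlevels)    # number of column levels with a string
print(list(nested.columns))    # tuples of (values name, id)
```

With the list form every column label is a tuple, so a bare id such as 99816554 is not a valid column key on its own.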
Option 1
Use the MultiIndex
specificID = ratings_matrix[[('rating', 99816554)]]
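If the matrix already exists with MultiIndex columns, another way in is to select the top level first, which returns a frame whose columns are the plain ids. This is a sketch on rebuilt sample data rather than part of the original answer, and by_id is a hypothetical name.

```python
import pandas as pd

# Same three rows as in the question, built directly for brevity
ratings = pd.DataFrame({
    "id": [67728134, 57495823, 99816554],
    "rating": [4, 9, 1],
    "user_id": [10003869, 10060085, 10022889],
})

# Pivot with values as a list, so the columns are ('rating', id) tuples
ratings_matrix = ratings.pivot_table(
    index=["user_id"], columns=["id"], values=["rating"])

# Selecting the outer level drops it and leaves the id level as the columns
by_id = ratings_matrix["rating"]

# Now a bare id works as a column label
specificID = by_id[[99816554]]
print(specificID)
```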
Option 2
Don't create the MultiIndex
ratings_matrix = ratings.pivot_table(
    index=['user_id'], columns=['id'], values='rating')  # <-- a plain string, not a list
Then
specificID = ratings_matrix[[99816554]]
Setup
import io
import pandas as pd

# Recent pandas versions expect a file-like object here, hence io.StringIO
ratings = pd.read_json(
    io.StringIO(''.join(
        ['{"columns":["id","rating","user_id"],"index":[0,1,2],"data":[[',
         '67728134,4,10003869],[57495823,9,10060085],[99816554,1,10022889]]}']
    )), orient='split'
)
ratings
# Option 1: values is a list, so select with the full ('rating', id) tuple
ratings_matrix = ratings.pivot_table(
    index=['user_id'], columns=['id'], values=['rating'])
ratings_matrix[[('rating', 67728134)]]
# Option 2: values is a string, so the bare id is the column label
ratings_matrix = ratings.pivot_table(
    index=['user_id'], columns=['id'], values='rating')
ratings_matrix[[67728134]]