I'm trying to build a crosstab of one column, based on matches in a third column. Take this sample data:
import pandas as pd

df = pd.DataFrame({'demographic': ['A', 'B', 'B', 'A', 'C', 'C'],
                   'id_match': ['101', '101', '201', '201', '26', '26'],
                   'time': ['10', '10', '16', '16', '1', '1']})
Where id_match matches, I want the sum of time in a crosstab of the demographic column. The output would look like this:
    A   B  C
A   0  52  0
B  52   0  0
C   0   0  2
Hopefully this makes sense; if not, please comment. Thanks, J
Answer 0: (score: 1)
You can use reset_index, merge, and crosstab to solve this:

u = df.reset_index()
v = u.merge(u, on='id_match').query('index_x != index_y')
r = pd.crosstab(v.demographic_x,
                v.demographic_y,
                v.time_x.astype(int) + v.time_y.astype(int),
                aggfunc='sum')
print(r)

demographic_y     A     B    C
demographic_x
A               NaN  52.0  NaN
B              52.0   NaN  NaN
C               NaN   NaN  4.0

If you need the NaN values filled with zeros, you can use fillna.
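
For reference, here is a self-contained sketch of the whole approach, ending with the zero-filled table. The exact fillna call is not preserved in the answer above, so `fillna(0).astype(int)` and the variable name `filled` are assumptions about one reasonable way to do it.

```python
import pandas as pd

df = pd.DataFrame({
    'demographic': ['A', 'B', 'B', 'A', 'C', 'C'],
    'id_match': ['101', '101', '201', '201', '26', '26'],
    'time': ['10', '10', '16', '16', '1', '1'],
})

# Keep the original row number as a column so a row is never paired with itself.
u = df.reset_index()
v = u.merge(u, on='id_match').query('index_x != index_y')

# Sum the combined time of each matched pair per demographic combination.
r = pd.crosstab(v.demographic_x,
                v.demographic_y,
                values=v.time_x.astype(int) + v.time_y.astype(int),
                aggfunc='sum')

# Assumed fill step: replace missing combinations with 0, then cast to int.
filled = r.fillna(0).astype(int)
print(filled)
```

Note that the C/C cell comes out as 4 here, not the 2 shown in the question, because the self-merge yields both orderings of the C pair and both land in the same cell.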