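For the following example DataFrame, how would you split the comma-separated searched_products and bought_products fields and group them by date, page and products, transforming it from this: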
+------------+------+---------------------+-----------------+
| date       | page | searched_products   | bought_products |
+------------+------+---------------------+-----------------+
| 2019-01-01 | abc  | apple, orange       | orange          |
+------------+------+---------------------+-----------------+
| 2019-01-01 | def  | apple, pear, orange | orange, pear    |
+------------+------+---------------------+-----------------+
| 2019-01-01 | abc  | grapes, orange      | apple, grapes   |
+------------+------+---------------------+-----------------+
| 2019-01-02 | def  | apple               | apple, oranges  |
+------------+------+---------------------+-----------------+
| 2019-01-02 | ghi  | apple, grapes       | orange          |
+------------+------+---------------------+-----------------+
| 2019-01-02 | jkl  | pear, apple         | pear            |
+------------+------+---------------------+-----------------+
| etc        | etc  | etc                 | etc             |
+------------+------+---------------------+-----------------+
to this:
+------------+------+---------+----------+-----------+
| date       | page | product | searches | purchases |
+------------+------+---------+----------+-----------+
| 2019-01-01 | abc  | apple   | 1        | 1         |
+------------+------+---------+----------+-----------+
| 2019-01-01 | abc  | orange  | 2        | 1         |
+------------+------+---------+----------+-----------+
| 2019-01-01 | abc  | grapes  | 1        | 1         |
+------------+------+---------+----------+-----------+
| 2019-01-01 | def  | apple   | 1        | NaN       |
+------------+------+---------+----------+-----------+
| 2019-01-01 | def  | pear    | 1        | 1         |
+------------+------+---------+----------+-----------+
| 2019-01-01 | def  | orange  | 1        | 1         |
+------------+------+---------+----------+-----------+
| 2019-01-02 | def  | apple   | 1        | 1         |
+------------+------+---------+----------+-----------+
| 2019-01-02 | def  | orange  | NaN      | 1         |
+------------+------+---------+----------+-----------+
| 2019-01-02 | ghi  | apple   | 1        | NaN       |
+------------+------+---------+----------+-----------+
| 2019-01-02 | ghi  | grapes  | 1        | NaN       |
+------------+------+---------+----------+-----------+
| 2019-01-02 | ghi  | orange  | NaN      | 1         |
+------------+------+---------+----------+-----------+
| 2019-01-02 | jkl  | apple   | 1        | NaN       |
+------------+------+---------+----------+-----------+
| 2019-01-02 | jkl  | pear    | 1        | 1         |
+------------+------+---------+----------+-----------+
| etc        | etc  | etc     | etc      | etc       |
+------------+------+---------+----------+-----------+
Answer 0 (score: 3)
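A solution for pandas 0.25+: split each comma-separated string into a list with Series.str.split, repeat the row once per list element with DataFrame.explode, count the occurrences per group with GroupBy.size, and finally join the two counts with concat: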
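import pandas as pd

# searches: one row per searched product, counted per (date, page, product)
s = (df.assign(searches=df['searched_products'].str.split(', '))
       .explode('searches')
       .groupby(['date', 'page', 'searches'])
       .size()
       .rename('searches'))

# purchases: the same steps for the bought products
b = (df.assign(purchases=df['bought_products'].str.split(', '))
       .explode('purchases')
       .groupby(['date', 'page', 'purchases'])
       .size()
       .rename('purchases'))

# align both counts on (date, page, product); missing combinations become NaN
df = (pd.concat([s, b], axis=1)
        .rename_axis(['date', 'page', 'product'])
        .sort_index()   # keep rows ordered by date, page, product
        .reset_index())
print(df)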
          date page  product  searches  purchases
0   2019-01-01  abc    apple       1.0        1.0
1   2019-01-01  abc   grapes       1.0        1.0
2   2019-01-01  abc   orange       2.0        1.0
3   2019-01-01  def    apple       1.0        NaN
4   2019-01-01  def   orange       1.0        1.0
5   2019-01-01  def     pear       1.0        1.0
6   2019-01-02  def    apple       1.0        1.0
7   2019-01-02  def  oranges       NaN        1.0
8   2019-01-02  ghi    apple       1.0        NaN
9   2019-01-02  ghi   grapes       1.0        NaN
10  2019-01-02  ghi   orange       NaN        1.0
11  2019-01-02  jkl    apple       1.0        NaN
12  2019-01-02  jkl     pear       1.0        1.0
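
To try the approach end to end, here is a minimal self-contained sketch. It assumes only the six sample rows shown in the question, and count_products is a hypothetical helper name introduced for illustration, not part of pandas. It applies the same split, explode and size steps to both columns so that the two counts share one (date, page, product) index before being joined.

```python
import pandas as pd

# Sample rows copied from the question (the "etc" rows are omitted).
df = pd.DataFrame({
    "date": ["2019-01-01", "2019-01-01", "2019-01-01",
             "2019-01-02", "2019-01-02", "2019-01-02"],
    "page": ["abc", "def", "abc", "def", "ghi", "jkl"],
    "searched_products": ["apple, orange", "apple, pear, orange",
                          "grapes, orange", "apple",
                          "apple, grapes", "pear, apple"],
    "bought_products": ["orange", "orange, pear", "apple, grapes",
                        "apple, oranges", "orange", "pear"],
})


def count_products(frame, column, name):
    """Hypothetical helper: count the products listed in a
    comma-separated column per (date, page, product)."""
    return (frame.assign(product=frame[column].str.split(", "))
                 .explode("product")          # one row per product
                 .groupby(["date", "page", "product"])
                 .size()                      # occurrences per group
                 .rename(name))


# Outer-join the two counts on their shared index; a product that was
# only searched or only bought gets NaN in the other column.
result = (pd.concat([count_products(df, "searched_products", "searches"),
                     count_products(df, "bought_products", "purchases")],
                    axis=1)
            .sort_index()
            .reset_index())
print(result)
```

If zeros are preferred over NaN for missing combinations, calling result.fillna(0) afterwards would be one option.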