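Note: I can use either MySQL or Python.
Edit: Following users' suggestions, here is an MRE for my problem. Strawberry, this is how I created the table (the created_at and updated_at values are not essential, so I used the same date for all of them):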
CREATE table invites (
ID INT AUTO_INCREMENT,
invitee_id INT,
inviter_id INT,
inviter_user_code VARCHAR(20),
created_at datetime,
updated_at datetime,
PRIMARY KEY (ID)
);
INSERT INTO invites (invitee_id, inviter_id, inviter_user_code, created_at,updated_at)
VALUES
(17365, 17374, 'BDMX5Z', '2019-02-01', '2019-02-01'),
(17401, 17349, 'BDMX58', '2019-02-01', '2019-02-01'),
(17403, 17349, 'BDMX58', '2019-02-01', '2019-02-01'),
(17452, 17349, 'BDMX8C', '2019-02-01', '2019-02-01'),
(17457, 17449, 'BDMX8J', '2019-02-01', '2019-02-01');
To make myself clear, here is what my dataframe looks like:
id invitee_id inviter_id inviter_user_code created_at updated_at
1 17375 17374 BDMX5Z 2019-02-01 10:28:44 2019-02-01 10:28:44
2 17401 17349 BDMX58 2019-02-01 11:59:47 2019-02-01 11:59:47
3 17403 17349 BDMX58 2019-02-01 12:03:22 2019-02-01 12:03:22
4 17452 17449 BDMX8C 2019-02-01 13:39:31 2019-02-01 13:39:31
5 17457 17455 BDMX8J 2019-02-01 14:00:25 2019-02-01 14:00:25
6 17502 17501 BDMX9Y 2019-02-01 15:50:44 2019-02-01 15:50:44
7 17541 17540 BDMXB7 2019-02-01 17:15:06 2019-02-01 17:15:06
8 17542 17546 BDMXBD 2019-02-01 17:34:48 2019-02-01 17:34:48
9 17696 17630 BDMXDZ 2019-02-02 11:46:14 2019-02-02 11:46:14
10 17706 13191 BDMT3A 2019-02-02 12:23:47 2019-02-02 12:23:47
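invitee_id is the user who was invited.
inviter_id is the user who invited the new user.
So if you are a first inviter with no prior invitation, your inviter_id will not appear anywhere in invitee_id.
To find inviters who were themselves invited, I did: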
select
*
from user_invitations
where
inviter_id in
(select invitee_id
from user_invitations)
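This gives me the rows whose inviter_id belongs to an inviter who was previously invited.
My question is how to get the inviters whose own inviter also had a prior invitation, and so on up the chain.
I have tried several approaches, both directly in MySQL and by loading the data into a df and working with that.
One attempt was to run the query above against its own result, for example: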
With one_prior as (
select
*
from user_invitations
where
inviter_id in
(select invitee_id
from user_invitations)
)
select *
from one_prior
where
inviter_id in
(select invitee_id
from one_prior);
I have checked one user manually, but is there a way to check all users?
I created two queries:
select *
from user_invitations
where inviter_id = 17349;
select *
from user_invitations
where invitee_id = 23764;
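and then went back and forth between them. For example, if inviter_id = 17349 was a first inviter with no prior invitation, it will not show up in the second query. Then, from the results of the first query with inviter_id = 17349, I get invitee_id = 17401, 17403, and so on. I then plug those in as inviter_id in the first query and repeat the steps.
Also, is there a way to create a dot plot where each dot represents a user and a line connects users linked by an inviter/invitee relationship?
Edit: Say I want the fifth link. The code becomes long and tedious, and I am hoping to find a more efficient way.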
query = """
With five_prior as
(
With four_prior as
(
With three_prior as
(
With two_prior as
(
With one_prior as
(
select
*
from user_invitations
where inviter_id in
(select invitee_id
from user_invitations)
)
select *
from one_prior
where inviter_id in
(select invitee_id
from one_prior)
)
select *
from two_prior
where inviter_id in
(select invitee_id
from two_prior)
)
select *
from three_prior
where inviter_id in
(select invitee_id
from three_prior)
)
select *
from four_prior
where inviter_id in
(select invitee_id
from four_prior)
)
select *
from five_prior
where inviter_id in
(select invitee_id
from five_prior)
group by inviter_id
"""
df = pd.read_sql(query, con=conn)
five_link = list(df.inviter_id)
print(len(five_link))
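One possible way to avoid nesting a new CTE per link is a recursive CTE that carries a depth counter. The sketch below is only an illustration under stated assumptions: it runs against an in-memory SQLite database so that it is self-contained, the rows marked hypothetical are not in the original data, and the `chain` and `depth` names are made up for this example. MySQL 8.0 and later also support `WITH RECURSIVE`, so a query of the same shape could presumably be passed to `pd.read_sql` there, but that has not been checked against the real table.

```python
import sqlite3

# In-memory stand-in for the real user_invitations table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_invitations (invitee_id INT, inviter_id INT)")
conn.executemany(
    "INSERT INTO user_invitations (invitee_id, inviter_id) VALUES (?, ?)",
    [
        (17401, 17349),
        (17403, 17349),
        (20001, 17401),  # hypothetical row, added to form a longer chain
        (20002, 20001),  # hypothetical row, added to form a longer chain
    ],
)

query = """
WITH RECURSIVE chain (invitee_id, inviter_id, depth) AS (
    -- Anchor: invitations sent by users who were never invited themselves.
    SELECT invitee_id, inviter_id, 1
    FROM user_invitations
    WHERE inviter_id NOT IN (SELECT invitee_id FROM user_invitations)
    UNION ALL
    -- Recursive step: invitations sent by someone already in the chain.
    SELECT u.invitee_id, u.inviter_id, c.depth + 1
    FROM user_invitations AS u
    JOIN chain AS c ON u.inviter_id = c.invitee_id
)
SELECT invitee_id, inviter_id, depth
FROM chain
ORDER BY depth, invitee_id
"""

chain_rows = conn.execute(query).fetchall()
for invitee_id, inviter_id, depth in chain_rows:
    print(invitee_id, inviter_id, depth)
```

With this numbering, depth 1 means "invited by someone who was never invited", so filtering on `depth >= 2` should select roughly what `one_prior` selects, provided each user was invited only once.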
Answer 0 (score: 1)
Here is a simple way to do it in Python using a dynamic programming approach:
previous_invites = {r["invitee"]: 0 for r in rows}
changed = True
# Keep sweeping over the rows until no chain length changes any more.
while changed:
    changed = False
    for r in rows:
        # An invitee's chain is one longer than their inviter's chain.
        update_prev_invites = max(previous_invites[r["invitee"]], previous_invites.get(r["inviter"], 0) + 1)
        if update_prev_invites > previous_invites[r["invitee"]]:
            changed = True
            previous_invites[r["invitee"]] = update_prev_invites
for r in rows:
    print("User " + str(r["id"]) + " had a chain of " + str(previous_invites[r["invitee"]]) + " inviter(s) behind them")
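This assumes rows is an array of dictionaries containing the data from the database. It builds the previous_invites dictionary (which maps each invitee to the number of inviters in their "chain") by setting an invitee's previous_invites value to their inviter's previous_invites + 1, and it repeats until the dictionary converges to the correct answer.
With n users and m the length of the longest inviter chain, this solution runs in O(n) space and O(n * m) time.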
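To see the convergence on concrete values, here is a small worked sketch of the same idea wrapped in a function. The function name `chain_lengths` and the two rows marked hypothetical are assumptions made for illustration. The question's sample data contains no multi-link chains, so the extra rows are needed to show one. The loop also assumes the invitations contain no cycles, since a cycle would keep it from terminating.

```python
# Rows shaped the way the answer assumes: keys "id", "invitee", "inviter".
rows = [
    {"id": 1, "invitee": 17375, "inviter": 17374},
    {"id": 2, "invitee": 17401, "inviter": 17349},
    {"id": 3, "invitee": 17403, "inviter": 17349},
    {"id": 11, "invitee": 20001, "inviter": 17401},  # hypothetical row
    {"id": 12, "invitee": 20002, "inviter": 20001},  # hypothetical row
]


def chain_lengths(rows):
    """Return {invitee: number of inviters in the chain above them}."""
    depth = {r["invitee"]: 0 for r in rows}
    changed = True
    while changed:
        changed = False
        for r in rows:
            # Inviters that were never invited count as depth 0.
            candidate = depth.get(r["inviter"], 0) + 1
            if candidate > depth[r["invitee"]]:
                depth[r["invitee"]] = candidate
                changed = True
    return depth


depths = chain_lengths(rows)
for r in rows:
    print(f"Invitee {r['invitee']} has {depths[r['invitee']]} inviter(s) behind them")
```

Here user 20002 ends up with 3 inviters behind them (20001, 17401, 17349), while the users invited directly by a first inviter end up with 1.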
Answer 1 (score: 1)
IIUC, you can use the NetworkX library:
import pandas as pd
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt
%matplotlib inline
df = pd.read_clipboard(sep=r'\s\s+')  # raw string so the regex backslashes are not treated as string escapes
G = nx.from_pandas_edgelist(df, 'inviter_id', 'invitee_id', create_using=nx.DiGraph())
fig, ax = plt.subplots(figsize=(10,8))
nx.draw_networkx(G)
[(i,list(G.successors(i))) for i in G.nodes() if len(list(G.predecessors(i))) == 0]
[f'Inviter {str(i)} invites {", ".join(map(str, list(G.successors(i))))}' for i in G.nodes() if len(list(G.predecessors(i))) == 0]
Output:
['Inviter 17374 invites 17375',
'Inviter 17349 invites 17401, 17403',
'Inviter 17449 invites 17452',
'Inviter 17455 invites 17457',
'Inviter 17501 invites 17502',
'Inviter 17540 invites 17541',
'Inviter 17546 invites 17542',
'Inviter 17630 invites 17696',
'Inviter 13191 invites 17706']
Network graph image:
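The root-finding step in this answer lists only direct invitees. As a rough, dependency-free sketch of how the same idea might be extended to everyone further down each chain, the code below walks the edges breadth-first using only the standard library. The `edges` list, the `descendants` helper, and the edge marked hypothetical are assumptions for illustration and are not part of the answer's NetworkX code.

```python
from collections import defaultdict, deque

# (inviter_id, invitee_id) pairs, the same direction as the DiGraph edges.
edges = [
    (17374, 17375),
    (17349, 17401),
    (17349, 17403),
    (17401, 20001),  # hypothetical edge, added to show a second generation
]

children = defaultdict(list)
invited = set()
for inviter, invitee in edges:
    children[inviter].append(invitee)
    invited.add(invitee)

# Roots are inviters that nobody invited (no incoming edge).
roots = [user for user in children if user not in invited]


def descendants(root):
    """Breadth-first walk returning everyone reachable from root, in visit order."""
    seen = {root}
    queue = deque([root])
    order = []
    while queue:
        user = queue.popleft()
        for child in children.get(user, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
                order.append(child)
    return order


tree = {root: descendants(root) for root in roots}
print(tree)
```

The `seen` set keeps the walk from looping if the data ever contains a cycle.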