How do I find inviters who have 1, 2, 3 levels of prior invitations, given inviter and invitee columns?

Posted: 2019-09-27 00:53:44

Tags: python mysql

Note: I can use either MySQL or Python.

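Edit: Following a user's suggestion, here is an MRE of my question. @Strawberry, this is how I created the table (creating and dropping the table isn't essential, so I just used the same date for every row):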

CREATE table invites (
  ID                INT AUTO_INCREMENT,
  invitee_id        INT,
  inviter_id        INT,
  inviter_user_code VARCHAR(20),
  created_at        datetime,
  updated_at        datetime,
  PRIMARY KEY (ID)
); 
INSERT INTO invites (invitee_id, inviter_id, inviter_user_code, created_at, updated_at)
VALUES 
  (17365, 17374, 'BDMX5Z', '2019-02-01', '2019-02-01'),
  (17401, 17349, 'BDMX58', '2019-02-01', '2019-02-01'),
  (17403, 17349, 'BDMX58', '2019-02-01', '2019-02-01'),
  (17452, 17349, 'BDMX8C', '2019-02-01', '2019-02-01'),
  (17457, 17449, 'BDMX8J', '2019-02-01', '2019-02-01');

To make myself clear, here is what my DataFrame looks like:

    id invitee_id   inviter_id  inviter_user_code   created_at           updated_at
    1   17375       17374             BDMX5Z    2019-02-01 10:28:44 2019-02-01 10:28:44
    2   17401       17349             BDMX58    2019-02-01 11:59:47 2019-02-01 11:59:47
    3   17403       17349             BDMX58    2019-02-01 12:03:22 2019-02-01 12:03:22
    4   17452       17449             BDMX8C    2019-02-01 13:39:31 2019-02-01 13:39:31
    5   17457       17455             BDMX8J    2019-02-01 14:00:25 2019-02-01 14:00:25
    6   17502       17501             BDMX9Y    2019-02-01 15:50:44 2019-02-01 15:50:44
    7   17541       17540             BDMXB7    2019-02-01 17:15:06 2019-02-01 17:15:06
    8   17542       17546             BDMXBD    2019-02-01 17:34:48 2019-02-01 17:34:48
    9   17696       17630             BDMXDZ    2019-02-02 11:46:14 2019-02-02 11:46:14
    10  17706       13191             BDMT3A    2019-02-02 12:23:47 2019-02-02 12:23:47

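invitee_id is the user who was invited.
inviter_id is the user who invited the new user.

So if you are a first inviter with no prior invitation, your inviter_id will not appear in the invitee_id column.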
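As a minimal plain-Python sketch of that rule (the invites are held as (invitee_id, inviter_id) pairs copied from rows 1-4 of the table above; the variable names are my own), the first inviters are exactly the inviter IDs that never occur as an invitee:

```python
# (invitee_id, inviter_id) pairs: rows 1-4 of the sample table above
invites = [(17375, 17374), (17401, 17349), (17403, 17349), (17452, 17449)]

invitee_ids = {invitee for invitee, _ in invites}
inviter_ids = {inviter for _, inviter in invites}

# An inviter who never appears as an invitee had no prior invitation
first_inviters = inviter_ids - invitee_ids
print(sorted(first_inviters))  # [17349, 17374, 17449]
```

The query below selects the complement: inviters who *do* appear among the invitees.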

For this I wrote:

select 
  *
from user_invitations
where
  inviter_id in
    (select invitee_id
     from user_invitations)

This gives me the inviters (inviter_id) who were themselves invited before.

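My question is how to get the inviters who had a prior invitation from someone who also had a prior invitation, and so on...
I have tried several approaches directly in MySQL, and also by creating a DataFrame and working with that.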

Then I apply the same query to the resulting table, for example:

With one_prior as (
    select 
      *
    from user_invitations
    where
      inviter_id in
        (select invitee_id
          from user_invitations)
) 
select *
from one_prior
where 
  inviter_id in
   (select invitee_id 
    from one_prior);
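Each extra CTE level applies the same filter to the previous level's result, so the nesting can be written as a loop instead. A minimal plain-Python sketch, assuming the rows are (invitee_id, inviter_id) pairs; `filter_levels` is a hypothetical helper name and the IDs are made up:

```python
def filter_levels(invites, k):
    """Apply the 'inviter_id IN (SELECT invitee_id ...)' filter k times.

    invites: list of (invitee_id, inviter_id) pairs.
    Returns the rows whose inviter has at least k levels of prior invitations.
    """
    rows = list(invites)
    for _ in range(k):
        # Invitees of the current level; the next level keeps only rows
        # whose inviter is one of them (same as one more nested CTE)
        invitee_ids = {invitee for invitee, _ in rows}
        rows = [(invitee, inviter) for invitee, inviter in rows if inviter in invitee_ids]
    return rows


# Made-up chain 1 -> 2 -> 3 -> 4 (inviter -> invitee), plus an unrelated pair 10 -> 11
invites = [(2, 1), (3, 2), (4, 3), (11, 10)]
print(filter_levels(invites, 1))  # [(3, 2), (4, 3)]
print(filter_levels(invites, 2))  # [(4, 3)]
print(filter_levels(invites, 3))  # []
```

Here inviter 3 survives two filters because 3 was invited by 2, who was invited by 1.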

I have checked one user manually, but is there a way to check all users?

I created two queries:

select *
from user_invitations
where inviter_id = 17349;


select *
from user_invitations
where invitee_id = 23764;

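and then went back and forth between them. For example, if inviter_id = 17349 was a first inviter with no prior invitation, it will not appear in the second query. Then, from the results of the first query with inviter_id = 17349, I get invitee_id = 17401, 17403, etc. I then put those into the first query as inviter_id and repeat these steps.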
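That manual back-and-forth is essentially a breadth-first walk down the invitation tree. A minimal sketch of automating it in plain Python, assuming the rows are loaded as (invitee_id, inviter_id) pairs; `downstream_levels` is a hypothetical name, 17401 and 17403 come from the sample table, and 18001 and 18002 are made-up IDs:

```python
from collections import defaultdict, deque


def downstream_levels(invites, start_inviter):
    """Breadth-first walk from one inviter, repeating the two-query loop.

    invites: list of (invitee_id, inviter_id) pairs.
    Returns {user_id: level}, where level 1 means invited directly by start_inviter.
    """
    children = defaultdict(list)  # inviter_id -> list of invitee_ids
    for invitee, inviter in invites:
        children[inviter].append(invitee)

    levels = {}
    queue = deque([(start_inviter, 0)])
    while queue:
        user, level = queue.popleft()
        for invitee in children[user]:
            if invitee not in levels:  # skip users already reached
                levels[invitee] = level + 1
                queue.append((invitee, level + 1))
    return levels


invites = [(17401, 17349), (17403, 17349), (18001, 17401), (18002, 18001)]
print(downstream_levels(invites, 17349))
# {17401: 1, 17403: 1, 18001: 2, 18002: 3}
```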

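Also, is there a way to create a plot in which each point represents a user and a line connects each pair of users linked by an inviter/invitee relationship?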

Edit: Say I'm going five links deep; the code looks long and tedious, and I'm hoping to find a more efficient way.

query = """
With five_prior as
(
    With four_prior as
    (
        With three_prior as 
        (
            With two_prior as 
            (
                With one_prior as 
                (
                    select 
                      *
                    from user_invitations
                    where inviter_id in
                          (select invitee_id
                          from user_invitations)
                ) 
            select *
            from one_prior
            where inviter_id in
                  (select invitee_id 
                   from one_prior)
            ) 
        select *
        from two_prior
        where inviter_id in
              (select invitee_id 
               from two_prior)
        )
    select *
    from three_prior
    where inviter_id in
          (select invitee_id
           from three_prior)
    )
select *
from four_prior
where inviter_id in
      (select invitee_id
       from four_prior)
)

select *
from five_prior
where inviter_id in
      (select invitee_id
       from five_prior)
group by inviter_id
"""
df = pd.read_sql(query, con=conn)

five_link = list(df.inviter_id)
print(len(five_link))
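Instead of nesting one CTE per level, a recursive CTE can compute every user's depth in the chain in a single query. MySQL supports `WITH RECURSIVE` from version 8.0; the sketch below uses Python's built-in `sqlite3` only so that it runs self-contained, with a two-column stand-in for `user_invitations` and made-up rows that form a chain. The CTE name `chain` and column `depth` are my own. I would expect the same SQL to work on MySQL 8.0+, but that is untested here, and it assumes the invitations contain no cycles and no NULL invitee_id values (a NULL would make the `NOT IN` anchor return nothing).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_invitations (invitee_id INTEGER, inviter_id INTEGER)")
conn.executemany(
    "INSERT INTO user_invitations (invitee_id, inviter_id) VALUES (?, ?)",
    [(17401, 17349), (17403, 17349), (18001, 17401), (18002, 18001)],
)

query = """
WITH RECURSIVE chain(user_id, depth) AS (
    -- Anchor: first inviters, i.e. inviters who never appear as an invitee
    SELECT DISTINCT inviter_id, 0
    FROM user_invitations
    WHERE inviter_id NOT IN (SELECT invitee_id FROM user_invitations)
    UNION ALL
    -- Step: each invitee sits one level below the user who invited them
    SELECT ui.invitee_id, c.depth + 1
    FROM user_invitations AS ui
    JOIN chain AS c ON ui.inviter_id = c.user_id
)
SELECT user_id, depth FROM chain ORDER BY depth, user_id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(17349, 0), (17401, 1), (17403, 1), (18001, 2), (18002, 3)]

# Keep only users who invited someone, grouped by how many levels sit above them
inviter_ids = {r[0] for r in conn.execute("SELECT DISTINCT inviter_id FROM user_invitations")}
by_depth = {}
for user_id, depth in rows:
    if user_id in inviter_ids:
        by_depth.setdefault(depth, []).append(user_id)
print(by_depth)  # {0: [17349], 1: [17401], 2: [18001]}
```

`by_depth[k]` then lists the inviters with exactly k levels of prior invitations, so the five-level case becomes a lookup rather than another nested CTE.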

2 answers:

Answer 0 (score: 1)

Here is a simple way to do it in Python with a dynamic programming approach:

previous_invites = { r["invitee"]: 0 for r in rows }

changed = True

while changed:
  changed = False
  for r in rows:
    update_prev_invites = max(previous_invites[r["invitee"]], previous_invites.get(r["inviter"], 0) + 1)
    if update_prev_invites > previous_invites[r["invitee"]]:
      changed = True
      previous_invites[r["invitee"]] = update_prev_invites

for r in rows:
  print "User " + str(r["id"]) + " had a chain of " + str(previous_invites[r["invitee"]]) + " inviter(s) behind them"

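This assumes `rows` is a list of dicts holding the rows from the database. It builds the previous_invites dictionary, which maps each invitee to the number of inviters in their "chain", by repeatedly setting an invitee's value to their inviter's value + 1 until the dictionary converges to the correct answer.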

With n users and m being the length of the longest inviter chain, this solution runs in O(n) space and O(n * m) time.
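To see where the O(n * m) bound comes from, here is a small sketch that wraps the same loop in a hypothetical `chain_lengths` function and also counts the passes; the dict keys follow the answer's assumed row shape and the IDs are made up. When the rows happen to be listed deepest link first, each pass propagates only one level, so a chain of length m takes m + 1 passes (the last one just confirms nothing changed).

```python
def chain_lengths(rows):
    """The answer's fixed-point loop, returning (previous_invites, passes).

    rows: list of dicts with "invitee" and "inviter" keys (assumed shape).
    """
    previous_invites = {r["invitee"]: 0 for r in rows}
    passes = 0
    changed = True
    while changed:
        changed = False
        passes += 1
        for r in rows:
            candidate = previous_invites.get(r["inviter"], 0) + 1
            if candidate > previous_invites[r["invitee"]]:
                previous_invites[r["invitee"]] = candidate
                changed = True
    return previous_invites, passes


# Made-up chain 1 -> 2 -> 3 -> 4, listed deepest link first (worst order for this loop)
rows = [{"inviter": 3, "invitee": 4}, {"inviter": 2, "invitee": 3}, {"inviter": 1, "invitee": 2}]
print(chain_lengths(rows))  # ({4: 3, 3: 2, 2: 1}, 4)
```

Listing the same rows top-down (`rows[::-1]`) converges in 2 passes, so the row order only affects the running time, not the result.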

Answer 1 (score: 1)

IIUC, you can use the NetworkX library:

import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt

# Jupyter/IPython magic: render plots inline in the notebook
%matplotlib inline

# Read the table above from the clipboard; columns are separated by 2+ spaces
df = pd.read_clipboard(sep=r'\s\s+')

# One directed edge inviter -> invitee per row
G = nx.from_pandas_edgelist(df, 'inviter_id', 'invitee_id', create_using=nx.DiGraph())

fig, ax = plt.subplots(figsize=(10, 8))
nx.draw_networkx(G, ax=ax)

# First inviters are the nodes with no incoming edge (nobody invited them)
[(i, list(G.successors(i))) for i in G.nodes() if G.in_degree(i) == 0]

[f'Inviter {i} invites {", ".join(map(str, G.successors(i)))}' for i in G.nodes() if G.in_degree(i) == 0]

Output:

['Inviter 17374 invites 17375',
 'Inviter 17349 invites 17401, 17403',
 'Inviter 17449 invites 17452',
 'Inviter 17455 invites 17457',
 'Inviter 17501 invites 17502',
 'Inviter 17540 invites 17541',
 'Inviter 17546 invites 17542',
 'Inviter 17630 invites 17696',
 'Inviter 13191 invites 17706']

Network graph image (not reproduced here).