Question

Suppose I have a DataFrame:

question  user   level 
    1      a       1     
    1      b       2     
    1      a       3     
    2      a       1     
    2      b       2     
    2      a       3     
    2      b       4     
    3      c       1     
    3      b       2     
    3      c       3     
    3      a       4     
    3      b       5

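The level column specifies who started the thread and who replied to it. If a user's level is 1, he asked the question. If a user's level is 2, he replied to the user who asked the question. If a user's level is 3, he replied to the user at level 2, and so on.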
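For anyone who wants to reproduce this, here is a minimal sketch that builds the table above. The variable name `df` matches the code further down; the integer dtypes for `question` and `level` are an assumption.

```python
import pandas as pd

# Sample data copied row by row from the table above.
df = pd.DataFrame({
    "question": [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3],
    "user": ["a", "b", "a", "a", "b", "a", "b", "c", "b", "c", "a", "b"],
    "level": [1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5],
})

# The users who started each thread are the ones at level 1.
starters = df.loc[df["level"] == 1, ["question", "user"]]
print(starters)
```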

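I want to extract a new DataFrame that captures the communication between users across questions. It should contain three columns: "user source", "user destination", and "reply count". The reply count is the number of times the user destination "directly" replied to the user source.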

    us_source us_dest reply_count
        a        b       2
        a        c       0
        b        a       0
        b        c       0
        c        a       0
        c        b       1

I tried to find the first two columns using this code.

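import numpy as np

idx_cols = ['question']
std_cols = ['user_x', 'user_y']
# Self-merge on question to pair up the users who posted in the same question
df1 = df.merge(df, on=idx_cols)
df2 = df1.loc[df1.user_x != df1.user_y, idx_cols + std_cols]

# Sort each pair so that (a, b) and (b, a) end up in the same order
df2.loc[:, std_cols] = np.sort(df2.loc[:, std_cols])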

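Does anyone have a suggestion for the third column? A reply from B to A counts as "direct" if and only if B replied at level k to a message that A posted at level k-1 in the same thread. If student A starts a thread (by sending a message at level 1) and B answers A (by sending a message at level 2), then B directly replied to A. Only students' replies from level 2 to level 1 are counted.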

Answer 1

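My suggestion:

I would use a dictionary with 'source-destination' as the keys and the reply counts as the values.

Loop over the first DataFrame and, for each question, store the user who posted the first message as the destination and the user who posted the second message as the source, then increment a counter in the dictionary under the key "source-destination". For example (I'm not familiar with pandas, so I'll leave the formatting to you):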

from itertools import permutations

reply_counts = {}  # the dictionary where the results are stored
users = set()
destination = False  # holds the level-1 poster until the level-2 reply is seen

# itertuples yields one row at a time; columns are assumed to be ordered (question, user, level)
for row in dataframe.itertuples(index=False):
    users.add(row[1])  # collect the user names
    if row[2] == 1:  # if it is an initial message
        destination = row[1]  # store its author as the destination
    elif row[2] == 2 and destination:  # if this is the first reply to it
        source = row[1]  # store the replier as the source
        key = source + "-" + destination  # build a key from source/destination
        if key not in reply_counts:  # if the key is new to the dictionary
            reply_counts[key] = 1  # create the new entry
        else:  # otherwise
            reply_counts[key] += 1  # increment the existing entry
        destination = False  # reset destination

    else:
        destination = False  # reset destination

# add the source-destination pairs that never interacted to the dictionary
for pair in permutations(users, 2):
    if "-".join(pair) not in reply_counts:
        reply_counts["-".join(pair)] = 0

You can then convert the dictionary back into a DataFrame.
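That conversion could look roughly like the sketch below. It assumes the dictionary has the "source-destination" shape built by the loop above and that user names contain no hyphen, since the key is split on "-". The literal `reply_counts` here is what the loop should produce for the sample data in the question; the column names are borrowed from the question's expected output.

```python
import pandas as pd

# Stand-in for the dictionary built by the loop: "source-destination" -> count.
reply_counts = {"b-a": 2, "b-c": 1, "a-b": 0, "a-c": 0, "c-a": 0, "c-b": 0}

# Split each key back into its two user names and attach the count.
rows = [key.split("-") + [count] for key, count in reply_counts.items()]

result = (
    pd.DataFrame(rows, columns=["us_source", "us_dest", "reply_count"])
    .sort_values(["us_source", "us_dest"])
    .reset_index(drop=True)
)
print(result)
```

Note that the loop puts the replier first in the key, so here `us_source` is the user who replied. The question's expected table has it the other way round (the count for source a, destination b is 2), so swap the two column names if you want that orientation.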
