Question

I've run into a complicated problem that I haven't been able to solve for the past few days.

Given the following DF:

I would like to end up with:

Essentially, we look in the 'user_entry_note' column to see whether certain chunks of a given sequence have been successfully entered in order.

To get the chunks from a sequence, I use the following function:

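import itertools

def get_chunks_from_seq(seq_id):
    """Work out all the possible chunks of a sequence, in order."""
    # tidy_string is a helper that is not shown here
    tidy = tidy_string(seq_id)

    # work out the chunks: every contiguous slice tidy[i:j] with i < j
    ord_chunks = [tidy[i:j] for i, j in itertools.combinations(range(len(tidy) + 1), 2)]

    return ord_chunks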

This returns a list of all the possible chunks, in order.
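To make the output concrete, here is a minimal runnable sketch of the function. Since tidy_string is not shown, the version below is a hypothetical stand-in that assumes it simply splits a string like '60-40-30' into a list of integers; the real helper may do more.

```python
import itertools

def tidy_string(seq_id):
    # Hypothetical stand-in for the helper that is not shown in the post:
    # assumed to turn '60-40-30' into [60, 40, 30].
    return [int(part) for part in seq_id.split("-")]

def get_chunks_from_seq(seq_id):
    tidy = tidy_string(seq_id)
    # every pair (i, j) with i < j gives one contiguous slice tidy[i:j]
    return [tidy[i:j] for i, j in itertools.combinations(range(len(tidy) + 1), 2)]

chunks = get_chunks_from_seq("60-40-30")
print(chunks)
# [[60], [60, 40], [60, 40, 30], [40], [40, 30], [30]]
```

A sequence of length n yields n * (n + 1) / 2 chunks, so the three-note sequence here gives six.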

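Now, I'm struggling to achieve my various goals without using a large number of DataFrames. I think I may have missed a trick early on in the process.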

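Here 'seq' is the original sequence and a 'chunk' is a component part of that sequence. The whole sequence also becomes a chunk in the second stage.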

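For each 'chunk', I want to know its 'completion' trial_ms (i.e. when it was played in order in the 'user_entry_note' column), along with the values in user_entries_error_no and user_entries_plybs at that point.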

I managed to do this:

# get a list of the possible chunks based on the sequence
chunks = get_chunks_from_seq(df1['seq'][0])

# completion index of each chunk within the trial
h = [find_idx(seq, df1, 'user_entry_note') for seq in chunks]

# list of the chunks themselves
h2 = list(chunks)

# column of chunk lens
h3 = [len(seq) if isinstance(seq, list) else 1 for seq in chunks]

# create strings of these
h2_str = [list_to_string(p) if isinstance(p, list) else str(p) for p in h2]

# make df to format them
df1_2 = pd.DataFrame({'chunk_idx__completion_in_trial': h, 'chunk': h2_str, 'chunk_len': h3})

# sub df: the rows at which each chunk was completed
subdf1 = ['user_id', 'timecode', 'user_entries_error_no', 'user_entries_plybs']
df1_3 = df1.iloc[h, :][subdf1].reset_index()

# tie everything together
keep = ['chunk', 'user_id', 'timecode', 'user_entries_error_no', 'user_entries_plybs']
df2 = df1_2.join(df1_3)[keep]
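The find_idx helper used above is not shown. As a rough sketch of what such a lookup might do, the snippet below finds the row at which each chunk is first completed as a contiguous run and collects the values at that row. The name find_completion_idx, the contiguous-run assumption, and the use of plain lists (which could come from df1[col].tolist()) are all assumptions for illustration, not the original helper.

```python
def find_completion_idx(chunk, notes):
    """Index at which `chunk` is first completed as a contiguous run, or None."""
    n = len(chunk)
    for end in range(n - 1, len(notes)):
        if notes[end - n + 1:end + 1] == chunk:
            return end
    return None

# column values copied from the reproducible DF
notes = [23, 60, 40, 30, 40, 3, 3, 2, 4]
trial_ms = [-9223372037, -18963961, 31992270, -13028311, -18963961,
            31992270, -13028311, -18963961, 31992270]
errors = [1, 2, 6, 2, 3, 3, 3, 2, 4]
plybs = [2, 3, 3, 2, 3, 3, 1, 1, 4]

rows = []
for chunk in ([60], [60, 40], [60, 40, 30], [40], [40, 30], [30]):
    idx = find_completion_idx(chunk, notes)
    if idx is None:
        continue  # chunk was never completed in this trial
    rows.append({
        "chunk": "-".join(str(note) for note in chunk),
        "completed_idx": idx,
        "trial_ms": trial_ms[idx],
        "user_entries_error_no": errors[idx],
        "user_entries_plybs": plybs[idx],
    })

for row in rows:
    print(row)
```

Building one list of dicts and passing it to pd.DataFrame once would avoid the intermediate DataFrames and the join.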

But I think I need to abandon this approach to achieve my second goal, and this is where I'm stuck.

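In addition, I want to know when each note in the chunk was played (trial_ms) in the run that completed the chunk (and not when those notes may have occurred earlier).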

In other words, in the example below:

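For the chunk '40-30', n1 would be index 7 and n2 would be index 8, because the chunk was completed at index 8. It doesn't matter that 40 also occurs at index 2. However, index 2 would be the correct index for n1 of the chunk '40'; in that case (as for all chunks with n = 1), n1 also equals the 'chunk_completed' column.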
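One way to read this rule, assuming the notes of a chunk are entered on consecutive rows, is that the note positions can be derived from the completion index and the chunk length alone. The sketch below illustrates that reading; note_indices is a hypothetical name, and the contiguity assumption would not hold if other notes may be interleaved within a chunk.

```python
def note_indices(chunk_len, completed_idx):
    """Row index of each note (n1, n2, ...) in the run that completed the chunk."""
    start = completed_idx - chunk_len + 1
    return list(range(start, completed_idx + 1))

# chunk '40-30' completed at index 8: n1 is 7, n2 is 8
print(note_indices(2, 8))  # [7, 8]

# chunk '40' completed at index 2: n1 equals the completion index
print(note_indices(1, 2))  # [2]

# looking up trial_ms per note, using the trial_ms column of the reproducible DF
trial_ms = [-9223372037, -18963961, 31992270, -13028311, -18963961,
            31992270, -13028311, -18963961, 31992270]
idxs = note_indices(3, 3)  # chunk '60-40-30', completed at index 3 in that DF
note_times = [trial_ms[i] for i in idxs]
print(idxs, note_times)  # [1, 2, 3] [-18963961, 31992270, -13028311]
```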

Reproducible DF:

import pandas as pd

f = {'seq': {0: '60-40-30',
  1: '60-40-30',
  2: '60-40-30',
  3: '60-40-30',
  4: '60-40-30',
  5: '60-40-30',
  6: '60-40-30',
  7: '60-40-30',
  8: '60-40-30'},
 'seq_len': {0: 3, 1: 3, 2: 3, 3: 3, 4: 3, 5: 3, 6: 3, 7: 3, 8: 3},
 'seq_list': {0: [60, 40, 30],
  1: [60, 40, 30],
  2: [60, 40, 30],
  3: [60, 40, 30],
  4: [60, 40, 30],
  5: [60, 40, 30],
  6: [60, 40, 30],
  7: [60, 40, 30],
  8: [60, 40, 30]},

 'trial_ms': {0: -9223372037,
  1: -18963961,
  2: 31992270,
  3: -13028311,
  4: -18963961,
  5: 31992270,
  6: -13028311,
  7: -18963961,
  8: 31992270},
 'user_entries_error_no': {0: 1,
  1: 2,
  2: 6,
  3: 2,
  4: 3,
  5: 3,
  6: 3,
  7: 2,
  8: 4},
 'user_entries_plybs': {0: 2, 1: 3, 2: 3, 3: 2, 4: 3, 5: 3, 6: 1, 7: 1, 8: 4},
 'user_entry_note': {0: 23,
  1: 60,
  2: 40,
  3: 30,
  4: 40,
  5: 3,
  6: 3,
  7: 2,
  8: 4},
 'user_id': {0: 'seb',
  1: 'seb',
  2: 'seb',
  3: 'seb',
  4: 'seb',
  5: 'seb',
  6: 'seb',
  7: 'seb',
  8: 'seb'}}



df1 = pd.DataFrame.from_dict(f)

Extracting information from a DF column based on sequence chunks

0 answers: