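I've run into a complex problem that I haven't been able to solve over the past few days.
Given the following DF,
I want to end up with:
Essentially, we look in the 'user_entry_note' column to see whether certain chunks of a given sequence have been successfully entered, in order.
To get the chunks from a sequence, I use the following function: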
import itertools

def get_chunks_from_seq(seq_id):
    'work out all the possible chunks'
    # tidy_string is a helper defined elsewhere (not shown here)
    tidy = tidy_string(seq_id)
    # every contiguous slice tidy[i:j] with i < j is a chunk
    ord_chunks = [tidy[i:j] for i, j in itertools.combinations(range(len(tidy) + 1), 2)]
    return ord_chunks
which returns a list of all the possible chunks, in order.
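To make the output of this function concrete, here is a minimal sketch for the sequence '60-40-30'. The body of tidy_string below is an assumption (the real helper is not shown in the question); it is taken to split the string on '-' and return a list of ints.

```python
import itertools

def tidy_string(seq_id):
    # Hypothetical stand-in for the helper that is not shown in the question:
    # assumed to turn '60-40-30' into [60, 40, 30].
    return [int(part) for part in seq_id.split('-')]

def get_chunks_from_seq(seq_id):
    tidy = tidy_string(seq_id)
    # combinations(range(n + 1), 2) yields every (i, j) with i < j,
    # so tidy[i:j] covers every non-empty contiguous slice.
    return [tidy[i:j] for i, j in itertools.combinations(range(len(tidy) + 1), 2)]

chunks = get_chunks_from_seq('60-40-30')
print(chunks)
# [[60], [60, 40], [60, 40, 30], [40], [40, 30], [30]]
```

A sequence of n notes gives n * (n + 1) / 2 chunks, so the three-note sequence here gives six, with the full sequence included as one of them.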
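Now, I'm struggling to achieve my various goals without using lots of dataframes. I think I may have missed a trick early on in the process.
Here 'seq' is the original sequence and the 'chunks' are the components of that sequence. The whole sequence also becomes a chunk in the second stage.
For each 'chunk', I want to know the trial_ms at which it was 'completed' (played in order in the 'user_entry_note' column), along with the values of user_entries_error_no and user_entries_plybs at that point.
I managed to do this with: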
# get a list of the possible chunks based on the sequence
chunks = get_chunks_from_seq(df1['seq'][0])
# completion index of each chunk (find_idx is a helper defined elsewhere, not shown here)
h = [find_idx(seq, df1, 'user_entry_note') for seq in chunks]
# list of the chunks themselves
h2 = list(chunks)
# column of chunk lengths
h3 = [len(seq) if isinstance(seq, list) else 1 for seq in chunks]
# create strings of the chunks
h2_str = []
for p in h2:
    if isinstance(p, list):
        # list_to_string is a helper defined elsewhere (not shown here)
        h2_str.append(list_to_string(p))
    else:
        h2_str.append(str(p))
# make a df to format them
df1_2 = pd.DataFrame({'chunk_idx__completion_in_trial': h, 'chunk': h2_str, 'chunk_len': h3})
# sub df of the rows where each chunk was completed
# (note: the sample DF further down has 'trial_ms' rather than 'timecode')
subdf1 = ['user_id', 'timecode', 'user_entries_error_no', 'user_entries_plybs']
df1_3 = df1.iloc[h, :][subdf1].reset_index()
# tie everything together
keep = ['chunk', 'user_id', 'timecode', 'user_entries_error_no', 'user_entries_plybs']
df2 = df1_2.join(df1_3)[keep]
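But I think I need to abandon this approach to achieve my second goal, and this is where I'm stuck.
In addition to the above, I want to know when (trial_ms) each note in a chunk was played as part of completing that chunk, ignoring any earlier appearances of those notes.
In other words, in the example below:
For chunk '40-30', n1 would be index 7 and n2 would be index 8, because the chunk is completed at index 8. It doesn't matter that 40 appeared at index 2. However, index 2 would be the correct index for the chunk '40' on its own; in that case (as for all chunks with n = 1), n1 also equals the 'chunk_completed' column.
Reproducible DF: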
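One way to picture this second goal is the sketch below. It is only an illustration under stated assumptions: find_note_positions is a hypothetical helper (not the find_idx used above), it assumes that "completed" means the chunk's notes appear contiguously in 'user_entry_note', and it reports the first such match. The two columns are copied from the reproducible DF, so the indexes differ from the 7 and 8 quoted in the example.

```python
import pandas as pd

def find_note_positions(notes, chunk):
    """Row positions of each note in the first contiguous match of chunk, or None."""
    notes = list(notes)
    chunk = list(chunk)
    k = len(chunk)
    for start in range(len(notes) - k + 1):
        if notes[start:start + k] == chunk:
            return list(range(start, start + k))
    return None

# Two columns taken from the reproducible DF in this question.
entries = pd.DataFrame({
    'user_entry_note': [23, 60, 40, 30, 40, 3, 3, 2, 4],
    'trial_ms': [-9223372037, -18963961, 31992270, -13028311, -18963961,
                 31992270, -13028311, -18963961, 31992270],
})

rows = []
for chunk in [[60], [60, 40], [60, 40, 30], [40], [40, 30], [30]]:
    pos = find_note_positions(entries['user_entry_note'], chunk)
    if pos is None:
        continue  # chunk never completed in this trial
    row = {'chunk': '-'.join(map(str, chunk)), 'chunk_completed': pos[-1]}
    for n, p in enumerate(pos, start=1):
        row[f'n{n}'] = p                                   # index of the n-th note
        row[f'n{n}_trial_ms'] = entries['trial_ms'].iloc[p]  # its trial_ms
    rows.append(row)

result = pd.DataFrame(rows)
print(result)
```

With this layout a single dataframe holds one row per chunk, and for chunks of length 1 the n1 column equals chunk_completed, as described above. If a chunk can be completed with other notes in between, or if the last match rather than the first is wanted, the matching loop would need to change.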
import pandas as pd

f = {'seq': {0: '60-40-30',
1: '60-40-30',
2: '60-40-30',
3: '60-40-30',
4: '60-40-30',
5: '60-40-30',
6: '60-40-30',
7: '60-40-30',
8: '60-40-30'},
'seq_len': {0: 3, 1: 3, 2: 3, 3: 3, 4: 3, 5: 3, 6: 3, 7: 3, 8: 3},
'seq_list': {0: [60, 40, 30],
1: [60, 40, 30],
2: [60, 40, 30],
3: [60, 40, 30],
4: [60, 40, 30],
5: [60, 40, 30],
6: [60, 40, 30],
7: [60, 40, 30],
8: [60, 40, 30]},
'trial_ms': {0: -9223372037,
1: -18963961,
2: 31992270,
3: -13028311,
4: -18963961,
5: 31992270,
6: -13028311,
7: -18963961,
8: 31992270},
'user_entries_error_no': {0: 1,
1: 2,
2: 6,
3: 2,
4: 3,
5: 3,
6: 3,
7: 2,
8: 4},
'user_entries_plybs': {0: 2, 1: 3, 2: 3, 3: 2, 4: 3, 5: 3, 6: 1, 7: 1, 8: 4},
'user_entry_note': {0: 23,
1: 60,
2: 40,
3: 30,
4: 40,
5: 3,
6: 3,
7: 2,
8: 4},
'user_id': {0: 'seb',
1: 'seb',
2: 'seb',
3: 'seb',
4: 'seb',
5: 'seb',
6: 'seb',
7: 'seb',
8: 'seb'}}
df1 = pd.DataFrame.from_dict(f)