Question

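I have a pandas Series made up of dictionaries. Each dictionary has two specific keys, and I want to use them to pull the data out into DataFrame columns.

When I print the Series, it looks like this (input_ids and attention_mask are the keys of every dictionary in the Series):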

from transformers import FlaubertTokenizer

path_to_lge = "flaubert/flaubert_small_cased"
flaubert_tokenizer = FlaubertTokenizer.from_pretrained(path_to_lge, do_lowercase=False)

#texte is <class 'pandas.core.series.Series'>

encoding = texte.apply((lambda x: flaubert_tokenizer.encode_plus(x, add_special_tokens=True, return_token_type_ids=False, truncation=True, padding=True, return_attention_mask=True)))

print(type(encoding)) # <class 'pandas.core.series.Series'>

print(encoding)

0     [input_ids, attention_mask]
1     [input_ids, attention_mask]
2     [input_ids, attention_mask]
3     [input_ids, attention_mask]
4     [input_ids, attention_mask]
5     [input_ids, attention_mask]
6     [input_ids, attention_mask]
7     [input_ids, attention_mask]
8     [input_ids, attention_mask]
9     [input_ids, attention_mask]

If I print a single element of the Series, it looks like this:

print(encoding[0])

{'input_ids': [0, 93, 106, 97, 26, 578, 14, 535, 61, 6823, 21, 7652, 19, 151, 11804, 1934, 75, 20340, 75, 10777                                          , 1006, 2986, 15, 200, 75, 50, 1779, 8475, 58, 15, 200, 24023, 14, 50, 75, 25601, 14, 50, 45, 32, 56, 2572, 16,                                           107, 56, 528, 23, 24023, 16, 133, 242, 43, 1291, 14, 63, 535, 61, 20, 6823, 19, 151, 32897, 5589, 659, 386, 87                                          , 3167, 19, 151, 15469, 1271, 23579, 742, 11707, 15, 200, 10500, 9663, 87, 21020, 16, 113, 27, 597, 14, 21, 112                                          0, 16, 175, 20, 4146, 51, 27, 42, 1391, 14, 157, 48, 2158, 19, 245, 16, 33897, 75, 1846, 75, 113, 27, 157, 16,                                           2175, 14, 79, 27, 597, 18, 157, 14, 45, 52, 23, 536, 32, 61, 51, 97, 42, 462, 33, 17, 2038, 16, 1], 'attention_                                          mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1                                          , 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1                                          , 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1                                          , 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}

I want to retrieve every input_ids and attention_mask value from the Series and put them into a DataFrame.

So I expect:

input_ids                      attention_mask
[0, 16, 175, 20, 4146, ...]    [1,1,1,1,1,...]
[51, 27, 42, 1391, 14,...]     [1,1,1,1,1...]
[157, 48, 2158, 19, 245,...]   [1,1,1,1,1,..]

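I tried to collect the values of all the elements in the Series as shown below, but it does not work. I tried other variants as well, with the same result; all my attempts failed. I would be grateful if someone could help: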

encoding_df = pd.DataFrame(columns=['input_ids', 'attention_mask'])
for e in encoding:
   encoding_df.append(e['input_ids'], e['attention_mask'])

print(encoding_df)

The "error" I get:

Empty DataFrame
Columns: [input_ids, attention_mask]
Index: []
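A likely reason for the empty result: in the pandas versions that still had it, DataFrame.append returned a new DataFrame instead of modifying encoding_df in place, so the loop discards every result (the method was removed in pandas 2.0). A minimal sketch of one alternative is to collect one row per dictionary and build the DataFrame once. The small Series below is a made-up stand-in for the tokenizer output, assumed only to support d['key'] lookups like the real elements.

```python
import pandas as pd

# Hypothetical stand-in for the tokenizer output: a Series of dicts
# with the same two keys as in the question.
encoding = pd.Series([
    {"input_ids": [0, 93, 106, 1], "attention_mask": [1, 1, 1, 1]},
    {"input_ids": [0, 51, 27, 42, 1], "attention_mask": [1, 1, 1, 1, 1]},
])

# Collect one row per dictionary instead of appending inside the loop.
rows = [
    {"input_ids": e["input_ids"], "attention_mask": e["attention_mask"]}
    for e in encoding
]

# Build the DataFrame once, keeping the original Series index.
encoding_df = pd.DataFrame(
    rows, columns=["input_ids", "attention_mask"], index=encoding.index
)
print(encoding_df)
```

Each cell then holds the full list for that row, which matches the expected layout above.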

How can I access all the values of specific keys across the different dictionaries in a <class 'pandas.core.series.Series'>?
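One way to read a single key from every element, sketched here under the assumption that each element supports d['key'] lookups, is Series.apply with a small lambda. The sample data is invented for illustration and is not the real tokenizer output.

```python
import pandas as pd

# Hypothetical stand-in for the Series of dictionaries from the question.
encoding = pd.Series([
    {"input_ids": [0, 16, 175, 1], "attention_mask": [1, 1, 1, 1]},
    {"input_ids": [0, 157, 48, 1], "attention_mask": [1, 1, 1, 1]},
    {"input_ids": [0, 51, 1], "attention_mask": [1, 1, 1]},
])

# Pull one key out of every dictionary; each result is a Series of lists
# that shares the index of the original Series.
input_ids = encoding.apply(lambda d: d["input_ids"])
attention_mask = encoding.apply(lambda d: d["attention_mask"])

# Put the two Series side by side as DataFrame columns.
encoding_df = pd.DataFrame(
    {"input_ids": input_ids, "attention_mask": attention_mask}
)
print(encoding_df)
```

The per-key Series can also be used on their own, for example input_ids.apply(len) to get the token count of each row.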

0 answers: