Question

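I am trying to create a pandas DataFrame "B" by reading another DataFrame "A" row by row.

The problem is that I want to fill the cells of the new DataFrame "B" with counts of specific cases that occur in the first dataset "A".

I can't initialize DataFrame "B" with all zeros, because I don't know how many rows it will have.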

If I don't initialize the cell values of DataFrame "B", I get this error when I try to update a cell whose label does not exist yet:

KeyError: "the label ['0'] is not in the [index]"

Answer 1

You can use df.loc[_not_yet_existing_index_label_] = new_row

Using df.at raises a KeyError when the label does not exist in the DataFrame.
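A minimal sketch of the df.loc approach, under some assumptions: the column name "count" and the labels "x", "y", "z" are made up for illustration and do not come from the question. It adds a row the first time a label is seen, increments it afterwards, and shows that reading a missing label through df.at raises KeyError.

```python
import pandas as pd

# Hypothetical target frame: one "count" column and no rows yet.
df_B = pd.DataFrame(columns=["count"])

# Hypothetical labels standing in for the cases found while reading A.
for label in ["x", "y", "x"]:
    if label in df_B.index:
        # The row already exists, so it can be updated in place.
        df_B.loc[label, "count"] += 1
    else:
        # Assigning to a label that does not exist yet appends a new row.
        df_B.loc[label] = [1]

# Reading a label that is not in the index through .at raises KeyError.
try:
    df_B.at["z", "count"]
    missing_raises = False
except KeyError:
    missing_raises = True

print(df_B)
print(missing_raises)
```

Checking `label in df_B.index` before incrementing avoids the KeyError, at the cost of growing the frame one row at a time.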

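Alternatively, because adding new rows one at a time uses a lot of memory, I strongly recommend a different approach that solves your problem, though not in the way you asked: build a dictionary while iterating over A, and instantiate DataFrame B from it once the iteration is done.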
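The dictionary approach could look roughly like this. The contents of df_A and the column name "category" are assumptions made for the example, since the question does not show the real data.

```python
import pandas as pd

# Hypothetical input frame; the real A is not shown in the question.
df_A = pd.DataFrame({"category": ["x", "y", "x", "z", "x"]})

# Accumulate the counts in a plain dict while reading A row by row.
counts = {}
for row in df_A.itertuples(index=False):
    counts[row.category] = counts.get(row.category, 0) + 1

# Build B once, after the loop, instead of growing it row by row.
df_B = pd.DataFrame(list(counts.items()), columns=["category", "count"])
print(df_B)
```

The DataFrame is created a single time, so its number of rows does not need to be known in advance.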

As @philipzhou mentioned, you can actually use collections.Counter, which automatically tallies things like this:

import collections

print(collections.Counter(['a', 'b', 'c', 'a', 'b', 'b']))

Output:

Counter({'b': 3, 'a': 2, 'c': 1})

You can think of each letter here as a row index that is added to the counter and tallied as you go through A. Then turn the counter into df_B. Basically:

df_B = pd.DataFrame.from_dict(a_counter, orient='index').reset_index()

where a_counter is your collections.Counter object.
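Putting the Counter and the from_dict call together, an end-to-end sketch might look like the following. The frame df_A, its "letter" column, and the final column names are assumptions for illustration only.

```python
import collections

import pandas as pd

# Hypothetical input frame reusing the letters from the Counter example.
df_A = pd.DataFrame({"letter": ["a", "b", "c", "a", "b", "b"]})

# Counter tallies every value in the column in a single pass.
a_counter = collections.Counter(df_A["letter"])

# Each key becomes an index label; reset_index turns it into a column.
df_B = pd.DataFrame.from_dict(a_counter, orient="index").reset_index()

# The resulting columns are named "index" and 0; rename them for clarity.
df_B.columns = ["letter", "count"]
print(df_B)
```

Renaming the columns is optional, but without it the count column keeps the default name 0.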