Question

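First of all, apologies for the vague title, but I didn't know how to describe the problem. I'm currently working with a large biodiversity dataset and have run into a specific problem.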

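To give some background: each row of the dataset is an organism, and the columns are the taxonomic levels (kingdom, class, genus, ...) and the samples (a measure of how much of the organism is present in that sample).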

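So a row contains several taxonomic levels (= strings), followed by the amount of the organism present in each sample, i.e. the number of reads (= integers).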
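A minimal sketch of the layout described above, using hypothetical column names (`kingdom`, `class_name`, `genus`, `sample_1`, `sample_2`) and made-up read counts instead of the real dataset. Because the taxonomic levels are strings and the reads are integers, the sample columns can be picked by dtype rather than by position; if the samples really start at column index 49, as the loop below assumes, `df.columns[49:]` would select the same columns.

```python
import pandas as pd

# Hypothetical toy data: taxonomic levels (strings) first, then one
# integer column of read counts per sample.
df = pd.DataFrame({
    "kingdom": ["Animalia", "Animalia", "Plantae", "Animalia"],
    "class_name": ["Insecta", "Insecta", "Magnoliopsida", "Aves"],
    "genus": ["Apis", "Bombus", "Quercus", "Parus"],
    "sample_1": [10, 5, 0, 3],
    "sample_2": [0, 7, 12, 1],
})

# Read counts are numeric, so the sample columns can be selected by dtype.
sample_cols = df.select_dtypes(include="number").columns.tolist()
# Everything else is a taxonomic level.
taxonomy_cols = [c for c in df.columns if c not in sample_cols]

print(taxonomy_cols)  # ['kingdom', 'class_name', 'genus']
print(sample_cols)    # ['sample_1', 'sample_2']
```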

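For each sample, I want to sum the amount of organisms (the integer measure) per class (= taxonomic level). Note that the class is not unique per row/organism: 50 organisms (= 50 rows) can have the same class while differing at other taxonomic levels (e.g. genus).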
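Summing the reads per class for every sample is a group-by aggregation, so pandas can probably do it without explicit loops. A minimal sketch, assuming the dataframe is called `df`, the class column is `class_name` (as in the code below) and the sample columns are known; the toy data and the names `sample_1`/`sample_2` are hypothetical. The two `Insecta` rows differ in genus but end up summed into a single row.

```python
import pandas as pd

# Hypothetical toy data: two organisms share the class "Insecta"
# but belong to different genera.
df = pd.DataFrame({
    "class_name": ["Insecta", "Insecta", "Magnoliopsida", "Aves"],
    "genus": ["Apis", "Bombus", "Quercus", "Parus"],
    "sample_1": [10, 5, 0, 3],
    "sample_2": [0, 7, 12, 1],
})
sample_cols = ["sample_1", "sample_2"]

# One row per class; each sample column holds the summed reads of all
# organisms in that class. Other taxonomic columns (genus) are dropped.
class_samples = (
    df.groupby("class_name", as_index=False)[sample_cols]
      .sum()
      .rename(columns={"class_name": "class"})
)
print(class_samples)
```

By default `groupby` sorts the classes alphabetically; passing `sort=False` keeps them in order of first appearance, which is the order `pd.unique` returns.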

classes=pd.unique(pd["class_name"])      ###the different classes (=106), the "class_name" column contains the class of each organism in the df.

x=-1
y=48
class_samples=pd.DataFrame(columns=["class"]) ###empty dataframe to fill with only one column at this point

while x < len(classes)-1:    ###iterate over the classes, start at class 0
    x=x+1
    req_class=pd.loc[pd.class_name == classes[x]] 

    ###Here (above) all rows of all samples with classes[x] are selected --> so every time one class is selected, and within this req_class I want to sum all the reads (= measure) per sample (see the while loop below)

    while y < len(pd.columns)-1: ###iterate over the columns = samples, starting at column 49 --> for each unique class (=x) I want to iterate over all sample columns, starting at column 49 (=y)

        y=y+1

        class_samples=class_samples.append(pd.DataFrame({"class":[classes[x]],pd.columns[y]:[sum(req_class[df.columns[y]])]})) 
        ###with this code I would like to fill the class_samples dataframe with each of the classes in the "class" column, and create a new column for each sample that contains the sum of the measure over all organisms of the specified class (=x)
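A likely reason the loops misbehave: `y` is set to 48 only once, before the outer loop, so after the first class the inner `while` condition is already false and no further sample columns are visited. In addition, each `append` call adds a separate one-row frame holding a single sample (so even the first class is spread over one row per sample, with NaN elsewhere), the snippet mixes `pd` and `df` as the dataframe name (a dataframe called `pd` shadows the pandas module, so `pd.unique` and `pd.DataFrame` would fail), and `DataFrame.append` was deprecated in pandas 1.4 and removed in 2.0. Below is a sketch of the same two-loop idea with those points addressed, on hypothetical toy data where the samples start at column index 2 (the variable `first_sample` is illustrative; it would be 49 for the real data).

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataframe. It is called
# df here, because naming it pd would shadow the pandas module.
df = pd.DataFrame({
    "class_name": ["Insecta", "Insecta", "Magnoliopsida", "Aves"],
    "genus": ["Apis", "Bombus", "Quercus", "Parus"],
    "sample_1": [10, 5, 0, 3],
    "sample_2": [0, 7, 12, 1],
})
first_sample = 2  # hypothetical; the question's samples start at index 49

rows = []
for cls in pd.unique(df["class_name"]):       # outer loop: one pass per class
    req_class = df.loc[df["class_name"] == cls]
    row = {"class": cls}
    for col in df.columns[first_sample:]:      # inner loop restarts for every class
        row[col] = req_class[col].sum()
    rows.append(row)                           # one complete row per class

# Build the result once at the end instead of calling DataFrame.append.
class_samples = pd.DataFrame(rows)
print(class_samples)
```

Collecting plain dicts in a list and creating the dataframe once also avoids copying the growing frame on every iteration.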

Unfortunately this code does not work; I have tried a lot of things, but I can't get it to work.

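So in the new dataframe class_samples I want a "class" column listing every class and, for each class, the sum of the measure for every sample (= the sample columns of the original df).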

I hope I have explained the problem clearly enough! I added some comments to the code.

Click here for an example picture (my reputation is too low to include images).

How can I fill an empty dataframe with specific classes in two iterations?
