Question

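Could you advise on the problem below? I'm a bit confused.

The dataframe df3 has the columns 'domain' and 'size'. My script cleans up the domain and adds a new column called 'newdomain2'.

I add the column below and view the dataframe, and it looks correct.

df4 should then be an aggregated version of df3 (grouped by domain, with sum(size)), but when I try the following I get this error: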

TypeError: unhashable type: 'list'

I should note that if I group by 'domain' instead of 'newdomain2' in the same script, it works fine.

Can you help me understand why this is happening?

 df3['newdomain2']=cleandomain
 #show df3
 df3

 df4 = df3.groupby(['newdomain2'])[['size']].sum()

This is the script I use to generate the new column values and add them to the dataframe:

 for x in index:
     #if it ends with a number, it's an IP
     if str(x[len(x)-1]).isnumeric():
         cleandomain.append(str(x[0])+'.'+str(x[1])+'.*.*')
     #if it's in the CDN list, take a subdomain as well
     elif str(x[len(x)-2]).rstrip() in cdns:
         cleandomain.append(str(x[len(x)-3])+'.'+str(x[len(x)-2])+'.'+str(x[len(x)-1]))
     elif str(x[len(x)-3]).rstrip() in cdns:
         cleandomain.append(str(x[len(x)-4])+'.'+str(x[len(x)-3])+'.'+str(x[len(x)-2])+'.'+ str(x[len(x)-1]))
     #if it's in the TLD list, do this
     elif str(x[len(x)-2]).rstrip()+'.'+ str(x[len(x)-1]).rstrip() in tld:
         cleandomain.append(str(x[len(x)-3])+'.'+str(x[len(x)-2])+'.'+ str(x[len(x)-1]))
     elif str(x[len(x)-1]) in tld:
         cleandomain.append(str(x[len(x)-2])+'.'+ str(x[len(x)-1]))
     #if it's not in the TLD list, do this
     else:
         cleandomain.append(x)
 #add column to df3
 df3['newdomain2']=cleandomain

Answer 1

Rather than assigning the list to the dataframe column directly, try converting it first:

df3['your_col'] = pd.Series(your_list).values
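
It may also be worth checking what the script appends in its final else branch. Every other branch appends a string, but the else branch appends x itself, and the way x is indexed in the script suggests x is a list of domain parts. If so, some cells in 'newdomain2' hold lists, and groupby cannot hash a list, which would match the error. Below is a minimal sketch of that fix on made-up data; the sample rows, the one-entry tld set, and the assumption that index holds domains split on '.' are all hypothetical, not taken from the question.

```python
import pandas as pd

# Hypothetical sample data standing in for df3.
df3 = pd.DataFrame({
    "domain": ["a.example.com", "b.example.com", "localhost"],
    "size": [10, 20, 5],
})

# Assumption: `index` holds each domain already split on '.',
# which is what the indexing in the question's script suggests.
index = [d.split(".") for d in df3["domain"]]
tld = {"com"}  # hypothetical stand-in for the asker's TLD list

cleandomain = []
for x in index:
    if x[-1] in tld:
        # Keep the last two parts, e.g. "example.com".
        cleandomain.append(x[-2] + "." + x[-1])
    else:
        # The question's script appends x (a list) here.
        # Joining the parts keeps every value a plain string.
        cleandomain.append(".".join(x))

df3["newdomain2"] = cleandomain

# Sanity check: no cell in the grouping column should be a list.
assert not df3["newdomain2"].map(lambda v: isinstance(v, list)).any()

df4 = df3.groupby(["newdomain2"])[["size"]].sum()
print(df4)
```

To find the offending rows in the real data, something like df3[df3['newdomain2'].map(lambda v: isinstance(v, list))] should list every row where a list slipped into the column.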
