Question

I have a dataframe like the one below:

import numpy as np
import pandas as pd

data = {'site': ['a.com', 'd.com', 'd.com', 'b.com', 'b.com', 'c.com', 'c.com', 'c.com'], 'type': [3, 1, 3, 1, 1, 1, 3, 3]}

sites = pd.DataFrame(data, columns=['site', 'type'])

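Some sites are type 1 and some are type 3, but some sites have both type 1 and type 3. When a site has both types (like c.com), I want to change its type to some other number. It can be 2 or 4 or any other number. The output I want would look like this: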

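[Image: Output]

I'm very new to pandas and I'm really stuck. My problem is that I don't know how to select the sites that have both 1 and 3 in the type column.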

I tried:

sites['site']=np.where[(sites.type == 1)&(sites.type == 3)], 2, sites['type'])

But I got an error:

'builtin_function_or_method' object is not subscriptable

I don't know what other functions I could use or how to achieve what I want.

Thanks in advance for your help.

Answer 1

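You can't select the sites that have both type 1 and type 3 with sites[(sites.type == 1)&(sites.type == 3)]. The condition is evaluated row by row, and a single row's type cannot equal 1 and 3 at the same time, so the expression is always False, as jpp pointed out. A better approach is: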

import pandas as pd

sites = pd.DataFrame(
    {'site': ['a.com', 'd.com', 'd.com', 'b.com', 'b.com', 'c.com', 'c.com', 'c.com'],
     'type': [3, 1, 3, 1, 1, 1, 3, 3]},
    columns=['site', 'type'])

# Count the number of distinct types for each site
temp = sites.groupby("site", as_index=False).nunique()
temp.columns = ["site", "nunique_type"]  # rename the count column
new_sites = sites.merge(temp, on="site")  # attach the count to every row
# Give a new type to sites with two or more different types
new_sites.loc[new_sites.nunique_type >= 2, "type"] = 10
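
As a possible variation on the approach above, the temporary frame and the merge can be skipped by broadcasting the per-site count back onto every row with groupby(...).transform("nunique"). This is only a sketch: the names row_mask and n_types are made up for illustration, and the replacement value 10 is carried over from the code above. It also checks that the row-wise mask from the question never matches anything.

```python
import pandas as pd

sites = pd.DataFrame({
    "site": ["a.com", "d.com", "d.com", "b.com", "b.com", "c.com", "c.com", "c.com"],
    "type": [3, 1, 3, 1, 1, 1, 3, 3],
})

# A single row cannot be type 1 and type 3 at once, so this mask is all False.
row_mask = (sites["type"] == 1) & (sites["type"] == 3)
print(row_mask.any())

# Number of distinct types per site, aligned with the original rows
# (n_types is a hypothetical name, not from the original answer).
n_types = sites.groupby("site")["type"].transform("nunique")

# Sites that appear with two or more distinct types get the new type.
sites.loc[n_types >= 2, "type"] = 10
print(sites)
```

Because transform returns a Series with the same index as sites, the boolean mask can be used directly in .loc without any merge.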

Answer 2

This is a tricky one, but I think I've got it...

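Group the data by site name, then use nunique() to get the number of unique values in the type column for each site. If a site has more than one unique value, you know it was listed with at least two different types somewhere in the dataframe.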

g = sites.groupby('site')['type'].nunique()

    site
a.com    1
b.com    1
c.com    2
d.com    2
Name: type, dtype: int64

We then use this to select the sites that have more than 1 unique type value, and take the index to get the list of site names.

dup_sites = g[g >1].index

> Index(['c.com', 'd.com'], dtype='object', name='site')

Now, using this list, select all rows of sites whose site column isin the dup_sites list, select the type column, and replace the values. Here we use 9.

sites.loc[sites.site.isin(dup_sites),'type'] = 9

    site  type
0  a.com     3
1  d.com     9
2  d.com     9
3  b.com     1
4  b.com     1
5  c.com     9
6  c.com     9
7  c.com     9
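
If the goal is specifically "sites that have both type 1 and type 3" rather than "sites with more than one distinct type", one possible sketch is to collect the sites seen with each type and intersect the two sets. The names with_1, with_3, both and mask are made up for illustration, and 2 is one of the replacement values suggested in the question. The sketch also shows np.where called with parentheses, which avoids the "not subscriptable" error from the original attempt.

```python
import numpy as np
import pandas as pd

sites = pd.DataFrame({
    "site": ["a.com", "d.com", "d.com", "b.com", "b.com", "c.com", "c.com", "c.com"],
    "type": [3, 1, 3, 1, 1, 1, 3, 3],
})

# Sites that appear with type 1, and sites that appear with type 3.
with_1 = set(sites.loc[sites["type"] == 1, "site"])
with_3 = set(sites.loc[sites["type"] == 3, "site"])

# Sites present in both sets have both types (hypothetical helper names).
both = with_1 & with_3
mask = sites["site"].isin(both)

# np.where is a function, so it is called with parentheses, not brackets.
sites["type"] = np.where(mask, 2, sites["type"])
print(sites)
```

With the sample data both approaches flag the same sites (c.com and d.com), but they would differ if a site had, say, types 1 and 5.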

Select rows that satisfy two conditions at the same time in the same column (pandas)
