Question

What exactly does as_index do in pandas groupby?

Answer 1

When you don't understand something, print() is your friend. It has cleared up my doubts many times.

Take a look:

import pandas as pd

df = pd.DataFrame(data={'books':['bk1','bk1','bk1','bk2','bk2','bk3'], 'price': [12,12,12,15,15,17]})

print(df)

print(df.groupby('books', as_index=True).sum())

print(df.groupby('books', as_index=False).sum())

Output:

  books  price
0   bk1     12
1   bk1     12
2   bk1     12
3   bk2     15
4   bk2     15
5   bk3     17

       price
books       
bk1       36
bk2       30
bk3       17

  books  price
0   bk1     36
1   bk2     30
2   bk3     17

With as_index=True, the key(s) you pass to groupby() become the index of the new DataFrame.
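One way to see how the two settings relate is to compare their results directly. This is a minimal sketch reusing the df from above; the reset_index() equivalence is only checked here for sum(), not claimed for every aggregation.

```python
import pandas as pd

df = pd.DataFrame(data={'books': ['bk1', 'bk1', 'bk1', 'bk2', 'bk2', 'bk3'],
                        'price': [12, 12, 12, 15, 15, 17]})

# as_index=True (the default): the group keys become the index of the result
indexed = df.groupby('books', as_index=True).sum()

# as_index=False: the group keys stay as an ordinary column with a 0..n-1 index
flat = df.groupby('books', as_index=False).sum()

print(indexed.index.name)   # books
print(list(flat.columns))   # ['books', 'price']

# For sum(), moving the index back into a column gives the as_index=False result
print(indexed.reset_index().equals(flat))  # True
```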

The benefits of having the column as the index are:

Speed. When you filter values based on the index column, e.g. df.loc['bk1'], the lookup can be faster because the index is hashed. It doesn't have to traverse the entire books column to find 'bk1'; it just computes the hash of 'bk1' and finds it in one go.
Ease. With as_index=True you can use the shorter syntax df.loc['bk1'], whereas df.loc[df.books=='bk1'] is longer and slower.
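The two lookup styles can be compared side by side. This sketch only shows that they return the same value; whether the index lookup is measurably faster depends on data size and pandas version, so no timings are claimed here.

```python
import pandas as pd

df = pd.DataFrame(data={'books': ['bk1', 'bk1', 'bk1', 'bk2', 'bk2', 'bk3'],
                        'price': [12, 12, 12, 15, 15, 17]})

# 'books' is the index (as_index=True), so a plain label lookup works
totals = df.groupby('books').sum()
print(totals.loc['bk1', 'price'])  # 36

# 'books' is an ordinary column (as_index=False), so a boolean mask is needed
flat = df.groupby('books', as_index=False).sum()
print(flat.loc[flat.books == 'bk1', 'price'].iloc[0])  # 36
```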

Answer 2

When using groupby, as_index can be set to True or False depending on whether you want the column(s) you grouped by to be the index of the output.

import pandas as pd
table_r = pd.DataFrame({
    'colors': ['orange', 'red', 'orange', 'red'],
    'price': [1000, 2000, 3000, 4000],
    'quantity': [500, 3000, 3000, 4000],
})
# Count rows per color; DataFrame.sort() was removed from pandas, so use sort_values()
new_group = table_r.groupby('colors', as_index=True).count().sort_values('price', ascending=False)
print(new_group)

Output:

        price  quantity
colors                 
orange      2         2
red         2         2

Now with as_index=False:

   colors  price  quantity
0  orange      2         2
1     red      2         2

Notice that with as_index=False, colors is no longer the index; it is returned as a regular column instead.
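The same idea extends to grouping by several columns. In this sketch the keys ['colors', 'quantity'] are chosen only to illustrate the shape of the result: with as_index=True the keys form a MultiIndex, while with as_index=False they all come back as ordinary columns.

```python
import pandas as pd

table_r = pd.DataFrame({
    'colors': ['orange', 'red', 'orange', 'red'],
    'price': [1000, 2000, 3000, 4000],
    'quantity': [500, 3000, 3000, 4000],
})

# Two group keys, picked only to show the index shape
keyed = table_r.groupby(['colors', 'quantity'], as_index=True)['price'].sum()
flat = table_r.groupby(['colors', 'quantity'], as_index=False)['price'].sum()

print(list(keyed.index.names))  # ['colors', 'quantity'] (a MultiIndex)
print(list(flat.columns))       # ['colors', 'quantity', 'price']
```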
