How to drop a column after a pandas groupby

Asked: 2019-05-01 20:31:42

Tags: python pandas csv data-analysis

I'm trying to drop a column of data that is no longer needed.

I tried using .drop, but as far as I can tell it doesn't do anything.

df=df.groupby(df['Distributor'])['Tickets Sold'].sum()
df1=df[df.div(df.sum()).lt(0.01)]
df2=df.drop(df1.index)
yourdf=pd.concat([df2,pd.Series(df1.sum(),index=['Others'])])

yourdf = yourdf.sort_values(ascending=False)
print(yourdf)
yourdf2 = yourdf.drop(columns=['Tickets Sold'])
print(yourdf2)

The code takes this:

20th Century Fox      141367982
Focus Features         18799261
Lionsgate              75834308
Paramount Pictures     86302817
STX Entertainment      22606674
Sony Pictures         102746480
Universal             159556790
Walt Disney           315655340
Warner Bros.          216426845
Others                 74618013

and turns it into this:

Walt Disney           315655340
Warner Bros.          216426845
Universal             159556790
20th Century Fox      141367982
Sony Pictures         102746480
Paramount Pictures     86302817
Lionsgate              75834308
Others                 74618013
STX Entertainment      22606674
Focus Features         18799261

What I need is this:

Walt Disney          
Warner Bros.         
Universal             
20th Century Fox      
Sony Pictures         
Paramount Pictures     
Lionsgate              
Others                 
STX Entertainment      
Focus Features

2 Answers:

Answer 0 (score: 0)

Try specifying axis=1 to say that you want to drop a column rather than an index label.

yourdf.drop('Tickets Sold', axis=1, inplace=True)
print(yourdf)
#           Distributor
# 0    20th Century Fox
# 1      Focus Features
# 2           Lionsgate
# 3  Paramount Pictures
# 4   STX Entertainment
# 5       Sony Pictures
# 6           Universal
# 7         Walt Disney
# 8         Warner Bros
# 9              Others

If you really want to keep yourdf and have a separate yourdf2, then:

yourdf2 = yourdf.drop('Tickets Sold', axis=1)
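One caveat, offered as a hedged sketch rather than a tested fix for the asker's data: the output above assumes yourdf is a DataFrame with Distributor as a regular column. The question's groupby(...)['Tickets Sold'].sum() returns a Series indexed by distributor, so there is no 'Tickets Sold' column to drop yet. The example below uses made-up numbers and hypothetical names (sales, totals, names_only) to show converting the Series with reset_index before dropping.

```python
import pandas as pd

# Made-up sample data, not the asker's real figures.
sales = pd.DataFrame({
    "Distributor": ["A", "B", "A", "C"],
    "Tickets Sold": [10, 5, 7, 3],
})

# Selecting one column before .sum() yields a Series, not a DataFrame.
# Its index holds the distributor names and its values hold the totals.
totals = sales.groupby("Distributor")["Tickets Sold"].sum()
totals = totals.sort_values(ascending=False)

# reset_index turns the index into a real 'Distributor' column and the
# values into a real 'Tickets Sold' column, so drop(..., axis=1) applies.
totals_df = totals.reset_index()
names_only = totals_df.drop("Tickets Sold", axis=1)
print(names_only)
```

After reset_index the frame has a default integer index, which matches the shape of the output shown in this answer.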

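Answer 1 (score: 0)

Looking at your comments and reading your code, I think the underlying problem is that you are casting/recasting (reassigning) your variables too often. That leads to overwriting or losing what you are looking for by the time you need it. No worries, I'm sure you are in the early stages of your project and still testing, but I wanted to point it out just in case. You can always use the inplace=True keyword argument to deal with this.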
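To make the difference concrete, here is a minimal sketch with a hypothetical two-row frame and made-up numbers. It only illustrates how drop behaves with and without inplace=True; it is not the asker's data.

```python
import pandas as pd

# Hypothetical frame with made-up numbers, for illustration only.
df = pd.DataFrame({"Distributor": ["A", "B"], "Tickets Sold": [10, 5]})

# Without inplace, drop returns a new DataFrame and leaves df untouched.
trimmed = df.drop("Tickets Sold", axis=1)
columns_before = list(df.columns)   # df still has both columns here
print(columns_before, list(trimmed.columns))

# With inplace=True, df itself is modified and the call returns None,
# so the result should not be assigned back to a variable you still need.
result = df.drop("Tickets Sold", axis=1, inplace=True)
print(result, list(df.columns))
```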

In any case, as Brian Cohan said, you need to use axis=1 to drop a column.

Taking your code, it would look something like this:

df = pd.DataFrame(df.groupby(df['Distributor'])['Tickets Sold'].sum()); display(df)
df = df.sort_values(by="Tickets Sold", ascending=False); display(df)
df = df.drop("Tickets Sold", axis = 1); display(df)
# See here ------------------^
|--------------------+--------------|
|                    | Tickets Sold |
|--------------------+--------------|
| Distributor        |              |
|--------------------+--------------|
| 20th Century Fox   |    141367982 |
| Focus Features     |     18799261 |
| Lionsgate          |     75834308 |
| Paramount Pictures |     86302817 |
| STX Entertainment  |     22606674 |
| Sony Pictures      |    102746480 |
| Universal          |    159556790 |
| Walt Disney        |    315655340 |
| Warner Bros.       |    216426845 |
| Others             |     74618013 |
|--------------------+--------------|

|--------------------+--------------|
|                    | Tickets Sold |
|--------------------+--------------|
| Distributor        |              |
|--------------------+--------------|
| Walt Disney        |    315655340 |
| Warner Bros.       |    216426845 |
| Universal          |    159556790 |
| 20th Century Fox   |    141367982 |
| Sony Pictures      |    102746480 |
| Paramount Pictures |     86302817 |
| Lionsgate          |     75834308 |
| Others             |     74618013 |
| STX Entertainment  |     22606674 |
| Focus Features     |     18799261 |
|--------------------+--------------|

|--------------------+
|                    |
|--------------------+
| Distributor        |
|--------------------+
| Walt Disney        |
| Warner Bros.       |
| Universal          |
| 20th Century Fox   |
| Sony Pictures      |
| Paramount Pictures |
| Lionsgate          |
| Others             |
| STX Entertainment  |
| Focus Features     |
|--------------------|
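If only the distributor names are needed, in sorted order, a shorter route may be to read them from the index of the sorted Series instead of dropping the values. This is a sketch under assumptions: the Series totals and its numbers are made up, and names_df is a hypothetical name for the one-column result.

```python
import pandas as pd

# Made-up totals standing in for the grouped 'Tickets Sold' sums.
totals = pd.Series(
    {"Fox": 140, "Disney": 315, "Others": 74},
    name="Tickets Sold",
)

# Sort by value, then keep only the labels.
ordered = totals.sort_values(ascending=False)
names = ordered.index.tolist()
print(names)

# The same labels as a one-column DataFrame with a default integer index.
names_df = ordered.index.to_frame(index=False, name="Distributor")
print(names_df)
```

This avoids creating an empty-column DataFrame like the last table above, at the cost of leaving the pandas Series behind once the plain list is taken.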