Dropping columns from a DataFrame to keep only the ones needed

Time: 2019-01-18 15:58:28

Tags: python pandas

Basically, I want to drop some columns that I don't need, and I'm confused about why this isn't working:

import os
import pandas


def summarise(indir, outfile):
    os.chdir(indir)
    filelist = ".txt"
    dflist = []
    colnames = ["DSP Code", "Report Date", "Initial Date", "End Date", "Transaction Type", "Sale Type",
                "Distribution Channel", "Products Origin ID", "Product ID", "Artist", "Title", "Units Sold",
                "Retail Price", "Dealer Price", "Additional Revenue", "Warner Share", "Entity to be billed",
                "E retailer name", "E retailer Country", "End Consumer Country", "Price Code", "Currency Code"]
    for filename in filelist:
        print(filename)
        df = pandas.read_csv('SYB_M_20171001_20171031.txt', header=None, encoding='utf-8', sep='\t', names=colnames,
                             skiprows=3)
        df['data_revenue'] = df['Units Sold'] * df['Dealer Price']  # Multiplying Units with Dealer price = Revenue
        df = df.sort_values(['End Consumer Country', 'Currency Code'])  # Sorts the columns alphabetically
        df.to_csv(outfile + r"\output.csv", index=None)
        dflist.append(filename)
        df.drop(columns='DSP Code')


summarise(r"O:\James Upson\Sound Track Your Brand Testing\SYB Test",
          r"O:\James Upson\Sound Track Your Brand Testing\SYB Test Formatted")

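I want to drop every column in colnames except 'Units Sold', 'Dealer Price', 'End Consumer Country' and 'Currency Code'. I tried removing a single column with df.drop(columns='DSP Code'), but that doesn't seem to work.

Any help would be much appreciated :)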

4 answers:

Answer 0 (score: 2)

You can do it like this:

df.drop(['Col_1', 'col_2'], axis=1, inplace=True)

Or, where colnames lists only the columns you want to remove:

df = df.drop(columns=colnames)
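For the case in the question, where most columns go and only four stay, the list to drop can be built from the list to keep. This is only a minimal sketch: the small frame below is invented to stand in for the real report, and the names keep and to_drop are hypothetical helpers, not part of the original code.

```python
import pandas as pd

# Toy frame standing in for the real report; all values are invented.
df = pd.DataFrame({
    "DSP Code": ["A1", "A2"],
    "Artist": ["X", "Y"],
    "Units Sold": [3, 5],
    "Dealer Price": [0.5, 0.7],
    "End Consumer Country": ["SE", "GB"],
    "Currency Code": ["SEK", "GBP"],
})

keep = ["Units Sold", "Dealer Price", "End Consumer Country", "Currency Code"]

# Every column that is not in `keep` ends up in the drop list.
to_drop = [col for col in df.columns if col not in keep]

# drop returns a new frame, so the result is assigned back to df.
df = df.drop(columns=to_drop)

print(list(df.columns))
```

The remaining columns keep their original order from the frame, not the order of the keep list.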

As suggested in the comments, usecols acts as a filter at read time: only the columns you actually need are loaded, which is more efficient and uses fewer resources:

df = pandas.read_csv('SYB_M_20171001_20171031.txt', encoding='utf-8', sep='\t', usecols=["col1", "col2", "col3"],skiprows=3)
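When the file has no header row, as in the question, usecols can be combined with header=None and names so that the filter refers to the names you supply. The sketch below assumes an invented four-column, tab-separated sample held in memory rather than the real report file; the variable names raw and wanted are hypothetical.

```python
import io
import pandas as pd

# Invented sample: three lines to skip, then headerless tab-separated rows.
raw = (
    "report line 1\n"
    "report line 2\n"
    "report line 3\n"
    "A1\t3\t0.5\tSE\n"
    "A2\t5\t0.7\tGB\n"
)

colnames = ["DSP Code", "Units Sold", "Dealer Price", "End Consumer Country"]
wanted = ["Units Sold", "Dealer Price", "End Consumer Country"]

df = pd.read_csv(
    io.StringIO(raw),   # stands in for the path to the real .txt file
    sep="\t",
    header=None,        # the data has no header row
    names=colnames,     # names for every column in the file
    usecols=wanted,     # only these columns are loaded
    skiprows=3,
)

print(df)
```

Note that names still has to describe every column in the file; usecols then picks the subset.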

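Answer 1 (score: 1)

df.drop(columns='DSP Code')

This bit doesn't work because you aren't assigning the result back to df (or to a new DataFrame):

df = df.drop(columns='DSP Code')

You can also keep the columns you want by copying them into another DataFrame.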

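Answer 2 (score: 0)

According to the pandas.DataFrame.drop documentation, the method returns a new DataFrame unless you perform the operation in place.

Returns:
  dropped : pandas.DataFrame

inplace : bool, default False
  If True, do operation inplace and return None.

Either do it in place: df.drop(columns=['DSP Code'], inplace=True), or store the returned DataFrame: df = df.drop(columns=['DSP Code'])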
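The difference between the three calls is easy to check on a tiny frame. This is a sketch with invented data; the names unchanged, reassigned and result are hypothetical and only there to capture each outcome.

```python
import pandas as pd

df = pd.DataFrame({"DSP Code": ["A1"], "Units Sold": [3]})

# Calling drop and ignoring the return value leaves df untouched.
df.drop(columns="DSP Code")
unchanged = list(df.columns)  # still contains "DSP Code"

# Option 1: keep the returned frame.
reassigned = df.drop(columns="DSP Code")

# Option 2: modify df itself; with inplace=True the call returns None.
result = df.drop(columns="DSP Code", inplace=True)

print(unchanged)
print(list(reassigned.columns))
print(result, list(df.columns))
```

Writing df = df.drop(..., inplace=True) would therefore replace df with None, so pick one style or the other.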

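Answer 3 (score: 0)

Simply:

df = df[['Units Sold', 'Dealer Price', 'End Consumer Country', 'Currency Code']]

You keep what you want instead of dropping everything else. Note the double brackets: the inner pair is the list of column names to select.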
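Selecting several columns takes a list inside the indexing brackets. The sketch below uses invented data to show the selection, and also what happens when the names are passed comma-separated without the inner list; subset and tuple_lookup_failed are hypothetical names for illustration.

```python
import pandas as pd

# Toy frame; values are invented.
df = pd.DataFrame({
    "DSP Code": ["A1", "A2"],
    "Units Sold": [3, 5],
    "Dealer Price": [0.5, 0.7],
})

keep = ["Units Sold", "Dealer Price"]

# A list of labels selects those columns; .copy() makes the result independent of df.
subset = df[keep].copy()

# Without the inner list, pandas looks for a single column labelled by the tuple.
try:
    df["Units Sold", "Dealer Price"]
    tuple_lookup_failed = False
except KeyError:
    tuple_lookup_failed = True

print(list(subset.columns))
print(tuple_lookup_failed)
```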