Question

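I have a DataFrame df whose column names are either words or dates. I want to keep only the columns whose names are words or certain specific dates. Specifically, the DataFrame has the columns "Price", "Origin", "Company", plus the years 1980 through 2008, and I want to keep only the years 1996 through 2008 along with the columns that have words as names.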

I have written code for this, but I am fairly sure there is a shorter, more elegant way.

Any help? Thanks!

Answer 1

One alternative is to use a list comprehension and build the new list of columns on the fly:

df.columns
Index(['Price', 'Origin', 'Company', '1980', '1981', '1982', '1983', '1984',
       '1985', '1986', '1987', '1988', '1989', '1990', '1991', '1992', '1993',
       '1994', '1995', '1996', '1997', '1998', '1999', '2000', '2001', '2002',
       '2003', '2004', '2005', '2006', '2007', '2008'],
      dtype='object')

columns_to_keep = [col for col in df.columns if not (col.isdigit() and int(col) <= 1995)]

['Price',
 'Origin',
 'Company',
 '1996',
 '1997',
 '1998',
 '1999',
 '2000',
 '2001',
 '2002',
 '2003',
 '2004',
 '2005',
 '2006',
 '2007',
 '2008']

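If some column labels are integers rather than strings, convert each label to a string before calling isdigit, which avoids the error raised when col is an int: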

columns_to_keep = [col for col in df.columns if not (str(col).isdigit() and int(col) <= 1995)]
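To actually drop the unwanted columns, the list can be used to index the DataFrame. A minimal sketch, assuming a small made-up frame: the data values and the integer label 2008 are hypothetical and only there to exercise the str() conversion, and the name trimmed is mine, not from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names mirror the question.
df = pd.DataFrame({
    'Price': [1, 2],
    'Origin': ['Earth', 'Mars'],
    'Company': ['SpaceX', 'SpaceY'],
    '1995': [10, 11],
    '1996': [20, 21],
    2008: [30, 31],  # integer label, handled by str(col) below
})

# Keep word-named columns and any year after 1995.
columns_to_keep = [
    col for col in df.columns
    if not (str(col).isdigit() and int(col) <= 1995)
]

# Select the kept columns; .copy() gives an independent frame.
trimmed = df[columns_to_keep].copy()
print(list(trimmed.columns))
```

Indexing with a list of labels keeps the columns in their original order, so no re-sorting is needed afterwards.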

Answer 2

You can use DataFrame.filter. The method also supports regular expressions, so you can match any pattern you like.

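import pandas as pd

# Small example frame with word-named and year-named columns
df = pd.DataFrame({'Price':[1,2,3], 'Origin':['Earth','Mars','Pluto'], 'Company':['SpaceX', 'SpaceY', 'SpaceZ'], '1980':[100,101,102], '2000':[50,51,52]})

# Years to keep, as strings: '1996' ... '2008'
years = ['%d'%x for x in range(1996,2009)]
# Match column names made of letters, or one of the kept years
pattern = r'(?:[a-zA-Z]+|'+'|'.join(years)+')'
newdf = df.filter(regex=pattern)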
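One caveat: DataFrame.filter(regex=...) tests each label with re.search, so an unanchored pattern matches anywhere in the name. A hedged variant that anchors the pattern with ^ and $, reusing the sample frame from this answer (the anchoring is my addition, not part of the original answer):

```python
import pandas as pd

df = pd.DataFrame({
    'Price': [1, 2, 3],
    'Origin': ['Earth', 'Mars', 'Pluto'],
    'Company': ['SpaceX', 'SpaceY', 'SpaceZ'],
    '1980': [100, 101, 102],
    '2000': [50, 51, 52],
})

# Years to keep, as strings: '1996' ... '2008'.
years = [str(y) for y in range(1996, 2009)]

# ^...$ forces the whole label to be letters only, or exactly one kept year.
pattern = r'^(?:[a-zA-Z]+|' + '|'.join(years) + r')$'

newdf = df.filter(regex=pattern)
print(list(newdf.columns))
```

With this sample, '1980' is dropped and '2000' is kept. Note that filter with regex expects string labels, so integer column names would need converting first, for example with df.columns.astype(str).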

Keep the columns of a DataFrame whose names are integers that satisfy a given condition

2 answers: