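I have a DataFrame that I want to split into two DataFrames: one containing all columns whose names start with foo, and another containing the remaining columns.
Is there a quick way to do this?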
Answer 0 (score: 2)
You can use list comprehensions to select the column names:
import pandas as pd

df = pd.DataFrame({'fooA': [1, 2, 3],
                   'fooB': [4, 5, 6],
                   'fooC': [7, 8, 9],
                   'D': [1, 3, 5],
                   'E': [5, 3, 6],
                   'F': [7, 4, 3]})
print (df)
   D  E  F  fooA  fooB  fooC
0  1  5  7     1     4     7
1  3  3  4     2     5     8
2  5  6  3     3     6     9
foo = [col for col in df.columns if col.startswith('foo')]
print (foo)
['fooA', 'fooB', 'fooC']
other = [col for col in df.columns if not col.startswith('foo')]
print (other)
['D', 'E', 'F']
print (df[foo])
   fooA  fooB  fooC
0     1     4     7
1     2     5     8
2     3     6     9
print (df[other])
   D  E  F
0  1  5  7
1  3  3  4
2  5  6  3
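The same list-comprehension approach can be wrapped in a small reusable function. This is a minimal sketch, not part of the original answer: the name split_by_prefix and its prefix parameter are hypothetical, and it assumes every column name is a string (calling startswith on a non-string label such as an integer would raise an error).

```python
from typing import Tuple

import pandas as pd


def split_by_prefix(df: pd.DataFrame, prefix: str) -> Tuple[pd.DataFrame, pd.DataFrame]:
    """Hypothetical helper: split df's columns by whether the name starts with prefix.

    Assumes all column labels are strings.
    """
    matching = [col for col in df.columns if col.startswith(prefix)]
    rest = [col for col in df.columns if not col.startswith(prefix)]
    return df[matching], df[rest]


df = pd.DataFrame({'fooA': [1, 2, 3], 'fooB': [4, 5, 6], 'fooC': [7, 8, 9],
                   'D': [1, 3, 5], 'E': [5, 3, 6], 'F': [7, 4, 3]})
foo_df, other_df = split_by_prefix(df, 'foo')
print(list(foo_df.columns))    # ['fooA', 'fooB', 'fooC']
print(list(other_df.columns))  # ['D', 'E', 'F']
```

Both halves keep the original column order, because the list comprehensions walk df.columns from left to right.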
Another solution uses filter and difference:
df1 = df.filter(regex='^foo')
print (df1)
   fooA  fooB  fooC
0     1     4     7
1     2     5     8
2     3     6     9
print (df.columns.difference(df1.columns))
Index(['D', 'E', 'F'], dtype='object')
print (df[df.columns.difference(df1.columns)])
   D  E  F
0  1  5  7
1  3  3  4
2  5  6  3
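One detail worth knowing: Index.difference sorts its result by default, so the second DataFrame in this approach comes back in sorted column order rather than the original order (the two happen to coincide in the example above). A minimal sketch, using a made-up frame whose columns are deliberately out of alphabetical order, shows the effect and two ways to keep the original order: DataFrame.drop, or difference with sort=False.

```python
import pandas as pd

# Made-up frame: columns deliberately not in alphabetical order.
df = pd.DataFrame({'fooA': [1, 2], 'Z': [3, 4], 'fooB': [5, 6], 'A': [7, 8]})
df1 = df.filter(regex='^foo')

# Default difference sorts the remaining labels.
print(list(df.columns.difference(df1.columns)))              # ['A', 'Z']

# sort=False keeps the remaining labels in their original order.
print(list(df.columns.difference(df1.columns, sort=False)))  # ['Z', 'A']

# drop removes the matched columns and leaves the rest where they were.
df2 = df.drop(columns=df1.columns)
print(list(df2.columns))                                     # ['Z', 'A']
```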
Timings:
In [123]: %timeit a(df)
1000 loops, best of 3: 1.06 ms per loop
In [124]: %timeit b(df3)
1000 loops, best of 3: 1.04 ms per loop
In [125]: %timeit c(df4)
1000 loops, best of 3: 1.41 ms per loop
df3 = df.copy()
df4 = df.copy()
def a(df):
    df1 = df.filter(regex='^foo')
    df2 = df[df.columns.difference(df1.columns)]
    return df1, df2

def b(df):
    df1 = df[[col for col in df.columns if col.startswith('foo')]]
    df2 = df[[col for col in df.columns if not col.startswith('foo')]]
    return df1, df2

def c(df):
    df1 = df[df.columns[df.columns.str.startswith('foo')]]
    df2 = df[df.columns[~df.columns.str.startswith('foo')]]
    return df1, df2
df1, df2 = a(df)
print (df1)
print (df2)
df1, df2 = b(df3)
print (df1)
print (df2)
df1, df2 = c(df4)
print (df1)
print (df2)
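The timings above appear to use the small 3-row, 6-column example frame, where all three variants are close; relative speed may differ on wider frames. The following is a rough sketch for re-running the comparison with the standard-library timeit module on a made-up 200-column frame. The extra function d (compute the boolean mask once and select with .loc) is an added variant, not part of the original answer. No results are shown because they depend on the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Assumption: a made-up wider frame, half of whose 200 columns start with 'foo'.
n_cols = 200
cols = [f'foo{i}' if i % 2 == 0 else f'bar{i}' for i in range(n_cols)]
df = pd.DataFrame(np.arange(10 * n_cols).reshape(10, n_cols), columns=cols)


def a(df):
    df1 = df.filter(regex='^foo')
    df2 = df[df.columns.difference(df1.columns)]  # note: sorted column order
    return df1, df2


def b(df):
    df1 = df[[col for col in df.columns if col.startswith('foo')]]
    df2 = df[[col for col in df.columns if not col.startswith('foo')]]
    return df1, df2


def c(df):
    df1 = df[df.columns[df.columns.str.startswith('foo')]]
    df2 = df[df.columns[~df.columns.str.startswith('foo')]]
    return df1, df2


def d(df):
    # Added variant: build the boolean mask once and reuse it for both halves.
    mask = df.columns.str.startswith('foo')
    return df.loc[:, mask], df.loc[:, ~mask]


number = 200
for name, func in [('a', a), ('b', b), ('c', c), ('d', d)]:
    seconds = timeit.timeit(lambda: func(df), number=number)
    print(f'{name}: {seconds / number * 1e3:.3f} ms per call')
```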