pandas DataFrame: select a set of columns, including ranges of columns

Time: 2014-02-26 21:02:53

Tags: r select pandas dataframe

Suppose I have an R data.frame df:

colnames(df)
[1] "a" "b" "c" "d" "e"

I can select columns "a", "c", "d", and "e" like this:

df[ , c(1, 3:5)]

Is there a simple equivalent in pandas? I know I can use

df.loc[:, ['a', 'c', 'd', 'e']]

which works fine for a handful of columns.

With many column ranges, however, the R code stays just as simple:

df2[ , c(1:10, 25:30, 40, 50:100)]

2 answers:

Answer 0: (score: 7)

Update: there is no need for numpy.hstack; you can call numpy.r_ directly, as shown below.

Use iloc + numpy.r_:

In [19]: from numpy import r_; from numpy.random import randn; from pandas import DataFrame

In [20]: df = DataFrame(randn(10, 3), columns=list('abc'))

In [21]: df
Out[21]: 
          a         b         c
0  0.228163 -1.311485 -1.335604
1  0.292547 -1.636901  0.001765
2  0.744605 -0.325580  0.205003
3 -0.580471 -0.531553 -0.740697
4  0.250574  1.076019 -0.594915
5 -0.148449  0.076951 -0.653595
6 -1.065314 -0.166018 -1.471532
7  1.133336 -0.529738 -1.213841
8 -1.715281 -2.058831  0.113237
9 -0.382412 -0.072540  0.294853

[10 rows x 3 columns]

In [22]: df.iloc[:, r_[:2]]
Out[22]: 
          a         b
0  0.228163 -1.311485
1  0.292547 -1.636901
2  0.744605 -0.325580
3 -0.580471 -0.531553
4  0.250574  1.076019
5 -0.148449  0.076951
6 -1.065314 -0.166018
7  1.133336 -0.529738
8 -1.715281 -2.058831
9 -0.382412 -0.072540

[10 rows x 2 columns]
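To see why this works, it can help to look at what numpy.r_ produces on its own. The following is a minimal sketch, independent of any DataFrame; the variable names first_two and mixed are just illustrative. r_ expands slice notation into a plain integer array, which iloc then treats as a list of column positions.

```python
import numpy as np

# np.r_ turns slice syntax into a concrete integer array.
first_two = np.r_[:2]
print(first_two)   # positions 0 and 1

# Several pieces (slices and single integers) are concatenated in order.
mixed = np.r_[:2, 2:6:2]
print(mixed)       # 0, 1, then every second position from 2 up to (not including) 6
```

Because r_ knows nothing about the frame it will be used with, the end of each range should be written out explicitly (for example with df.columns.size).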

To concatenate several integer ranges, pass them all to numpy.r_:

In [35]: df = DataFrame(randn(10, 6), columns=list('abcdef'))

In [36]: df.iloc[:, r_[:2, 2:df.columns.size:2]]
Out[36]: 
          a         b         c         e
0 -1.358623 -0.622909  0.025609 -1.166303
1  0.527027  0.310530  2.892384  0.190451
2 -0.251138 -1.246113  0.738264  0.062078
3 -1.716028  0.419139  0.060225 -1.191527
4 -1.308635  0.045396 -0.599367 -0.202491
5 -0.620343  0.796364 -0.008802  0.160020
6  0.199739  0.111816 -0.278119  1.051317
7 -0.311206  0.090348 -0.237887  0.958215
8  0.363161  2.449031  1.023352  0.743853
9  0.039451 -0.855733 -0.836921 -0.835078

[10 rows x 4 columns]
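Applied to the R expression from the question, df2[ , c(1:10, 25:30, 40, 50:100)], the same idea might look like the sketch below. The 100-column frame and its column names c1..c100 are made up for illustration. The one thing to watch is the index convention: R positions are 1-based and ranges include both ends, while Python slices are 0-based and exclude the stop, so an R range a:b becomes (a-1):b.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with 100 columns, named after their 1-based R positions.
df2 = pd.DataFrame(np.zeros((3, 100)), columns=[f"c{i}" for i in range(1, 101)])

# R: df2[ , c(1:10, 25:30, 40, 50:100)]
# 1:10 -> 0:10, 25:30 -> 24:30, 40 -> 39, 50:100 -> 49:100
cols = np.r_[0:10, 24:30, 39, 49:100]
subset = df2.iloc[:, cols]

print(subset.shape)                # (3, 68)
print(list(subset.columns[:3]))    # ['c1', 'c2', 'c3']
print(list(subset.columns[-3:]))   # ['c98', 'c99', 'c100']
```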

Answer 1: (score: 0)

You can now use a similar syntax in Python:

>>> from datar.all import c, f, select
>>> from datar.datasets import starwars
>>> 
>>> starwars
              name    height      mass hair_color   skin_color eye_color  birth_year      sex     gender homeworld  species
          <object> <float64> <float64>   <object>     <object>  <object>   <float64> <object>   <object>  <object> <object>
0   Luke Skywalker     172.0      77.0      blond         fair      blue        19.0     male  masculine  Tatooine    Human
1            C-3PO     167.0      75.0        NaN         gold    yellow       112.0     none  masculine  Tatooine    Droid
2            R2-D2      96.0      32.0        NaN  white, blue       red        33.0     none  masculine     Naboo    Droid
3      Darth Vader     202.0     136.0       none        white    yellow        41.9     male  masculine  Tatooine    Human
..             ...       ...       ...        ...          ...       ...         ...      ...        ...       ...      ...
4      Leia Organa     150.0      49.0      brown        light     brown        19.0   female   feminine  Alderaan    Human
82             Rey       NaN       NaN      brown        light     hazel         NaN   female   feminine       NaN    Human
83     Poe Dameron       NaN       NaN      brown        light     brown         NaN     male  masculine       NaN    Human
84             BB8       NaN       NaN       none         none     black         NaN     none  masculine       NaN    Droid
85  Captain Phasma       NaN       NaN    unknown      unknown   unknown         NaN      NaN        NaN       NaN      NaN
86   Padmé Amidala     165.0      45.0      brown        light     brown        46.0   female   feminine     Naboo    Human

[87 rows x 11 columns]
>>> 
>>> starwars >> select(c(1, f[3:5], 7))
              name      mass hair_color   skin_color  birth_year
          <object> <float64>   <object>     <object>   <float64>
0   Luke Skywalker      77.0      blond         fair        19.0
1            C-3PO      75.0        NaN         gold       112.0
2            R2-D2      32.0        NaN  white, blue        33.0
3      Darth Vader     136.0       none        white        41.9
..             ...       ...        ...          ...         ...
4      Leia Organa      49.0      brown        light        19.0
82             Rey       NaN      brown        light         NaN
83     Poe Dameron       NaN      brown        light         NaN
84             BB8       NaN       none         none         NaN
85  Captain Phasma       NaN    unknown      unknown         NaN
86   Padmé Amidala      45.0      brown        light        46.0

[87 rows x 5 columns]
>>> 
>>> # even with column names
>>> starwars >> select(c(f.name, f[f.mass:f.skin_color], f.birth_year))
              name      mass hair_color   skin_color  birth_year
          <object> <float64>   <object>     <object>   <float64>
0   Luke Skywalker      77.0      blond         fair        19.0
1            C-3PO      75.0        NaN         gold       112.0
2            R2-D2      32.0        NaN  white, blue        33.0
3      Darth Vader     136.0       none        white        41.9
..             ...       ...        ...          ...         ...
4      Leia Organa      49.0      brown        light        19.0
82             Rey       NaN      brown        light         NaN
83     Poe Dameron       NaN      brown        light         NaN
84             BB8       NaN       none         none         NaN
85  Captain Phasma       NaN    unknown      unknown         NaN
86   Padmé Amidala      45.0      brown        light        46.0

[87 rows x 5 columns]
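For comparison, a name-based range can also be sketched in plain pandas and NumPy, without datar. This is not part of datar and is only one possible approach: it assumes the column labels are unique, so that Index.get_loc returns a single integer, and the six-column frame is a made-up example.

```python
import numpy as np
import pandas as pd

# Made-up frame with columns a..f.
df = pd.DataFrame(np.zeros((2, 6)), columns=list("abcdef"))

# With unique labels, get_loc maps a column name to its integer position.
loc = df.columns.get_loc

# Column "a" plus the range "c" through "e" (the +1 makes the end inclusive).
cols = np.r_[loc("a"), loc("c"):loc("e") + 1]
picked = df.iloc[:, cols]

print(list(picked.columns))   # ['a', 'c', 'd', 'e']
```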

I am the author of the datar package. If you have any questions, feel free to submit an issue.