Rearranging columns with pandas: is there an equivalent of dplyr's select(..., everything())?

Time: 2020-03-01 18:26:18

Tags: python r pandas dataframe dplyr

I am trying to rearrange the columns of a DataFrame by putting a few columns first, followed by all the others.

With R's dplyr, it looks like this:

library(dplyr)

df = tibble(col1 = c("a", "b", "c"),
            id = c(1, 2, 3),
            col2 = c(2, 4, 6),
            date = c("1 Feb", "2 Feb", "3 Feb"))

df2 = select(df, id, date, everything())

Easy. With Python's pandas, here is what I have tried:

import pandas as pd

df = pd.DataFrame({
    "col1": ["a", "b", "c"],
    "id": [1, 2, 3],
    "col2": [2, 4, 6],
    "date": ["1 Feb", "2 Feb", "3 Feb"]
    })

# using sets
cols = df.columns.tolist()
cols_1st = {"id", "date"}
cols = set(cols) - cols_1st
cols = list(cols_1st) + list(cols)

# wrong column order
df2 = df[cols]

# using lists
cols = df.columns.tolist()
cols_1st = ["id", "date"]
cols = [c for c in cols if c not in cols_1st]
cols = cols_1st + cols

# right column order, but is there a better way?
df3 = df[cols]

The pandas way is rather tedious, but I am still fairly new to it. Is there a better way?
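The list-based attempt in the question can be wrapped in a small reusable function. The following is only a minimal sketch: the name `move_to_front` is hypothetical and not part of pandas, and it assumes every name in `first` exists in the frame.

```python
import pandas as pd


def move_to_front(df, first):
    """Return df with the columns in `first` placed first.

    Hypothetical helper (not a pandas API). The remaining columns keep
    their original relative order.
    """
    first = list(first)
    # A list comprehension preserves the original column order,
    # which a set difference does not guarantee.
    rest = [c for c in df.columns if c not in first]
    return df[first + rest]


df = pd.DataFrame({
    "col1": ["a", "b", "c"],
    "id": [1, 2, 3],
    "col2": [2, 4, 6],
    "date": ["1 Feb", "2 Feb", "3 Feb"],
})

df2 = move_to_front(df, ["id", "date"])
print(list(df2.columns))  # ['id', 'date', 'col1', 'col2']
```

The set-based attempt gives the wrong order because Python sets are unordered, so both `cols_1st` and the set difference can come back in any order.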

3 answers:

Answer 0 (score: 3)

You can use df.drop:

>>> df = pd.DataFrame({
    "col1": ["a", "b", "c"],
    "id": [1, 2, 3],
    "col2": [2, 4, 6],
    "date": ["1 Feb", "2 Feb", "3 Feb"]
    })

>>> df

  col1  id  col2   date
0    a   1     2  1 Feb
1    b   2     4  2 Feb
2    c   3     6  3 Feb

>>> cols_1st = ["id", "date"]

>>> # columns= replaces the positional axis argument, which pandas 2.0 no longer accepts
>>> df[cols_1st + list(df.drop(columns=cols_1st))]

   id   date col1  col2
0   1  1 Feb    a     2
1   2  2 Feb    b     4
2   3  3 Feb    c     6
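As a hedged variation on the same idea, the remaining column names can also be taken from the column index directly, without building an intermediate frame. The sketch below assumes a pandas version in which `Index.difference` accepts `sort=False` (it keeps the labels in their existing order instead of sorting them), and compares that with the `drop`-based form using the keyword argument `columns=`.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["a", "b", "c"],
    "id": [1, 2, 3],
    "col2": [2, 4, 6],
    "date": ["1 Feb", "2 Feb", "3 Feb"],
})

cols_1st = ["id", "date"]

# Option A: drop the leading columns, then read off what is left.
rest_a = list(df.drop(columns=cols_1st).columns)

# Option B: set difference on the column Index; sort=False avoids
# re-sorting the remaining labels alphabetically.
rest_b = list(df.columns.difference(cols_1st, sort=False))

df2 = df[cols_1st + rest_b]
print(df2)
```

Both options only compute the list of remaining names; the actual reordering is still the plain selection `df[...]`.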

Answer 1 (score: 1)

With datar, it is as easy as it is in R:

>>> from datar.all import c, f, tibble, select, everything
>>> df = tibble(col1 = c("a", "b", "c"),
...             id = c(1, 2, 3),
...             col2 = c(2, 4, 6),
...             date = c("1 Feb", "2 Feb", "3 Feb"))
>>>             
>>> df2 = select(df,
...              f.id, f.date, everything())
>>>              
>>> df2
       id     date     col1    col2
  <int64> <object> <object> <int64>
0       1    1 Feb        a       2
1       2    2 Feb        b       4
2       3    3 Feb        c       6

I am the author of the package. Feel free to submit issues if you have any questions.

Answer 2 (score: 0)

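Often, the best translation between R and Python pandas goes through base R, which follows the same semantics, such as logical indexing on vectors (here, the column names). Note the similarity below in the use of negation and the "in" functions (%in% in R, isin in pandas):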

# R 
mycols <- c("id", "date")
df2 <- df[c(mycols, colnames(df)[!colnames(df) %in% c(mycols)])]


# PANDAS (plain column selection)
mycols = ["id", "date"]
df2 = df[mycols + df.columns[~df.columns.isin(mycols)].tolist()]

# PANDAS (alternative with reindex)
df2 = df.reindex(mycols + df.columns[~df.columns.isin(mycols)].tolist(),
                 axis='columns')
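The two pandas forms above give the same result here, but they behave differently when a requested label does not exist. The sketch below illustrates this with a hypothetical label `"missing"` that is not in the frame: in recent pandas versions plain selection raises a `KeyError`, while `reindex` adds a column filled with NaN.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["a", "b", "c"],
    "id": [1, 2, 3],
    "col2": [2, 4, 6],
    "date": ["1 Feb", "2 Feb", "3 Feb"],
})

# "missing" is a made-up label used only to show the difference.
mycols = ["id", "date", "missing"]
rest = df.columns[~df.columns.isin(mycols)].tolist()

# Plain selection refuses labels that are not present.
try:
    df[mycols + rest]
    raised = False
except KeyError:
    raised = True

# reindex accepts them and fills the new column with NaN.
df_reindexed = df.reindex(mycols + rest, axis="columns")

print(raised)
print(df_reindexed)
```

Which behavior is preferable depends on whether a misspelled column name should fail loudly or be tolerated.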