Using wide_to_long on 3 columns

Time: 2019-10-06 07:38:16

Tags: python python-3.x pandas

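How can I use pandas wide_to_long to reshape a dataframe so that the first column stays as the index column and the remaining columns, taken in groups of 3, are stacked into a single dataframe?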

I have a sample dataframe like this:

columns = [timestamp, BQ_0, BP_0, BO_0, BQ_1, BP_1, BO_1, BQ_2, BP_2, BO_2, BQ_3, BP_3, BO_3, BQ_4, BP_4, BO_4]

09:15:00     900    29450.00     2   20      29,436      1   100    29425.15     1   60     29352.05     1   20     29352.00     1
09:15:01     900    29450.00     2   20      29,436      1   100    29425.15     1   60     29352.05     1   20     29352.00     1
09:15:02     20     29412.40     1   20      29,410      1   80     29410.10     1   20     29407.60     1   20     29388.90     1
09:15:03     80     29430.20     1   80      29,430      1   80     29430.05     2   20     29430.00     1   20     29424.75     1
09:15:04     120    29445.80     1   40      29,440      2   40     29440.10     1   40     29440.05     1   20     29439.10     1

I want to use pandas wide_to_long to break this dataframe down into [timestamp, BQ_, BP_, BO_] groups, where Q = quantity, P = price and O = orders.

I would like the resulting dataframe to look like this:

timestamp,   BQ_,   BP_,         BO_
09:15:00     900    29450.00     2 <= 1st Row
09:15:00     20     29,436       1
09:15:00     100    29425.15     1
09:15:00     60     29352.05     1
09:15:00     20     29352.00     1
09:15:01     900    29450.00     2 <= 2nd Row
09:15:01     20     29,436       1
09:15:01     100    29425.15     1
09:15:01     60     29352.05     1
09:15:01     20     29352.00     1
09:15:02     20     29412.40     1 <= 3rd Row
09:15:02     20      29,410      1
...

1 answer:

Answer 0: (score: 2)

Source: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.wide_to_long.html

pandas.wide_to_long(df, stubnames, i, j, sep='', suffix='\d+')

df : DataFrame
The wide-format DataFrame

stubnames : str or list-like
The stub name(s). The wide format variables are assumed to start with the stub names.

i : str or list-like
Column(s) to use as id variable(s)

j : str
The name of the sub-observation variable. What you wish to name your suffix in the long format.

sep : str, default ''
A character indicating the separation of the variable names in the wide format, to be stripped from the names in the long format. For example, if your column names are A-suffix1, A-suffix2, you can strip the hyphen by specifying sep='-'

New in version 0.20.0.

suffix : str, default '\d+'
A regular expression capturing the wanted suffixes. '\d+' captures numeric suffixes. Suffixes with no numbers could be specified with the negated character class '\D+'. You can also further disambiguate suffixes, for example, if your wide variables are of the form A-one, B-two,.., and you have an unrelated column A-rating, you can ignore the last one by specifying suffix='(!?one|two)'

New in version 0.20.0.

Changed in version 0.23.0: When all suffixes are numeric, they are cast to int64/float64.
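
To make the sep and suffix parameters concrete, here is a minimal sketch on a made-up frame. The column names (id, A-one, A-two, B-one, B-two) and the name "part" chosen for j are illustrative assumptions, not part of the question's data.

```python
import pandas as pd

# Hypothetical toy frame: every wide column is "<stub>-<suffix>".
wide = pd.DataFrame({
    "id": [1, 2],
    "A-one": [10, 20],
    "A-two": [11, 21],
    "B-one": [30, 40],
    "B-two": [31, 41],
})

long_df = pd.wide_to_long(
    wide,
    stubnames=["A", "B"],  # prefixes shared by the wide columns
    i="id",                # column that identifies each original row
    j="part",              # name given to the extracted suffix
    sep="-",               # separator stripped between stub and suffix
    suffix=r"\D+",         # suffixes here are non-numeric ("one", "two")
)
print(long_df)
```

The result is indexed by (id, part) and has one column per stub, A and B.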

You can try it like this:

import pandas as pd
result = pd.wide_to_long(df, stubnames=['BQ_', 'BP_', 'BO_'], i='timestamp', j='Number')
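
As a rough end-to-end sketch, the call above can be run on two rows taken from the question's sample. It assumes the columns are numbered consistently from 0 to 4 (BQ_0, BP_0, BO_0, ..., BO_4) and that "29,436" is a thousands-separated 29436. The sort_index and reset_index steps are additions to the answer, used only to get the row order and flat columns the question asks for.

```python
import pandas as pd

# Assumed column layout: five (BQ, BP, BO) groups numbered 0..4.
cols = ["timestamp"] + [f"{stub}_{n}" for n in range(5) for stub in ("BQ", "BP", "BO")]
rows = [
    ["09:15:00", 900, 29450.00, 2, 20, 29436.00, 1, 100, 29425.15, 1,
     60, 29352.05, 1, 20, 29352.00, 1],
    ["09:15:02", 20, 29412.40, 1, 20, 29410.00, 1, 80, 29410.10, 1,
     20, 29407.60, 1, 20, 29388.90, 1],
]
df = pd.DataFrame(rows, columns=cols)

result = (
    pd.wide_to_long(df, stubnames=["BQ_", "BP_", "BO_"], i="timestamp", j="Number")
    .sort_index()    # order by timestamp, then by group number
    .reset_index()   # turn timestamp and Number back into ordinary columns
)
print(result)
```

Note that wide_to_long requires the i column(s) to identify each row uniquely, so if the real data contains repeated timestamps it raises a ValueError. Adding a unique row id column and passing it in i is one way around that.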