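How can I use pandas wide_to_long to reshape a DataFrame into a single long DataFrame, keeping the first column as the index column and stacking the remaining columns in groups of three?
I have a sample DataFrame like this: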
columns = [timestamp, BQ_0, BP_0, BO_0, BQ_1, BP_1, BO_1, BQ_2, BP_2, BO_2, BQ_3, BP_3, BO_3, BQ_4, BP_4, BO_4]
09:15:00 900 29450.00 2 20 29,436 1 100 29425.15 1 60 29352.05 1 20 29352.00 1
09:15:01 900 29450.00 2 20 29,436 1 100 29425.15 1 60 29352.05 1 20 29352.00 1
09:15:02 20 29412.40 1 20 29,410 1 80 29410.10 1 20 29407.60 1 20 29388.90 1
09:15:03 80 29430.20 1 80 29,430 1 80 29430.05 2 20 29430.00 1 20 29424.75 1
09:15:04 120 29445.80 1 40 29,440 2 40 29440.10 1 40 29440.05 1 20 29439.10 1
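I want to use pandas wide_to_long to unpivot this DataFrame into groups of [timestamp, BQ_, BP_, BO_], where Q = quantity, P = price, and O = orders.
I would like my resulting DataFrame to look like this: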
timestamp, BQ_, BP_, BO_
09:15:00 900 29450.00 2 <= 1st Row
09:15:00 20 29,436 1
09:15:00 100 29425.15 1
09:15:00 60 29352.05 1
09:15:00 20 29352.00 1
09:15:01 900 29450.00 2 <= 2nd Row
09:15:01 20 29,436 1
09:15:01 100 29425.15 1
09:15:01 60 29352.05 1
09:15:01 20 29352.00 1
09:15:02 20 29412.40 1 <= 3rd Row
09:15:02 20 29,410 1
...
Answer 0 (score: 2)
Source: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.wide_to_long.html
pandas.wide_to_long(df, stubnames, i, j, sep='', suffix='\d+')
df : DataFrame
    The wide-format DataFrame.
stubnames : str or list-like
    The stub name(s). The wide format variables are assumed to start with the stub names.
i : str or list-like
    Column(s) to use as id variable(s).
j : str
    The name of the sub-observation variable. What you wish to name your suffix in the long format.
sep : str, default ''
    A character indicating the separation of the variable names in the wide format, to be stripped from the names in the long format. For example, if your column names are A-suffix1, A-suffix2, you can strip the hyphen by specifying sep='-'. New in version 0.20.0.
suffix : str, default '\d+'
    A regular expression capturing the wanted suffixes. '\d+' captures numeric suffixes. Suffixes with no numbers could be specified with the negated character class '\D+'. You can also further disambiguate suffixes, for example, if your wide variables are of the form A-one, B-two,.., and you have an unrelated column A-rating, you can ignore the last one by specifying suffix='(!?one|two)'. New in version 0.20.0. Changed in version 0.23.0: When all suffixes are numeric, they are cast to int64/float64.
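To make the sep and j parameters described above concrete, here is a minimal sketch on a made-up toy frame. The column names (id, A-1, A-2, B-1, B-2) and the name "period" are hypothetical and are not taken from the question.

```python
import pandas as pd

# Hypothetical toy frame: stubs "A" and "B", a hyphen separator, numeric suffixes.
wide = pd.DataFrame(
    {
        "id": [1, 2],
        "A-1": [10, 20],
        "A-2": [11, 21],
        "B-1": [0.5, 0.6],
        "B-2": [0.7, 0.8],
    }
)

# sep="-" is stripped from the column names; the numeric suffix becomes
# the second index level, named by j.
long = pd.wide_to_long(wide, stubnames=["A", "B"], i="id", j="period", sep="-")
print(long)
```

The result is indexed by (id, period) and has one column per stub name.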
You can try it like this:
result = pd.wide_to_long(df, stubnames=['BQ_', 'BP_', 'BO_'], i=['timestamp'], j='Number')
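For reference, the call above can be tried end to end with the sketch below. It rebuilds two of the sample rows under a few assumptions: the column suffixes run from 0 to 4, the thousands separator in values such as 29,436 has been removed, and the trailing .sort_index().reset_index() is an addition (not part of the answer) that orders the rows by timestamp and turns the index back into ordinary columns.

```python
import pandas as pd

# Assumed column layout: timestamp followed by BQ_n, BP_n, BO_n for n = 0..4.
columns = ["timestamp"] + [
    f"{stub}_{n}" for n in range(5) for stub in ("BQ", "BP", "BO")
]

# Two rows from the question; "29,436" and "29,410" are written without the comma.
rows = [
    ["09:15:00", 900, 29450.00, 2, 20, 29436.00, 1, 100, 29425.15, 1,
     60, 29352.05, 1, 20, 29352.00, 1],
    ["09:15:02", 20, 29412.40, 1, 20, 29410.00, 1, 80, 29410.10, 1,
     20, 29407.60, 1, 20, 29388.90, 1],
]
df = pd.DataFrame(rows, columns=columns)

# The stub names include the trailing underscore, so the default sep="" works
# and the output columns are named BQ_, BP_, BO_ as requested.
long_df = (
    pd.wide_to_long(df, stubnames=["BQ_", "BP_", "BO_"], i="timestamp", j="Number")
    .sort_index()      # order by timestamp, then by level number
    .reset_index()     # move timestamp and Number back into columns
)
print(long_df)
```

Note that wide_to_long requires the id column(s) to identify each row uniquely, so this only works as written if every timestamp appears once in the wide frame.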