Suppose I have the following data:
ID basetime basevalue timestamp2 value2 timestamp3 value3
0 gj93 01/01/19 50 01/02/19 60 01/03/19 70
1 mif3 02/01/19 70 02/02/19 80 02/03/19 90
How would I go about reshaping it into the following?
ID Date Label Value
gj93 01/01/19 basetime 50
gj93 01/02/19 timestamp2 60
gj93 01/03/19 timestamp3 70
mif3 02/01/19 basetime 70
mif3 02/02/19 timestamp2 80
mif3 02/03/19 timestamp3 90
One caveat: some of the later values may be missing, e.g. timestamp3 ...
Thanks!
Answer 0 (score: 2)
pandas' melt should work here.
out = pd.melt(df, id_vars=['ID'], value_vars=['basetime', 'timestamp2', 'timestamp3'], var_name="Label", value_name="Date")
out['Value'] = pd.melt(df, value_vars=['basevalue', 'value2', 'value3'])['value']
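For reference, here is a self-contained sketch of the two-melt approach above. The sample frame is rebuilt by hand from the question; the missing third reading for mif3 is an assumed example of the "later values may be missing" caveat, and the final dropna/sort step is an addition that is not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data from the question. The third reading of the second row is
# deliberately left missing (an assumption, to illustrate the caveat).
df = pd.DataFrame({
    "ID": ["gj93", "mif3"],
    "basetime": ["01/01/19", "02/01/19"],
    "basevalue": [50, 70],
    "timestamp2": ["01/02/19", "02/02/19"],
    "value2": [60, 80],
    "timestamp3": ["01/03/19", None],
    "value3": [70, np.nan],
})

time_cols = ["basetime", "timestamp2", "timestamp3"]
value_cols = ["basevalue", "value2", "value3"]

# First melt: one row per (ID, timestamp column).
out = pd.melt(df, id_vars=["ID"], value_vars=time_cols,
              var_name="Label", value_name="Date")
# Second melt: rows come out in the same order, so the values line up
# with the timestamps by position.
out["Value"] = pd.melt(df, id_vars=["ID"], value_vars=value_cols)["value"]

# Drop readings that are missing, then order rows as in the question.
out = (out.dropna(subset=["Date", "Value"])
          .sort_values(["ID", "Label"])
          .reset_index(drop=True))
print(out)
```

Note that this relies on time_cols and value_cols being listed in matching order; if the two lists get out of step, timestamps and values will be paired incorrectly.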
Answer 1 (score: 2)
A longer version that is structurally more general than what was asked for.
import pandas as pd
from io import StringIO
# Sample data
df = pd.read_fwf(StringIO("""
i ID basetime basevalue timestamp2 value2 timestamp3 value3
0 gj93 01/01/19 50 01/02/19 60 01/03/19 70
1 mif3 02/01/19 70 02/02/19 80 02/03/19 90
"""), header=1, parse_dates=[2,4,6], index_col=0)
# melt to a vertical/tall format
df2 = df.melt(id_vars="ID").sort_values(["ID", "variable"])
# replace basetime and basevalue with timestamp1 and value1 respectively
# ... to be consistent with the other column names
df2['variable'] = df2['variable'].str.replace("basetime", "timestamp1") \
.str.replace("basevalue", "value1")
# extract the sequence number to a column and remove the sequence from the variable name
df2['seq'] = df2['variable'].str.replace(r"[^\d]", "", regex=True)
df2['variable'] = df2['variable'].str.replace(r"\d+$", "", regex=True)
df3 = df2.sort_values(["ID", "seq", "variable"])
# join back on itself to match up the time and value rows
df4 = df3[df3.variable == 'timestamp'].merge(df3[df3.variable=='value'], on=['ID', 'seq'])
# Clean up - taking and renaming only the needed columns
df5 = df4[['ID', 'value_x', 'value_y']]
df5.columns = ['ID', 'timestamp', 'value']
# ID timestamp value
#0 gj93 2019-01-01 00:00:00 50
#1 gj93 2019-01-02 00:00:00 60
#2 gj93 2019-01-03 00:00:00 70
#3 mif3 2019-02-01 00:00:00 70
#4 mif3 2019-02-02 00:00:00 80
#5 mif3 2019-02-03 00:00:00 90
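As a possible shortcut, the same renaming idea (basetime to timestamp1, basevalue to value1) can be combined with pd.wide_to_long, which pairs up columns that share a stub name and a numeric suffix. This is a different technique from the melt-and-merge above, shown only as a sketch; the sample frame is rebuilt by hand from the question, and the names renamed and tall are illustrative.

```python
import pandas as pd

# Sample data from the question, with the dates already parsed.
df = pd.DataFrame({
    "ID": ["gj93", "mif3"],
    "basetime": pd.to_datetime(["2019-01-01", "2019-02-01"]),
    "basevalue": [50, 70],
    "timestamp2": pd.to_datetime(["2019-01-02", "2019-02-02"]),
    "value2": [60, 80],
    "timestamp3": pd.to_datetime(["2019-01-03", "2019-02-03"]),
    "value3": [70, 90],
})

# Give the base columns the same "stub + number" shape as the others.
renamed = df.rename(columns={"basetime": "timestamp1", "basevalue": "value1"})

# wide_to_long matches timestamp<N> with value<N> and puts N in "seq".
# It requires the values in the "ID" column to be unique.
tall = (pd.wide_to_long(renamed, stubnames=["timestamp", "value"],
                        i="ID", j="seq")
          .reset_index()
          .sort_values(["ID", "seq"])
          .reset_index(drop=True))
print(tall)
```

Rows whose timestamp and value are both missing can then be removed with tall.dropna(subset=["timestamp", "value"]) if needed.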