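I am trying to reshape a table with pandas. It has a Date column with 365 rows, one per day of the year, plus 24 hour columns and 24 value columns, where each value column corresponds to one hour of that day. I want to end up with one row per day + hour (24 rows per day) and a single column holding the corresponding value. Here is the current head():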
Date | hour1 | value1 | hour2 | value2 | ... | hour24 | value24
2016-01-01 | 1 | 4100 | 2 | 3500 | ... | 24 | 5200
Here is the desired format:
Date | value
2016-01-01 01 | 4100
2016-01-01 02 | 3500
....
2016-01-01 24 | 5200
I have tried melt and pivot, but could not get the result sorted by the day + hour column.
Answer 0 (score: 0)
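You need lreshape with a dict that lists all the hour and value columns, then convert the Date column with to_datetime and add the hours from column A converted with to_timedelta, and finally remove column A with drop and, if necessary, sort with sort_values:
import pandas as pd
print (df)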
Date hour1 value1 hour2 value2 hour24 value24
0 2016-01-01 1 4100 2 3500 24 5200
1 2016-01-02 1 3000 2 3700 24 7200
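# collect the hour and value columns in matching order
a = [col for col in df.columns if col.startswith('hour')]
b = [col for col in df.columns if col.startswith('value')]
# stack all hour columns into A and all value columns into B
df = pd.lreshape(df, {'A' : a, 'B' : b})
# add the hour offset to the date (hour 24 becomes 00:00 of the next day)
df['Date'] = pd.to_datetime(df['Date']) + pd.to_timedelta(df['A'], unit='h')
df = df.drop('A', axis=1).sort_values('Date')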
print (df)
Date B
0 2016-01-01 01:00:00 4100
2 2016-01-01 02:00:00 3500
4 2016-01-02 00:00:00 5200
1 2016-01-02 01:00:00 3000
3 2016-01-02 02:00:00 3700
5 2016-01-03 00:00:00 7200
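In the output above, hour 24 rolls over to 00:00 of the following day, because a 24-hour timedelta is added to the date. If the goal is instead the "date + two-digit hour" label shown in the question (for example 2016-01-01 24), one possible variation is to keep the key as a string. The following is a minimal sketch and not part of the original answer; the sample frame and the names long_df, day and hour are assumptions made for illustration.

```python
import pandas as pd

# Assumed sample data mirroring the printed input (only hours 1, 2 and 24).
df = pd.DataFrame({
    'Date': ['2016-01-01', '2016-01-02'],
    'hour1': [1, 1], 'value1': [4100, 3000],
    'hour2': [2, 2], 'value2': [3500, 3700],
    'hour24': [24, 24], 'value24': [5200, 7200],
})

a = [col for col in df.columns if col.startswith('hour')]
b = [col for col in df.columns if col.startswith('value')]

# Reshape to long form, then order by day and numeric hour.
long_df = pd.lreshape(df, {'A': a, 'B': b}).sort_values(['Date', 'A'])

# Build a string label such as '2016-01-01 24' instead of a timestamp.
day = pd.to_datetime(long_df['Date']).dt.strftime('%Y-%m-%d')
hour = long_df['A'].astype(str).str.zfill(2)
long_df['Date'] = day + ' ' + hour

long_df = (long_df.drop('A', axis=1)
                  .rename(columns={'B': 'value'})
                  .reset_index(drop=True))
print(long_df)
```

Because the label is a plain string, it sorts correctly only as long as the hour is zero-padded, which is why the sort is done on the numeric hour before the label is built.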
Another solution is to create a MultiIndex with MultiIndex.from_arrays from the column names split by str.extract, and then reshape with stack:
# start again from the original wide df
df = df.set_index('Date')
# split each column name into its text prefix and its numeric suffix
mux = df.columns.to_series().str.extract(r'([A-Za-z]+)(\d+)', expand=True)
df.columns = pd.MultiIndex.from_arrays([mux[0], mux[1]], names=('a','b'))
df = df.stack(1).reset_index()
df['Date'] = pd.to_datetime(df['Date']) + pd.to_timedelta(df['hour'], unit='h')
df = df.drop(['b', 'hour'], axis=1).rename_axis(None, axis=1)
print (df)
Date value
0 2016-01-01 01:00:00 4100
1 2016-01-01 02:00:00 3500
2 2016-01-02 00:00:00 5200
3 2016-01-02 01:00:00 3000
4 2016-01-02 02:00:00 3700
5 2016-01-03 00:00:00 7200
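pandas also provides pd.wide_to_long, which pairs columns by stub name and numeric suffix, so it may be a shorter route to the same result. This is a sketch of a different technique, not something the answer above proposes; the sample frame and the names long_df and slot are assumptions made for illustration.

```python
import pandas as pd

# Assumed sample data mirroring the printed input (only hours 1, 2 and 24).
df = pd.DataFrame({
    'Date': ['2016-01-01', '2016-01-02'],
    'hour1': [1, 1], 'value1': [4100, 3000],
    'hour2': [2, 2], 'value2': [3500, 3700],
    'hour24': [24, 24], 'value24': [5200, 7200],
})

# 'hour' and 'value' are the stubs; the numeric suffix goes into 'slot'.
long_df = pd.wide_to_long(df, stubnames=['hour', 'value'],
                          i='Date', j='slot').reset_index()

# Same date arithmetic as in the answer: hour 24 becomes 00:00 of the next day.
long_df['Date'] = (pd.to_datetime(long_df['Date'])
                   + pd.to_timedelta(long_df['hour'], unit='h'))

long_df = long_df[['Date', 'value']].sort_values('Date').reset_index(drop=True)
print(long_df)
```

wide_to_long requires the i column (here Date) to identify each row uniquely, which holds for one row per day.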