I have a pandas DataFrame containing the following customer data from a 24-hour shopping location:
Date #Cust at 00:00 Items/Cust at 00:00 Ttl Items at 00:00 #Cust at 01:00 Items/Cust at 01:00 Ttl Items at 01:00 ....#Cust at 23:00 Items/Cust at 23:00 Ttl Items at 23:00
1/1/2018 2 4 8 1 5 5 3 3 9
1/2/2018 2 5 10 1 5 5 3 4 12
....
I would like to convert it into a simple time-series DataFrame:
Time Stamp #Cust Items/Cust Ttl Items
00:00 1/1/2018 2 4 8
01:00 1/1/2018 1 5 5
.....
23:00 1/1/2018 3 3 9
00:00 1/2/2018 2 5 10
01:00 1/2/2018 1 5 5
.....
23:00 1/2/2018 3 4 12
etc...
I know it should involve pd.melt, but since I have multiple value columns I cannot figure out the syntax.
Answer 0 (score: 0)
You can first set the index with DataFrame.set_index, then split the column names on ' at ' with str.split to create a MultiIndex in the columns, reshape with DataFrame.stack, and swap the order of the index levels with DataFrame.swaplevel. The final cleanup uses DataFrame.rename_axis and DataFrame.reset_index:
df = df.set_index('Date')
df.columns = df.columns.str.split(' at ', expand=True)
df1 = df.stack().swaplevel(1,0).rename_axis(('Time','Stamp')).reset_index()
print (df1)
Time Stamp #Cust Items/Cust Ttl Items
0 00:00 1/1/2018 2 4 8
1 01:00 1/1/2018 1 5 5
2 23:00 1/1/2018 3 3 9
3 00:00 1/2/2018 2 5 10
4 01:00 1/2/2018 1 5 5
5 23:00 1/2/2018 3 4 12
If you need datetimes:
df = df.set_index('Date')
df.columns = df.columns.str.split(' at ',expand=True)
df1 = df.stack().swaplevel(1,0).rename_axis(('TimeStamp','Date')).reset_index()
df1['TimeStamp'] = pd.to_datetime(df1.pop('Date') + ' ' + df1['TimeStamp'])
print (df1)
TimeStamp #Cust Items/Cust Ttl Items
0 2018-01-01 00:00:00 2 4 8
1 2018-01-01 01:00:00 1 5 5
2 2018-01-01 23:00:00 3 3 9
3 2018-01-02 00:00:00 2 5 10
4 2018-01-02 01:00:00 1 5 5
5 2018-01-02 23:00:00 3 4 12
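For anyone who wants to run the steps above end to end, here is a minimal self-contained sketch. The sample frame is made up from the rows in the question (only three of the 24 hourly groups), and the level names 'metric' and 'Time', the variable names, and the explicit month-first date format are my own assumptions, not part of the original data.

```python
import pandas as pd

# Hypothetical sample built from the question: two days, three hourly groups.
hours = ['00:00', '01:00', '23:00']
metrics = ['#Cust', 'Items/Cust', 'Ttl Items']
columns = ['Date'] + [f'{m} at {h}' for h in hours for m in metrics]
rows = [
    ['1/1/2018', 2, 4, 8, 1, 5, 5, 3, 3, 9],
    ['1/2/2018', 2, 5, 10, 1, 5, 5, 3, 4, 12],
]
df = pd.DataFrame(rows, columns=columns)

wide = df.set_index('Date')
# 'X at HH:MM' becomes the tuple ('X', 'HH:MM'), i.e. a two-level column index.
wide.columns = wide.columns.str.split(' at ', expand=True)
wide.columns.names = ['metric', 'Time']  # assumed level names

# Move the hour level from the columns into the row index.
long_df = wide.stack('Time').reset_index()
long_df.columns.name = None

# Combine date and hour into one datetime column (dates assumed month-first).
long_df.insert(0, 'TimeStamp', pd.to_datetime(
    long_df.pop('Date') + ' ' + long_df.pop('Time'),
    format='%m/%d/%Y %H:%M',
))
long_df = long_df.sort_values('TimeStamp').reset_index(drop=True)
print(long_df)
```

Stacking by level name rather than by position is just a readability choice here; the final sort makes the row order chronological regardless of how stack orders the hours.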
Answer 1 (score: 0)
Another approach is to use pandas.wide_to_long:
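import pandas as pd

# The stub names are the column prefixes before ' at '; the suffix (the hour) goes into 'time'.
new_df = pd.wide_to_long(df, ['#Cust', 'Ttl Items', 'Items/Cust'],
                         i='Date',
                         j='time',
                         sep=' at ', suffix='.+').reset_index()
# Note: dayfirst=True reads 1/2/2018 as 1 February 2018; omit it if the dates are month-first.
new_df.index = pd.to_datetime(new_df['Date'] + ' ' + new_df['time'], dayfirst=True)
# Pass the labels via columns=; recent pandas versions no longer accept axis positionally.
new_df = new_df.drop(columns=['Date', 'time'])
print(new_df)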