我的CSV包含以下数据:
date,datetime,year,month,date,value,name
20170430,2017-04-30 18:30:00,2017,04,30,NaN,A1
20170501,2017-05-01 18:30:00,2017,05,01,121.2,A1
20170430,2018-02-07 18:30:00,2018,02,07,1.23,B1
20170501,2017-07-10 18:30:00,2017,07,10,42.2,C1
20170430,2017-04-30 18:30:00,2017,04,30,32.1,C1
我需要得到如下结果,即A1,B1,C1值到目前为止应分隔为单独的列:
date,datetime,year,month,date,A1,B1,C1
20170430,2017-04-30 18:30:00,2017,04,30,NaN,1.23,32.1
20170501,2017-05-01 18:30:00,2017,05,01,121.2,NaN,42.2
我尝试使用带有索引的python pandas pivot方法作为日期和列作为名称,但由于A1和C1有多个条目,因此会出现以下错误:
ValueError: Index contains duplicate entries, cannot reshape
import pandas as pd
df = pd.read_csv("D:/datagenicAPI/finalCSV.csv")
print(df)
df1 = df.pivot(index="date", columns="name")
df1.to_csv("d:/datagenicAPI/test1.csv", sep=",")
我需要将其分隔为单独的列,我是否可以知道如何使用python pandas实现相同的目标
答案 0 :(得分:1)
正在加载您的示例df:
GridView gv = new GridView();
var From = RExportFrom;
var To = RExportTo;
if (RExportFrom == null || RExportTo == null)
{
/* The actual code to be used */
gv.DataSource = db.Referrals.OrderBy(m =>m.Date_Logged).ToList();
}
else
{
gv.DataSource = db.Referrals.Where(m => m.Date_Logged >= From && m.Date_Logged <= To).OrderBy(m => m.Date_Logged).ToList();
}
gv.DataBind();
foreach (GridViewRow row in gv.Rows)
{
if (row.Cells[20].Text.Contains("<"))
{
row.Cells[20].Text = Regex.Replace(row.Cells[20].Text, "<(?<tag>.+?)(>|>)", " ");
}
if (row.Cells[21].Text.Contains("<"))
{
row.Cells[21].Text = Regex.Replace(row.Cells[21].Text, "<(?<tag>.+?)(>|>)", " ");
}
if (row.Cells[22].Text.Contains("<"))
{
row.Cells[22].Text = Regex.Replace(row.Cells[22].Text, "<(?<tag>.+?)(>|>)", " ");
}
if (row.Cells[37].Text.Contains("<"))
{
row.Cells[37].Text = Regex.Replace(row.Cells[37].Text, "<(?<tag>.+?)(>|>)", " ");
}
if (row.Cells[50].Text.Contains("<"))
{
row.Cells[50].Text = Regex.Replace(row.Cells[37].Text, "<(?<tag>.+?)(>|>)", " ");
}
}
Response.ClearContent();
Response.Buffer = true;
Response.AddHeader("content-disposition", "attachment; filename=Referrals " + DateTime.Now.ToString("dd/MM/yyyy") + ".xls");
Response.ContentType = "application/ms-excel";
Response.ContentEncoding = System.Text.Encoding.UTF8;
Response.AddHeader("Content-Type", "application/vnd.ms-excel");
Response.Charset = "";
Response.Cache.SetCacheability(HttpCacheability.NoCache);
StringWriter sw = new StringWriter();
HtmlTextWriter htw = new HtmlTextWriter(sw);
gv.RenderControl(htw);
//This code will export the data to Excel and remove all HTML Tags to pass everything into Plain text.
//I am using HttpUtility.HtmlDecode twice as the first instance changes null values to "Â" the second time it will run the replace code.
//I am using Regex.Replace to change the headings to more understandable headings rather than the headings produced by the Model.
Response.Write(HttpUtility.HtmlDecode(sw.ToString())
.Replace("Cover_Details", "Referral Detail")
.Replace("Id", "Identity Number")
.Replace("Unique_Ref", "Reference Number")
.Replace("Date_Logged", "Date Logged")
.Replace("Logged_By", "File Number")
.Replace("Date_Referral", "Date of Referral")
.Replace("Referred_By", "Name of Referrer")
.Replace("UWRules", "Underwriting Rules")
.Replace("Referred_To", "Name of Referrer")
);
Response.Flush();
Response.End();
TempData["success"] = "Data successfully exported!";
return RedirectToAction("Index");
}
使用pivot_table和reset_index,你得到:
import io
import pandas as pd
s = """
date,datetime,year,month,date,value,name
20170430,2017-04-30 18:30:00,2017,04,30,NaN,A1
20170501,2017-05-01 18:30:00,2017,05,01,121.2,A1
20170430,2018-02-07 18:30:00,2018,02,07,1.23,B1
20170501,2017-07-10 18:30:00,2017,07,10,42.2,C1
20170430,2017-04-30 18:30:00,2017,04,30,32.1,C1
"""
df = pd.read_csv(io.StringIO(s))
请注意,df包含一个由pandas命名为&#39; date.1&#39;的列,因为在您的示例中有两列名为&#39; date&#39;。
答案 1 :(得分:0)
我认为需要两个步骤,drop_duplicates
+ unstack
,然后concat
结果
s=df.drop_duplicates('date').iloc[:,:4]
pd.concat([s.set_index('date'),df.set_index(['date','name']).value.unstack()],axis=1)
Out[339]:
datetime year month A1 B1 C1
date
20170430 2017-04-30 18:30:00 2017 4 NaN 1.23 32.1
20170501 2017-05-01 18:30:00 2017 5 121.2 NaN 42.2