Split values in rows into separate columns

Time: 2018-05-30 14:18:53

Tags: python-3.x pandas

My CSV contains the following data:

date,datetime,year,month,date,value,name
20170430,2017-04-30 18:30:00,2017,04,30,NaN,A1
20170501,2017-05-01 18:30:00,2017,05,01,121.2,A1
20170430,2018-02-07 18:30:00,2018,02,07,1.23,B1
20170501,2017-07-10 18:30:00,2017,07,10,42.2,C1
20170430,2017-04-30 18:30:00,2017,04,30,32.1,C1

I need to get the following result, i.e. the A1, B1 and C1 values should be split into separate columns per date:

date,datetime,year,month,date,A1,B1,C1
20170430,2017-04-30 18:30:00,2017,04,30,NaN,1.23,32.1
20170501,2017-05-01 18:30:00,2017,05,01,121.2,NaN,42.2

I tried the pandas pivot method with date as the index and name as the columns, but because A1 and C1 have multiple entries, it fails with the following error:

ValueError: Index contains duplicate entries, cannot reshape

import pandas as pd

df = pd.read_csv("D:/datagenicAPI/finalCSV.csv")
print(df)
df1 = df.pivot(index="date", columns="name")
df1.to_csv("d:/datagenicAPI/test1.csv", sep=",")

I need these values split into separate columns. How can I achieve this with pandas?
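This error means that some (index, column) pair occurs more than once, so `pivot` has more than one candidate value for a single cell and refuses to choose. The sketch below uses hypothetical toy data, not the asker's file, to reproduce the error. It also shows that `pivot_table` does not raise, because it aggregates the duplicates (mean by default):

```python
import pandas as pd

# Hypothetical toy data: the (date, name) pair (20170430, "A1") occurs twice.
toy = pd.DataFrame(
    {
        "date": [20170430, 20170430, 20170501],
        "name": ["A1", "A1", "A1"],
        "value": [1.0, 3.0, 5.0],
    }
)

try:
    toy.pivot(index="date", columns="name", values="value")
    pivot_failed = False
except ValueError as exc:
    # pivot has two values for the same cell and cannot reshape
    print("pivot failed:", exc)
    pivot_failed = True

# pivot_table aggregates duplicated pairs instead (aggfunc="mean" by default)
table = toy.pivot_table(index="date", columns="name", values="value")
print(table)
```

If averaging is not what you want, pass a different `aggfunc` such as `"first"` or `"sum"`.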

2 answers:

Answer 0 (score: 1)

Loading your sample df:


import io
import pandas as pd

s = """
date,datetime,year,month,date,value,name
20170430,2017-04-30 18:30:00,2017,04,30,NaN,A1
20170501,2017-05-01 18:30:00,2017,05,01,121.2,A1
20170430,2018-02-07 18:30:00,2018,02,07,1.23,B1
20170501,2017-07-10 18:30:00,2017,07,10,42.2,C1
20170430,2017-04-30 18:30:00,2017,04,30,32.1,C1
"""
df = pd.read_csv(io.StringIO(s))

Note that df contains a column that pandas named 'date.1', because your sample has two columns named 'date'.

Then use pivot_table and reset_index to reshape it.
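Here is a minimal sketch of the `pivot_table` + `reset_index` reshaping. It assumes you want one row per `date`, with the first `datetime`/`year`/`month`/`date.1` seen for that date. It also assumes repeated `(date, name)` pairs may be averaged, which is the `pivot_table` default. The names `wide`, `meta` and `result` are illustrative only:

```python
import io
import pandas as pd

s = """date,datetime,year,month,date,value,name
20170430,2017-04-30 18:30:00,2017,04,30,NaN,A1
20170501,2017-05-01 18:30:00,2017,05,01,121.2,A1
20170430,2018-02-07 18:30:00,2018,02,07,1.23,B1
20170501,2017-07-10 18:30:00,2017,07,10,42.2,C1
20170430,2017-04-30 18:30:00,2017,04,30,32.1,C1
"""
df = pd.read_csv(io.StringIO(s))  # the second 'date' column becomes 'date.1'

# One row per date, one column per name; duplicated pairs would be averaged.
wide = df.pivot_table(index="date", columns="name", values="value").reset_index()
wide.columns.name = None  # drop the leftover 'name' label on the columns

# Keep the first datetime/year/month/date.1 seen for each date.
meta = df.drop_duplicates("date")[["date", "datetime", "year", "month", "date.1"]]
result = meta.merge(wide, on="date")
print(result)
```

Merging on `date` attaches the descriptive columns from the first row of each date, which is one reading of the desired output. Note that in this sample the rows sharing a date do not all have the same `datetime`.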

Answer 1 (score: 0)

I think it takes two steps, drop_duplicates + unstack, and then concat the results:

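# datetime, year and month taken from the first row of each date
s = df.drop_duplicates('date').iloc[:, :4]
# value reshaped to one column per name, aligned on date
pd.concat([s.set_index('date'), df.set_index(['date', 'name']).value.unstack()], axis=1)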
Out[339]: 
                     datetime  year  month     A1    B1    C1
date                                                         
20170430  2017-04-30 18:30:00  2017      4    NaN  1.23  32.1
20170501  2017-05-01 18:30:00  2017      5  121.2   NaN  42.2
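`unstack` raises the same "Index contains duplicate entries" error as `pivot` when a `(date, name)` pair repeats. The sketch below aggregates with `groupby` before unstacking, assuming averaging duplicates is acceptable (swap `mean` for `first`, `sum`, etc. otherwise). The last CSV row is a hypothetical duplicate added only to show the aggregation:

```python
import io
import pandas as pd

# Sample data plus one hypothetical duplicate (20170430, C1) row at the end.
s = """date,datetime,year,month,date,value,name
20170430,2017-04-30 18:30:00,2017,04,30,NaN,A1
20170501,2017-05-01 18:30:00,2017,05,01,121.2,A1
20170430,2018-02-07 18:30:00,2018,02,07,1.23,B1
20170501,2017-07-10 18:30:00,2017,07,10,42.2,C1
20170430,2017-04-30 18:30:00,2017,04,30,32.1,C1
20170430,2017-04-30 18:30:00,2017,04,30,30.1,C1
"""
df = pd.read_csv(io.StringIO(s))

# datetime, year and month from the first row of each date
meta = df.drop_duplicates("date").iloc[:, :4].set_index("date")
# collapse repeated (date, name) pairs first, so unstack has one value per cell
values = df.groupby(["date", "name"])["value"].mean().unstack()
out = pd.concat([meta, values], axis=1)
print(out)
```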