Fill NaN information with values from the same dataframe

Time: 2019-06-14 08:47:57

Tags: python pandas

Is there a faster way to solve this problem without using a for loop?

The input dataframe looks like this:

   0  1    2    3    4    5    6
0  x  x    1  NaN  NaN  NaN  NaN
1  x  y    1  NaN  NaN  NaN  NaN
2  y  y    4    4    4    4    4
3  y  z    5    2    7    4    0
4  x  x  NaN    5    7    4    9
5  x  y  NaN    9    4    5   10

I want the output to look like this:

   0  1  2  3  4  5   6
0  x  x  1  5  7  4   9
1  x  y  1  9  4  5  10
2  y  y  4  4  4  4   4
3  y  z  5  2  7  4   0

Columns 0 and 1 hold some information. If we treat these two columns together as one piece of information, they never contain NaN.

This dataframe can be very large, and I don't know in advance where the data is missing.

1 answer:

Answer 0: (score: 3)

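If you need the first non-NaN value per group, use GroupBy.first:

df1 = df.groupby([0,1], as_index=False).first()
print (df1)
   0  1    2    3    4    5     6
0  x  x  1.0  5.0  7.0  4.0   9.0
1  x  y  1.0  9.0  4.0  5.0  10.0
2  y  y  4.0  4.0  4.0  4.0   4.0
3  y  z  5.0  2.0  7.0  4.0   0.0

The same call on a larger frame, where a group has several non-NaN rows: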

print (df)
   0  1     2     3     4    5     6
0  x  x  10.0   NaN   NaN  NaN   NaN
1  x  x  20.0   NaN   NaN  NaN   NaN
2  x  x   1.0   NaN   NaN  NaN   NaN
3  x  y   1.0   NaN   NaN  NaN   NaN
4  y  y   4.0   4.0   4.0  4.0   4.0
5  y  z   5.0   2.0   7.0  4.0   0.0
6  x  x   NaN   5.0   7.0  4.0   9.0
7  x  x   NaN  50.0  70.0  4.0   9.0
8  x  y   NaN   9.0   4.0  5.0  10.0

df1 = df.groupby([0,1], as_index=False).first()
print (df1)
   0  1     2    3    4    5     6
0  x  x  10.0  5.0  7.0  4.0   9.0
1  x  y   1.0  9.0  4.0  5.0  10.0
2  y  y   4.0  4.0  4.0  4.0   4.0
3  y  z   5.0  2.0  7.0  4.0   0.0
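
The larger example above can be reproduced with a short self-contained sketch. The frame is typed in by hand from the printed table, and the `counts` check is an addition for illustration, not part of the original answer: it shows which groups hold more than one non-NaN value per column, which is where `first()` has to discard something.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Larger example frame from above, typed in by hand.
df = pd.DataFrame(
    [
        ["x", "x", 10.0, nan, nan, nan, nan],
        ["x", "x", 20.0, nan, nan, nan, nan],
        ["x", "x", 1.0, nan, nan, nan, nan],
        ["x", "y", 1.0, nan, nan, nan, nan],
        ["y", "y", 4.0, 4.0, 4.0, 4.0, 4.0],
        ["y", "z", 5.0, 2.0, 7.0, 4.0, 0.0],
        ["x", "x", nan, 5.0, 7.0, 4.0, 9.0],
        ["x", "x", nan, 50.0, 70.0, 4.0, 9.0],
        ["x", "y", nan, 9.0, 4.0, 5.0, 10.0],
    ]
)

# first() skips NaN and returns the first remaining value per column and group.
df1 = df.groupby([0, 1], as_index=False).first()

# count() gives the number of non-NaN values per column and group;
# any count above 1 means first() dropped values for that group.
counts = df.groupby([0, 1]).count()

print(df1)
print(counts)
```

Here the (x, x) group has three non-NaN values in column 2, so only one of them survives in `df1`.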

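If a group can hold more than one non-NaN row, as in the example above, first() keeps only one value per column and some data is lost. A possible solution with a custom function:

# Assumes pandas is imported as pd.
def f(x):
    # Drop the NaN values from each column of the group separately, then
    # realign what is left from position 0, so values move up into the gaps.
    df1 = pd.DataFrame({y: pd.Series(x[y].dropna().values) for y in x})
    return (df1)

# Group on the key columns, compact each group, then drop the positional
# inner index level created by f and restore columns 0 and 1.
df = df.set_index([0,1]).groupby([0,1]).apply(f).reset_index(level=2, drop=True).reset_index()
print (df)
   0  1     2     3     4    5     6
0  x  x  10.0   5.0   7.0  4.0   9.0
1  x  x  20.0  50.0  70.0  4.0   9.0
2  x  x   1.0   NaN   NaN  NaN   NaN
3  x  y   1.0   9.0   4.0  5.0  10.0
4  y  y   4.0   4.0   4.0  4.0   4.0
5  y  z   5.0   2.0   7.0  4.0   0.0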
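
If the goal is to fill the NaN cells in place and keep every original row, rather than collapse each group, one alternative is to combine `fillna` with `GroupBy.transform("first")`. This is not from the answer above; it is a minimal sketch on the question's input, assuming that each (0, 1) group has at most one distinct value per column.

```python
import numpy as np
import pandas as pd

nan = np.nan
# The question's input frame, typed in by hand.
df = pd.DataFrame(
    [
        ["x", "x", 1, nan, nan, nan, nan],
        ["x", "y", 1, nan, nan, nan, nan],
        ["y", "y", 4, 4, 4, 4, 4],
        ["y", "z", 5, 2, 7, 4, 0],
        ["x", "x", nan, 5, 7, 4, 9],
        ["x", "y", nan, 9, 4, 5, 10],
    ]
)

# For every row, look up the first non-NaN value of its (0, 1) group
# in each value column. The result has the same index as df.
group_first = df.groupby([0, 1]).transform("first")

# Fill only the missing cells; existing values are left untouched.
filled = df.fillna(group_first)
print(filled)
```

Rows that share a key end up identical here, so `filled.drop_duplicates()` would reduce the result to one row per group.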