Not filling NaN values in a DataFrame

Time: 2021-03-26 10:45:40

Tags: python pandas nan

Suppose I have the following df:

      quantity#1    taxsubtotal#1    taxrate#1    quantity#2    taxsubtotal#2    taxrate#2
--  ------------  ---------------  -----------  ------------  ---------------  -----------
 0           nan             1.05           21           nan            nan            nan
 2             1             2.1            21             1              1.8            9
 6             1             0               0           nan              nan            nan
13             1             0.9             9             1              1.8            9
21             1            23.4             9             1              2.7            9

I don't want the NaN values to be written into the columns of the new df:

df3 = pd.DataFrame({
'InvoiceLine1':"""
    <cbc:ID>1</cbc:ID>
    <cbc:InvoicedQuantity unitCode="ZZ">"""+dftaxitems1['quantity#1'].astype(str)+"""</cbc:InvoicedQuantity>
        <cbc:TaxAmount currencyID="EUR">"""+dftaxitems1['taxsubtotal#1'].astype(str)+"""</cbc:TaxAmount>
          <cbc:Percent>"""+dftaxitems1['taxrate#1'].astype(str)+"""</cbc:Percent>""",
'InvoiceLine2':"""
    <cbc:ID>2</cbc:ID>
    <cbc:InvoicedQuantity unitCode="ZZ">"""+dftaxitems1['quantity#2'].astype(str)+"""</cbc:InvoicedQuantity>
        <cbc:TaxAmount currencyID="EUR">"""+dftaxitems1['taxsubtotal#2'].astype(str)+"""</cbc:TaxAmount>
          <cbc:Percent>"""+dftaxitems1['taxrate#2'].astype(str)+"""</cbc:Percent>""",
})

Checking the type of the NaN:

type:
type(dftaxitems['quantity#2'][0])
numpy.float64

I get the following output:

    InvoiceLine1                                       InvoiceLine2
0   \n <cbc:ID>1</cbc:ID>\n <cbc:InvoicedQua... \n <cbc:ID>2</cbc:ID>\n <cbc:InvoicedQua...
2   \n <cbc:ID>1</cbc:ID>\n <cbc:InvoicedQua... \n <cbc:ID>2</cbc:ID>\n <cbc:InvoicedQua...
6   \n <cbc:ID>1</cbc:ID>\n <cbc:InvoicedQua... \n <cbc:ID>2</cbc:ID>\n <cbc:InvoicedQua...
13  \n <cbc:ID>1</cbc:ID>\n <cbc:InvoicedQua... \n <cbc:ID>2</cbc:ID>\n <cbc:InvoicedQua...
21  \n <cbc:ID>1</cbc:ID>\n <cbc:InvoicedQua... \n <cbc:ID>2</cbc:ID>\n <cbc:InvoicedQua...

Desired output:

    InvoiceLine1                                       InvoiceLine2
0   \n <cbc:ID>1</cbc:ID>\n <cbc:InvoicedQua... 
2   \n <cbc:ID>1</cbc:ID>\n <cbc:InvoicedQua... \n <cbc:ID>2</cbc:ID>\n <cbc:InvoicedQua...
6   \n <cbc:ID>1</cbc:ID>\n <cbc:InvoicedQua... 
13  \n <cbc:ID>1</cbc:ID>\n <cbc:InvoicedQua... \n <cbc:ID>2</cbc:ID>\n <cbc:InvoicedQua...
21  \n <cbc:ID>1</cbc:ID>\n <cbc:InvoicedQua... \n <cbc:ID>2</cbc:ID>\n <cbc:InvoicedQua...

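df3.fillna('') does not work!

Do you have anything that could help :)?

I have already tried converting all the values to np.nan so that they can be dropped cleanly in the new df.

Please help!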

1 answer:

Answer 0 (score: 1)

Try converting the values to strings first, and then converting the empty strings to missing values:

df = df.astype(str).replace('', np.nan)

Then remove the .astype(str) calls when building the strings, such as the one in dftaxitems1['quantity#1'].astype(str)
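
One caveat worth sketching: the question's columns appear to hold real float NaN (numpy.float64) rather than empty strings. On the pandas releases from around the time of this question, .astype(str) can turn such values into the text 'nan', which replace('', np.nan) would not catch. The following is only a minimal sketch under that assumption, using hypothetical sample data and a hypothetical variable name (as_text): it casts to strings and then masks the originally missing cells back to NaN with where(notna()).

```python
import numpy as np
import pandas as pd

# Hypothetical sample mirroring two rows of the question's data
dftaxitems1 = pd.DataFrame({
    "quantity#2": [np.nan, 1.0],
    "taxrate#2": [np.nan, 9.0],
})

# Cast to strings, then restore NaN in every cell that was missing originally
as_text = dftaxitems1.astype(str).where(dftaxitems1.notna())

print(as_text["quantity#2"].tolist())
```

With the missing cells kept as NaN, the string concatenation from the question can be written without .astype(str), as described above.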

Test:

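import numpy as np
import pandas as pd

dftaxitems1 = pd.DataFrame({'quantity#1': ['', 1.0, 1.0, 1.0, 1.0]})
# cast to strings, then turn empty strings into missing values
dftaxitems1 = dftaxitems1.astype(str).replace('', np.nan)

# no .astype(str) here, so NaN propagates through the concatenation
s = """<cbc:InvoicedQuantity unitCode="ZZ">"""+dftaxitems1['quantity#1']+"""</cbc:InvoicedQuantity>"""

print (s)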
0                                                  NaN
1    <cbc:InvoicedQuantity unitCode="ZZ">1.0</cbc:I...
2    <cbc:InvoicedQuantity unitCode="ZZ">1.0</cbc:I...
3    <cbc:InvoicedQuantity unitCode="ZZ">1.0</cbc:I...
4    <cbc:InvoicedQuantity unitCode="ZZ">1.0</cbc:I...
Name: quantity#1, dtype: object
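
Once the rows with missing input come out as real NaN, as in the output above, fillna('') has something to replace. The sketch below illustrates that last step on a shortened, hypothetical example (the names qty and line2 are not from the question); it is not meant as the complete invoice template.

```python
import numpy as np
import pandas as pd

# Same kind of input as the test above: strings, with the empty one replaced by NaN
qty = pd.Series(["", "1.0", "1.0"]).replace("", np.nan)

# String + NaN gives NaN, so the whole fragment is missing for row 0
line2 = '<cbc:InvoicedQuantity unitCode="ZZ">' + qty + "</cbc:InvoicedQuantity>"

# Now fillna('') has a real missing value to turn into an empty cell
line2 = line2.fillna("")

print(line2.tolist())
```

Row 0 ends up as an empty string, while the other rows keep their XML fragment.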