Suppose I have the following df:
    quantity#1  taxsubtotal#1  taxrate#1  quantity#2  taxsubtotal#2  taxrate#2
--  ----------  -------------  ---------  ----------  -------------  ---------
 0         nan           1.05         21         nan            nan        nan
 2           1            2.1         21           1            1.8          9
 6           1              0          0         nan            nan        nan
13           1            0.9          9           1            1.8          9
21           1           23.4          9           1            2.7          9
I don't want NaN values to be written into the df's columns:
import pandas as pd

df3 = pd.DataFrame({
'InvoiceLine1':"""
<cbc:ID>1</cbc:ID>
<cbc:InvoicedQuantity unitCode="ZZ">"""+dftaxitems1['quantity#1'].astype(str)+"""</cbc:InvoicedQuantity>
<cbc:TaxAmount currencyID="EUR">"""+dftaxitems1['taxsubtotal#1'].astype(str)+"""</cbc:TaxAmount>
<cbc:Percent>"""+dftaxitems1['taxrate#1'].astype(str)+"""</cbc:Percent>""",
'InvoiceLine2':"""
<cbc:ID>2</cbc:ID>
<cbc:InvoicedQuantity unitCode="ZZ">"""+dftaxitems1['quantity#2'].astype(str)+"""</cbc:InvoicedQuantity>
<cbc:TaxAmount currencyID="EUR">"""+dftaxitems1['taxsubtotal#2'].astype(str)+"""</cbc:TaxAmount>
<cbc:Percent>"""+dftaxitems1['taxrate#2'].astype(str)+"""</cbc:Percent>""",
})
Checking the type of the NaN values:
type(dftaxitems['quantity#2'][0])
numpy.float64
I get the following output:
InvoiceLine1 InvoiceLine2
0 \n <cbc:ID>1</cbc:ID>\n <cbc:InvoicedQua... \n <cbc:ID>2</cbc:ID>\n <cbc:InvoicedQua...
2 \n <cbc:ID>1</cbc:ID>\n <cbc:InvoicedQua... \n <cbc:ID>2</cbc:ID>\n <cbc:InvoicedQua...
6 \n <cbc:ID>1</cbc:ID>\n <cbc:InvoicedQua... \n <cbc:ID>2</cbc:ID>\n <cbc:InvoicedQua...
13 \n <cbc:ID>1</cbc:ID>\n <cbc:InvoicedQua... \n <cbc:ID>2</cbc:ID>\n <cbc:InvoicedQua...
21 \n <cbc:ID>1</cbc:ID>\n <cbc:InvoicedQua... \n <cbc:ID>2</cbc:ID>\n <cbc:InvoicedQua...
Desired output:
InvoiceLine1 InvoiceLine2
0 \n <cbc:ID>1</cbc:ID>\n <cbc:InvoicedQua...
2 \n <cbc:ID>1</cbc:ID>\n <cbc:InvoicedQua... \n <cbc:ID>2</cbc:ID>\n <cbc:InvoicedQua...
6 \n <cbc:ID>1</cbc:ID>\n <cbc:InvoicedQua...
13 \n <cbc:ID>1</cbc:ID>\n <cbc:InvoicedQua... \n <cbc:ID>2</cbc:ID>\n <cbc:InvoicedQua...
21 \n <cbc:ID>1</cbc:ID>\n <cbc:InvoicedQua... \n <cbc:ID>2</cbc:ID>\n <cbc:InvoicedQua...
df3.fillna('')
does not work!
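A minimal sketch of why fillna('') has no effect here, assuming the missing values are real float NaN (as the numpy.float64 type check suggests): converting to text turns each NaN into the literal string "nan", so every concatenated cell is an ordinary string and there is nothing left for fillna to fill. The one-column Series is hypothetical, and Series.map(str) is used to apply Python's str() to each value, which matches what .astype(str) produces for a float column in pandas 1.x/2.x.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for one column of dftaxitems1; the first value is a real float NaN
s = pd.Series([np.nan, 1.0], name="quantity#2")

# str() on each value: NaN becomes the text "nan" (same as astype(str) in pandas 1.x/2.x)
as_text = s.map(str)
print(as_text.tolist())  # ['nan', '1.0']

# After concatenation every cell is a plain string, so nothing counts as missing
xml = "<cbc:InvoicedQuantity>" + as_text + "</cbc:InvoicedQuantity>"
print(xml.isna().tolist())  # [False, False]

# fillna therefore leaves the "nan" text in place
print(xml.fillna("").iloc[0])  # <cbc:InvoicedQuantity>nan</cbc:InvoicedQuantity>
```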
Can anyone help? :)
I have already tried converting all the values to np.nan so that they would be dropped cleanly in the new df.
Please help!
Answer 0 (score: 1)
Try converting the values to strings first, and then converting empty strings to missing values:
df = df.astype(str).replace('', np.nan)
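Then remove the .astype(str) calls from expressions such as dftaxitems1['quantity#1'].astype(str), so that a missing value makes the whole concatenated cell NaN.
Test: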
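A minimal sketch of the same idea for data like the question's, where the missing values are assumed to be real float NaN rather than empty strings. In that case astype(str) yields the text "nan" (pandas 1.x/2.x), so this variant replaces "nan" instead of ''. The two-column frame is hypothetical; the point is that once .astype(str) is dropped from the concatenation, any missing piece makes the whole cell NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: real float NaN in row 0, not empty strings
dftaxitems1 = pd.DataFrame({
    "quantity#2": [np.nan, 1.0],
    "taxsubtotal#2": [np.nan, 1.8],
})

# Convert to text, then turn the literal "nan" text back into a real missing value
dftaxitems1 = dftaxitems1.astype(str).replace("nan", np.nan)

# No .astype(str) in the concatenation: a NaN piece propagates to the whole cell
line2 = ("<cbc:InvoicedQuantity>" + dftaxitems1["quantity#2"]
         + "</cbc:InvoicedQuantity><cbc:TaxAmount>" + dftaxitems1["taxsubtotal#2"]
         + "</cbc:TaxAmount>")
print(line2.tolist())
```

Row 0 comes out as NaN, so a later df3.fillna('') can turn it into an empty cell.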
import numpy as np
import pandas as pd

dftaxitems1 = pd.DataFrame({'quantity#1': ['', 1.0, 1.0, 1.0, 1.0]})
dftaxitems1 = dftaxitems1.astype(str).replace('', np.nan)
s = """<cbc:InvoicedQuantity unitCode="ZZ">"""+dftaxitems1['quantity#1']+"""</cbc:InvoicedQuantity>"""
print (s)
0 NaN
1 <cbc:InvoicedQuantity unitCode="ZZ">1.0</cbc:I...
2 <cbc:InvoicedQuantity unitCode="ZZ">1.0</cbc:I...
3 <cbc:InvoicedQuantity unitCode="ZZ">1.0</cbc:I...
4 <cbc:InvoicedQuantity unitCode="ZZ">1.0</cbc:I...
Name: quantity#1, dtype: object
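The desired output keeps InvoiceLine1 for row 0 even though quantity#1 is NaN there, and blanks InvoiceLine2 only in rows 0 and 6, where all of its fields are missing. A different sketch, not the answer's method, that follows that reading: build each snippet as a string, then use Series.where to replace it with '' wherever all of the line's columns are NaN. The helper invoice_line is hypothetical; the sample data is the table from the question.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Sample data copied from the table in the question
dftaxitems1 = pd.DataFrame(
    {
        "quantity#1": [nan, 1, 1, 1, 1],
        "taxsubtotal#1": [1.05, 2.1, 0, 0.9, 23.4],
        "taxrate#1": [21, 21, 0, 9, 9],
        "quantity#2": [nan, 1, nan, 1, 1],
        "taxsubtotal#2": [nan, 1.8, nan, 1.8, 2.7],
        "taxrate#2": [nan, 9, nan, 9, 9],
    },
    index=[0, 2, 6, 13, 21],
)


def invoice_line(df, n):
    """Hypothetical helper: XML snippet for invoice line n, '' if all its fields are NaN."""
    cols = [f"quantity#{n}", f"taxsubtotal#{n}", f"taxrate#{n}"]
    # str() each value; NaN becomes the text "nan" (same as astype(str) in pandas 1.x/2.x)
    q, t, r = (df[c].map(str) for c in cols)
    xml = (
        f"\n<cbc:ID>{n}</cbc:ID>"
        + '\n<cbc:InvoicedQuantity unitCode="ZZ">' + q + "</cbc:InvoicedQuantity>"
        + '\n<cbc:TaxAmount currencyID="EUR">' + t + "</cbc:TaxAmount>"
        + "\n<cbc:Percent>" + r + "</cbc:Percent>"
    )
    # Keep the snippet only where at least one of the line's fields is present
    return xml.where(df[cols].notna().any(axis=1), "")


df3 = pd.DataFrame({
    "InvoiceLine1": invoice_line(dftaxitems1, 1),
    "InvoiceLine2": invoice_line(dftaxitems1, 2),
})
print(df3)
```

With this rule, row 0 of InvoiceLine1 still contains the text nan for the missing quantity; if that cell should be blank as well, the answer's NaN-propagation approach is the stricter choice.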