Prevent pandas from reading "NA" as NaN

Time: 2017-01-01 16:58:11

Tags: python pandas

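The .csv file I am reading contains cells with the value "NA". pandas automatically converts these to NaN, which I do not want. I am aware of the keep_default_na=False parameter, but it changes the columns' dtype to object, which means pd.get_dummies does not work correctly.

Is there a way to stop pandas from reading "NA" as NaN without changing the dtype?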

4 answers:

Answer 0: (score: 7)

keep_default_na=False works for me:

from io import StringIO
import pandas as pd

txt = """col1,col2
a,b
NA,US"""

print(pd.read_csv(StringIO(txt), keep_default_na=False))

  col1 col2
0    a    b
1   NA   US

Without it:

print(pd.read_csv(StringIO(txt)))

  col1 col2
0    a    b
1  NaN   US
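
The dtype concern from the question usually comes up because, with keep_default_na=False alone, blank cells in a numeric column are read as empty strings. A minimal sketch, assuming the goal is to keep "NA" as text while blank cells still become NaN (the sample data and column names are made up for illustration):

```python
from io import StringIO
import pandas as pd

# Hypothetical sample: "NA" is a real value in `country`; `value` has one blank cell.
txt = """country,value
NA,1
US,
"""

# No NA strings at all: the blank cell stays "" and `value` is not numeric.
plain = pd.read_csv(StringIO(txt), keep_default_na=False)

# Only the empty string counts as missing, so "NA" survives and `value` stays numeric.
df = pd.read_csv(StringIO(txt), keep_default_na=False, na_values=[""])

print(plain.dtypes)
print(df.dtypes)
```

Whether the empty string is the only marker that should count as missing depends on the file, so the na_values list may need more entries.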

Answer 1: (score: 1)

This approach worked for me:

import pandas as pd
df = pd.read_csv('Test.csv')
print(df)

co1 col2  col3  col4
a   b    c  d   e
NaN NaN NaN NaN NaN
2   3   4   5   NaN

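I copied the values that are interpreted as NaN by default into a list, then commented out "NA", the one I want to keep as a literal string instead of NaN. With this approach, every other value in the list is still treated as NaN.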

na_values = ["",
             "#N/A",
             "#N/A N/A",
             "#NA",
             "-1.#IND",
             "-1.#QNAN",
             "-NaN",
             "-nan",
             "1.#IND",
             "1.#QNAN",
             "<NA>",
             "N/A",
             # "NA",  # left out so that "NA" is kept as a literal string
             "NULL",
             "NaN",
             "n/a",
             "nan",
             "null"]
df1 = pd.read_csv('Test.csv', na_values=na_values, keep_default_na=False)

      co1  col2  col3  col4
a     b     c     d     e
NaN  NA   NaN    NA   NaN
2     3     4     5   NaN
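
To check that the preserved "NA" then works with pd.get_dummies, as the question asks, here is a small sketch using a shortened version of the list above (the data and column names are invented for illustration):

```python
from io import StringIO
import pandas as pd

# Hypothetical sample: "NA" is a category in `country`; "N/A" marks a missing score.
txt = """country,score
NA,1
US,N/A
NA,3
"""

# Shortened NA list for illustration; "NA" is deliberately absent.
na_values = ["", "N/A", "NaN", "null"]

df = pd.read_csv(StringIO(txt), na_values=na_values, keep_default_na=False)

# "NA" is now an ordinary category, so it gets its own indicator column.
dummies = pd.get_dummies(df["country"])
print(dummies)
```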

Answer 2: (score: 0)

You can try converting the column value to str first:

for index, row in df.iterrows():
    # 'your_column' is a placeholder for the name of the column to check
    na_column = str(row['your_column'])
    if na_column != 'nan':
        # do something with the value
        pass
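
The same filter can be written without a row loop. A sketch, assuming the goal is simply to skip rows where the column is missing (`your_column` and the sample values are placeholders):

```python
import pandas as pd

# Placeholder data: one real missing value and one literal "NA" string.
df = pd.DataFrame({"your_column": ["a", None, "NA"]})

# notna() is False only for real missing values; the string "NA" passes through.
present = df[df["your_column"].notna()]

for value in present["your_column"]:
    print(value)  # do something with each non-missing value
```

Unlike the str() comparison, this does not drop a cell that genuinely contains the text "nan".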

Answer 3: (score: 0)

This is what the pandas documentation says:

na_values : scalar, str, list-like, or dict, optional
Additional strings to recognize as NA/NaN. If dict passed, specific per-column NA values. By default the following values are interpreted as NaN: ‘’, ‘#N/A’, ‘#N/A N/A’, ‘#NA’, ‘-1.#IND’, ‘-1.#QNAN’, ‘-NaN’, ‘-nan’, ‘1.#IND’, ‘1.#QNAN’, ‘N/A’, ‘NA’, ‘NULL’, ‘NaN’, ‘n/a’, ‘nan’, ‘null’.

keep_default_na : bool, default True
Whether or not to include the default NaN values when parsing the data. Depending on whether na_values is passed in, the behavior is as follows:

If keep_default_na is True, and na_values are specified, na_values is appended to the default NaN values used for parsing.
If keep_default_na is True, and na_values are not specified, only the default NaN values are used for parsing.
If keep_default_na is False, and na_values are specified, only the NaN values specified na_values are used for parsing.
If keep_default_na is False, and na_values are not specified, no strings will be parsed as NaN.
Note that if na_filter is passed in as False, the keep_default_na and na_values parameters will be ignored.
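
The dict form of na_values described above can limit NA handling to specific columns. A sketch with made-up data, where "NA" stays a string in one column and counts as missing in another:

```python
from io import StringIO
import pandas as pd

# Hypothetical sample: "NA" is a real value in `country` but a gap in `value`.
txt = """country,value
NA,1
US,NA
"""

# With keep_default_na=False, only the listed column gets NA strings;
# `country` has no entry, so nothing in it is parsed as NaN.
df = pd.read_csv(
    StringIO(txt),
    keep_default_na=False,
    na_values={"value": ["", "NA"]},
)
print(df)
```

This keeps the numeric column numeric while leaving the text column untouched, which is one way to address the dtype concern in the question.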