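The .csv file I'm reading contains cells with the value "NA". pandas automatically converts these to NaN, which I don't want. I know about the keep_default_na=False parameter, but it changes the dtype of the columns to object, which means pd.get_dummies doesn't work correctly.
Is there a way to stop pandas from reading "NA" as NaN without changing the dtype?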
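A minimal reproduction of the problem, assuming a small hypothetical file (the column names country and value and the data are illustrative, not from the question): country holds the literal code "NA", and value is numeric with one empty cell. With keep_default_na=False the empty cell is no longer treated as missing, so value stops being numeric and pd.get_dummies then one-hot encodes it as well.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample: "NA" in 'country' is a real code (e.g. Namibia),
# while 'value' is numeric with one empty cell.
txt = """country,value
US,1.5
NA,
DE,3.0"""

default = pd.read_csv(StringIO(txt))
no_defaults = pd.read_csv(StringIO(txt), keep_default_na=False)

# Default parsing: "NA" becomes NaN, but 'value' stays float64.
print(default.dtypes)

# keep_default_na=False: "NA" survives, but the empty cell is kept as ''
# so 'value' is no longer a numeric column.
print(no_defaults.dtypes)

# get_dummies encodes non-numeric columns, so 'value' gets dummy columns too.
print(pd.get_dummies(no_defaults).columns.tolist())
```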
Answer 0 (score: 7)
keep_default_na=False works for me:
from io import StringIO
import pandas as pd
txt = """col1,col2
a,b
NA,US"""
print(pd.read_csv(StringIO(txt), keep_default_na=False))
col1 col2
0 a b
1 NA US
Without it:
print(pd.read_csv(StringIO(txt)))
col1 col2
0 a b
1 NaN US
Answer 1 (score: 1)
This approach worked for me:
import pandas as pd
df = pd.read_csv('Test.csv')
co1 col2 col3 col4
a b c d e
NaN NaN NaN NaN NaN
2 3 4 5 NaN
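I copied the values that pandas interprets as NaN by default into a list, then commented out "NA", which I want to keep as a regular value. With this approach, every other default value is still treated as NaN.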
na_values = ["",
"#N/A",
"#N/A N/A",
"#NA",
"-1.#IND",
"-1.#QNAN",
"-NaN",
"-nan",
"1.#IND",
"1.#QNAN",
"<NA>",
"N/A",
# "NA",
"NULL",
"NaN",
"n/a",
"nan",
"null"]
df1 = pd.read_csv('Test.csv', na_values=na_values, keep_default_na=False)
co1 col2 col3 col4
a b c d e
NaN NA NaN NA NaN
2 3 4 5 NaN
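A self-contained sketch of the same idea, using hypothetical inline data instead of Test.csv. It shows that with this custom list the numeric column keeps its float64 dtype (the empty cell still becomes NaN) while "NA" stays a string, so pd.get_dummies produces one dummy column per country. The hard-coded list mirrors the one above; the exact default set depends on the pandas version (newer releases, for example, also treat "None" as missing), so it may need adjusting.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample data: "NA" is a real code, 'value' has one empty cell.
txt = """country,value
US,1.5
NA,
DE,3.0"""

# The default NA strings with "NA" left out (same list as in the answer).
na_values = ["", "#N/A", "#N/A N/A", "#NA", "-1.#IND", "-1.#QNAN", "-NaN",
             "-nan", "1.#IND", "1.#QNAN", "<NA>", "N/A", "NULL", "NaN",
             "n/a", "nan", "null"]

df = pd.read_csv(StringIO(txt), na_values=na_values, keep_default_na=False)

print(df.dtypes)                      # 'value' is float64
print(pd.get_dummies(df["country"]))  # one dummy column each for DE, NA, US
```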
Answer 2 (score: 0)
You could try converting the values to str first:
for index, row in df.iterrows():
    # 'your_column' is a placeholder; str(NaN) gives the string 'nan'
    na_column = str(row['your_column'])
    if na_column != 'nan':
        pass  # do something with the value
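A vectorised sketch of the same check, assuming hypothetical inline data and a column named country. Because str(NaN) is 'nan', mapping str over the column and comparing against 'nan' gives the same mask as Series.notna(). Note that this only detects missing values after parsing: it cannot tell an original "NA" apart from a truly empty cell, since both have already become NaN when the file is read with the default settings.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample data.
txt = """country,value
US,1.5
NA,
DE,3.0"""
df = pd.read_csv(StringIO(txt))

# Same test as the loop, without iterrows(): str(NaN) == 'nan'.
mask = df["country"].map(str) != "nan"
print(df.loc[mask, "country"].tolist())  # ['US', 'DE']

# notna() expresses the same test directly.
print(mask.equals(df["country"].notna()))  # True
```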
Answer 3 (score: 0)
This is what the pandas documentation says:
na_values : scalar, str, list-like, or dict, optional
Additional strings to recognize as NA/NaN. If dict passed, specific per-column NA values. By default the following values are interpreted as NaN: ‘’, ‘#N/A’, ‘#N/A N/A’, ‘#NA’, ‘-1.#IND’, ‘-1.#QNAN’, ‘-NaN’, ‘-nan’, ‘1.#IND’, ‘1.#QNAN’, ‘N/A’, ‘NA’, ‘NULL’, ‘NaN’, ‘n/a’, ‘nan’, ‘null’.
keep_default_na : bool, default True
Whether or not to include the default NaN values when parsing the data. Depending on whether na_values is passed in, the behavior is as follows:
If keep_default_na is True, and na_values are specified, na_values is appended to the default NaN values used for parsing.
If keep_default_na is True, and na_values are not specified, only the default NaN values are used for parsing.
If keep_default_na is False, and na_values are specified, only the NaN values specified na_values are used for parsing.
If keep_default_na is False, and na_values are not specified, no strings will be parsed as NaN.
Note that if na_filter is passed in as False, the keep_default_na and na_values parameters will be ignored.
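Combining the dict form of na_values with keep_default_na=False (the case where only the values specified in na_values are used) gives per-column control. A sketch assuming hypothetical columns country and value: the defaults are switched off everywhere, and the empty string is re-enabled as missing only for value, so "NA" in country survives and value stays float64.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample data.
txt = """country,value
US,1.5
NA,
DE,3.0"""

# No default NA strings anywhere; '' counts as missing only in 'value'.
# 'country' is not in the dict, so nothing in it is parsed as NaN.
df = pd.read_csv(
    StringIO(txt),
    keep_default_na=False,
    na_values={"value": [""]},
)

print(df)
print(df.dtypes)
```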