Question

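I'm working with a scientific dataset in which measurable values are represented as numbers and unmeasurable values by the default string "Present < RDL". pd.read_csv seems to convert all the values in some columns to strings (I'm not sure why). So I want all numeric values to have an appropriate type, such as float, and all "Present < RDL" values to stay as strings.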
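
A plausible cause, sketched under assumptions: pd.read_csv infers one dtype per column, so a single entry such as "Present < RDL" that cannot be parsed as a number keeps the whole column as strings, while purely numeric columns become float64. The CSV text and the column name test3 below are hypothetical.

```python
import io

import pandas as pd

# Hypothetical CSV text: 'test1' has one non-numeric entry, 'test3' is purely numeric.
csv_text = """test1,test3
1.01,0.5
Present < RDL,1.5
3.50,2.5
"""

df = pd.read_csv(io.StringIO(csv_text))

# read_csv infers a single dtype per column. One unparseable entry is enough to keep
# every value in 'test1' as a string, while 'test3' is parsed as float64.
print(df.dtypes)
print(df["test1"].tolist())
```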

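I've found a way to deal with the mixed dtypes and can apply the logic to a single column, but for some reason the same logic doesn't work when I apply it in a loop: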

import pandas as pd

# Dummy data:
lst = ['1.01', '2.05', 'Present < RDL', '3.50', '1.23', 'Present < RDL', '1.72']
lst2 = ['1.2', 'Present < RDL', '0.75', '1.53', '2.34', 'Present < RDL', '0.96']
data = {'test1': lst, 'test2': lst2}
data = pd.DataFrame(data)

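# Works to convert numeric values in a Series from string to float.
lst = []
for i in data.test1:
    try:
        lst.append(float(i))
    except ValueError:
        # Non-numeric entries such as 'Present < RDL' are kept as strings.
        lst.append(i)
test = pd.Series(lst)

# Verify that numbers have been converted to a numeric type
# (map() is lazy in Python 3, so wrap it in list() to see the result).
list(map(type, test))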

# Now, the same logic looping through the DataFrame columns:
for col in data.columns:
    lst = []
    for i in col:
        try:
            lst.append(float(i))
        except ValueError:
            lst.append(i)
    col = pd.DataFrame(lst)

# Shows no change in dtypes.
list(map(type, data.test1))
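
Why the loop above leaves data unchanged, as far as the code shows: iterating over data.columns yields column labels (strings such as 'test1'), so "for i in col" walks the characters of the label rather than the column's values, and "col = pd.DataFrame(lst)" only rebinds the local name col without touching data. A minimal sketch with both issues fixed, reading the values via data[col] and assigning the result back to data[col], is below; the helper to_float_or_keep is a hypothetical name, and dtype=object is an assumption that mimics the string columns from read_csv.

```python
import pandas as pd

lst = ['1.01', '2.05', 'Present < RDL', '3.50', '1.23', 'Present < RDL', '1.72']
lst2 = ['1.2', 'Present < RDL', '0.75', '1.53', '2.34', 'Present < RDL', '0.96']
# dtype=object keeps the values as plain Python strings, like the asker's columns.
data = pd.DataFrame({'test1': lst, 'test2': lst2}, dtype=object)

# Iterating over a column label yields its characters, not the column's values.
print(list('test1'))  # → ['t', 'e', 's', 't', '1']


def to_float_or_keep(value):
    """Hypothetical helper: return value as a float if it parses, else unchanged."""
    try:
        return float(value)
    except ValueError:
        return value


for col in data.columns:
    # data[col] is the actual Series; assigning to data[col] writes the result back.
    data[col] = [to_float_or_keep(v) for v in data[col]]

print(data['test1'].map(type).tolist())
```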

I've seen the same behavior with pandas functions, which I've also found harder to get working consistently. For example:

data.test1 = pd.to_numeric(data.test1, errors='ignore')
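
One likely reason the call above appears to do nothing: per the pandas documentation, errors='ignore' returns the input unchanged when any value fails to parse, so a single "Present < RDL" leaves the whole column as strings (that option is also deprecated in recent pandas versions). The sketch below, using a hypothetical Series modelled on test1, shows the default errors='raise' failing on that entry and errors='coerce' turning it into NaN instead.

```python
import pandas as pd

s = pd.Series(['1.01', '2.05', 'Present < RDL', '3.50'], dtype=object)

# Default errors='raise': one unparseable string aborts the whole conversion.
try:
    pd.to_numeric(s)
    raised = False
except ValueError as exc:
    raised = True
    print('ValueError:', exc)

# errors='coerce': unparseable strings become NaN and the result is float64.
coerced = pd.to_numeric(s, errors='coerce')
print(coerced.dtype)
print(coerced.tolist())
```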

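I realize my first solution probably isn't as elegant as using the pandas functions, so I'm open to any suggestions for reaching my goal. Thanks for reading.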

Update:

After incorporating the answer below, I was able to fix the loop:

# Coerce unparseable values to NaN, then refill those positions with the original strings.
for col in data.columns:
    data[col] = pd.to_numeric(data[col], errors='coerce').fillna(data[col])
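
The same per-column conversion can also be written without an explicit loop by passing a function to DataFrame.apply, which calls it once per column. This is a sketch of an equivalent alternative under the same dummy data, not something from the original thread; the helper name coerce_keep_strings is hypothetical.

```python
import pandas as pd

lst = ['1.01', '2.05', 'Present < RDL', '3.50', '1.23', 'Present < RDL', '1.72']
lst2 = ['1.2', 'Present < RDL', '0.75', '1.53', '2.34', 'Present < RDL', '0.96']
data = pd.DataFrame({'test1': lst, 'test2': lst2}, dtype=object)


def coerce_keep_strings(col: pd.Series) -> pd.Series:
    """Hypothetical helper: parse numbers as float, keep unparseable values as-is."""
    return pd.to_numeric(col, errors='coerce').fillna(col)


# DataFrame.apply passes each column (a Series) to the function in turn.
data = data.apply(coerce_keep_strings)

print(data['test2'].map(type).tolist())
```

Both versions do the same work; apply simply hides the loop over columns.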

Answer 1

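Use pd.to_numeric with the argument errors='coerce' to convert the strings to NaN, then use fillna to fill those NaN values with the strings from the original column:

data['test1'] = pd.to_numeric(data['test1'], errors='coerce').fillna(data['test1'])

If we then check the type of each row:

print(data['test1'].apply(type))

0    <class 'float'>
1    <class 'float'>
2      <class 'str'>
3    <class 'float'>
4    <class 'float'>
5      <class 'str'>
6    <class 'float'>
Name: test1, dtype: object

we see a mixed-type column, as desired.

Now we can actually do calculations on the column. Obviously the strings will give odd results, but that is the downside of a mixed-type column.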
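
For example (a hypothetical illustration, not the answer's original snippet), multiplying the converted test1 column by 2 doubles the floats, while Python's str * int semantics repeat each "Present < RDL" string instead of raising an error:

```python
import pandas as pd

lst = ['1.01', '2.05', 'Present < RDL', '3.50', '1.23', 'Present < RDL', '1.72']
data = pd.DataFrame({'test1': lst}, dtype=object)
data['test1'] = pd.to_numeric(data['test1'], errors='coerce').fillna(data['test1'])

# On an object column, * is applied element by element with Python semantics.
doubled = data['test1'] * 2
print(doubled.tolist())
```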

Efficiently change DataFrame value dtypes when columns have mixed types