Question

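New to Python. I am importing a CSV, and if any data is missing I need to return the CSV with an additional column indicating which rows are missing data. A colleague suggested that I import the CSV into a dataframe, create a new dataframe with a "Comment" column, fill in a comment on the affected rows, and then append it to the original dataframe. I am stuck at the step of populating the new dataframe, "dferr", so that its row count matches "dfinput".

I have Googled "pandas csv return error column for missing data" but found nothing related to creating a new CSV that flags the bad rows. I don't even know whether the proposed approach is the best way to solve this problem.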

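Expected result: "dferr" would have an index column, the same number of rows as "dfinput", and a comment on each row where "dfinput" is missing a value.

Actual result: "dferr" is empty.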

Answer 1

I take "missing data" to mean null values. It sounds like you need, for every row, the names of the fields that are empty.

import pandas as pd

df = pd.DataFrame([[1, 2, 3],
                   [4, None, 6],
                   [None, 8, None]],
                  columns=['foo', 'bar', 'baz'])
# Create a dataframe of True/False, True where a criterion is met
# (in this case, a null value)
nulls = df.isnull()

# Iterate through every row of *nulls*,
# and extract the column names where the value is True by boolean indexing
colnames = nulls.columns
null_labels = nulls.apply(lambda s:colnames[s], axis=1)

# Now you have a pd.Series where every entry is an array
# (technically, a pd.Index object)
# Pandas arrays have a vectorized .str.join method:
df['nullcols'] = null_labels.str.join(', ')
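
To tie this back to the CSV round trip in the question, here is a minimal sketch under some assumptions: the sample data, the column names, and the "comment" column name are made up for illustration, and an in-memory string stands in for the real input file. It adds the comment column directly to the input dataframe rather than building a separate "dferr".

```python
import io

import pandas as pd

# Hypothetical input; in practice this would be pd.read_csv("your_file.csv").
raw = io.StringIO("id,name,amount\n1,alice,10\n2,,20\n3,carol,\n")
dfinput = pd.read_csv(raw)

# Boolean frame: True wherever a value is missing.
nulls = dfinput.isnull()

# For each row, join the names of the columns whose value is missing.
comments = nulls.apply(
    lambda row: ", ".join(nulls.columns[row.to_numpy()]), axis=1
)

# Attach the comment column; rows with nothing missing get an empty string.
dfout = dfinput.assign(comment=comments)

# With no path argument, to_csv returns the CSV text; pass a path to write a file.
annotated_csv = dfout.to_csv(index=False)
print(annotated_csv)
```

Because the comments are computed from "dfinput" itself, the row counts always match and no separate append step is needed.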

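The .apply() method in pandas can sometimes become a bottleneck in your code. There are ways to avoid it, but here it seemed to be the simplest solution I could think of.

Edit: here is an alternative one-liner (instead of using .apply) that may cut computation time slightly: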

import numpy as np
# Join each row's null column names so the result matches the .apply version
df['nullcols'] = [', '.join(colnames[x]) for x in nulls.values]

This might be faster still (it needs a bit more work):

np.where(df.isnull(),df.columns,'')
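
One possible way to do that extra work, as a sketch rather than a benchmarked solution: np.where returns a 2-D array holding the column name where a value is null and an empty string elsewhere, so what remains is joining the non-empty names in each row. The variable name "labels" is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 3],
                   [4, None, 6],
                   [None, 8, None]],
                  columns=["foo", "bar", "baz"])

# 2-D array: the column name where the value is null, '' everywhere else.
# df.columns (length 3) is broadcast across the rows of the boolean mask.
labels = np.where(df.isnull(), df.columns, "")

# Keep only the non-empty names in each row and join them into one string.
df["nullcols"] = [", ".join(name for name in row if name) for row in labels]

print(df)
```

Whether this actually beats the .apply version depends on the data size, so it is worth timing on a realistic file.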

Python3 - Return CSV with row-level errors for missing data
