Question

dfRules contains the rule set and dfDataset is the main dataset. dfDataset has an item_type column, and the rules are defined per item type (INT, RSU, etc.).

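Rule: if a row of dfRules['Field'] is marked x for a given item_type, that field must not be NaN in dfDataset (for example, Field: Spec_Name, item_type: INT; see image). If the field does hold a null value, append its column name (e.g. spec_name) to an Errors column created in dfDataset.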

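What is happening: suppose there is a row with item_type ALL that has NaN in its Spec_Name column. The Errors column should then get "Spec_name" for that row alone. Instead, the code I wrote adds "B" to every row with item_type X.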

   for row in Rulefields:
      dfrulefields = dfRules['Field'][(dfRules[row] == "x")]
      dfrulecols = pd.DataFrame(columns=list(dfrulefields))
      dfrulecols.columns = dfrulecols.columns.str.strip().str.lower().str.replace(' ', '_').str.replace('(', '').str.replace(')', '')
      dfinput = dfDataset[dfDataset['item_type'] == row]
      dfmatchingfields = dfinput[dfinput.columns.intersection(dfrulecols.columns)]
      null_columns=dfmatchingfields.columns[dfmatchingfields.isnull().any()]
      dfnull=dfmatchingfields[dfmatchingfields.isnull().any(axis=1)][null_columns]
      dfinput['Errors'] = dfnull.apply(lambda x: ','.join(x[x.isnull()].index),axis=1)
      if(firstelement == "Yes"):
        dffinal = dfinput.copy()
        firstelement = "No"
      else:
        dffinal = pd.concat([dffinal, dfinput])  # DataFrame.append was removed in pandas 2.0

I'm not sure what is causing this behavior. Please explain; a possible solution would be much appreciated.

Answer 1

Based on the description and my understanding of it, here is a solution. Let me know whether it works for you:

import pandas as pd

df = pd.DataFrame(
    data = {
        'item_type':['INT']*5 + ['RSU']*5,
        'spec_name':['a', 'b','c','d', None,'h', 'g','f','e', None]
    }
)

print(df)

  item_type spec_name
0       INT         a
1       INT         b
2       INT         c
3       INT         d
4       INT      None
5       RSU         h
6       RSU         g
7       RSU         f
8       RSU         e
9       RSU      None

mandatory_rules = ["INT", "RSU"]
df["Error"] = None
df.loc[(df['item_type'].isin(mandatory_rules)) & (df['spec_name'].isna()), "Error"] = "spec_name"

print(df)

  item_type spec_name     Error
0       INT         a      None
1       INT         b      None
2       INT         c      None
3       INT         d      None
4       INT      None spec_name
5       RSU         h      None
6       RSU         g      None
7       RSU         f      None
8       RSU         e      None
9       RSU      None spec_name
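
The snippet above hard-codes a single column. If the rules really live in a table like the dfRules described in the question, the same masking idea can probably be extended to several fields per item type. What follows is only a sketch under assumptions: the `rules` and `dataset` frames, their contents, and the `normalize` and `find_errors` helpers are all made up for illustration, since the actual data was not posted.

```python
import pandas as pd

# Hypothetical rules table (layout assumed from the question): one row per
# field, one column per item_type, "x" marks the field as mandatory.
rules = pd.DataFrame({
    "Field": ["Spec_Name", "Unit Price"],
    "INT": ["x", "x"],
    "RSU": ["x", None],
})

# Hypothetical dataset whose column names are already normalized.
dataset = pd.DataFrame({
    "item_type": ["INT", "INT", "RSU", "RSU"],
    "spec_name": ["a", None, None, "d"],
    "unit_price": [1.0, None, None, 4.0],
})


def normalize(name):
    """Mirror the question's column cleanup: strip, lowercase, underscores."""
    return name.strip().lower().replace(" ", "_").replace("(", "").replace(")", "")


def find_errors(dataset, rules):
    """Return, per row, a comma-joined list of mandatory columns that are null."""
    errors = pd.Series("", index=dataset.index, dtype=object)
    for item_type in rules.columns.drop("Field"):
        # Fields flagged "x" for this item type, limited to columns that exist.
        required = [normalize(f) for f in rules.loc[rules[item_type] == "x", "Field"]]
        required = [c for c in required if c in dataset.columns]
        rows = dataset["item_type"] == item_type
        if not required or not rows.any():
            continue
        # Boolean frame: True where a mandatory value is missing.
        missing = dataset.loc[rows, required].isna()
        # One string per selected row, written back only to those rows.
        errors.loc[rows] = [
            ",".join(missing.columns[flags]) for flags in missing.to_numpy()
        ]
    return errors


dataset["Errors"] = find_errors(dataset, rules)
print(dataset)
```

With this sample data, the second INT row should end up with both column names in Errors, the first RSU row with only spec_name, and the remaining rows with an empty string. Because the result is written through a boolean mask on the full frame rather than onto a filtered copy, each error string lands on its own row only.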
