Validating horizontal row data in pandas

Time: 2020-10-19 15:57:11

Tags: python pandas validation

I'm looking for some help here. I'm working with data like this:

df1:
      name   name1   name2
A      13     13      13
B      13     27      57
C      12     12      12
D      26     23       2

I'm trying to use code like this:

def val(df):
    ret = []
    for idx, row in df.iterrows():
        if row.nunique()==1:
           ret.append(f'The values of {idx} in name, name1, name2 are corrects')
        else:
           ret(["".join(f'*The values in {idx} are:', 
           ', '.join(f'{c} in {v}' for v,c in row.iteritems()),
           'Check your data before compare.']))
    return ret

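The problem is that it doesn't work properly. First, I need the result as a string, not a list. I know this is possible with "".join(), but when I try it I only get the last result instead of the full answer I want. How can I get the complete answer? I'd like to see more than one approach, if possible.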

Example:
- The values of A in name, name1, name2 are correct.
- The values in B are:
  13 in name, 27 in name1 and 57 in name2.
  Check your data before comparing.
- The values of C in name, name1, name2 are correct.
- The values in D are:
  26 in name, 23 in name1 and 2 in name2.
  Check your data before comparing.

3 answers:

Answer 0 (score: 0)

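import pandas as pd

df = pd.DataFrame({'name': [13, 13, 12, 26],
                   'name1': [13, 27, 12, 23],
                   'name2': [13, 57, 12, 2]},
                  index=['A', 'B', 'C', 'D'])

def val(df):
    ret = []
    for idx, row in df.iterrows():
        if row.nunique() == 1:
            ret.append(f'- The values of {idx} in name, name1, name2 are corrects')
        else:
            # Adjacent string literals inside the parentheses are concatenated
            # automatically, so no backslash line continuations are needed.
            ret.append(
                f"- The values in {idx} are:\n"
                f"  {row['name']} in name, {row['name1']} in name1, {row['name2']} in name2.\n"
                "  Check your data before compare."
            )
    return ret

ans = val(df)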

Output

for i in ans:
    print(i)

- The values of A in name, name1, name2 are corrects
- The values in B are:
  13 in name, 27 in name1, 57 in name2.
  Check your data before compare.
- The values of C in name, name1, name2 are corrects
- The values in D are:
  26 in name, 23 in name1, 2 in name2.
  Check your data before compare.
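The question asks for one string rather than a list. A minimal sketch, assuming the same df as above: collect every row's message in a list inside the loop and call join() once, after the loop. Calling "".join() and reassigning the result on each iteration would keep only the last row, which may be what the question ran into (an assumption, since that version of the code isn't shown). The sketch also uses Series.items(), because Series.iteritems() from the question was removed in pandas 2.0. The function name val_report is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'name': [13, 13, 12, 26],
                   'name1': [13, 27, 12, 23],
                   'name2': [13, 57, 12, 2]},
                  index=['A', 'B', 'C', 'D'])

def val_report(df):
    """Hypothetical variant of val() that returns one string instead of a list."""
    parts = []  # accumulate the message for every row
    for idx, row in df.iterrows():
        if row.nunique() == 1:
            parts.append(f'- The values of {idx} in name, name1, name2 are correct.')
        else:
            # row.items() yields (column label, value) pairs
            details = ', '.join(f'{value} in {col}' for col, value in row.items())
            parts.append(f'- The values in {idx} are:\n'
                         f'  {details}.\n'
                         '  Check your data before comparing.')
    # Join once, after the loop, so no row's message is lost
    return '\n'.join(parts)

print(val_report(df))
```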

Answer 1 (score: 0)

How about building the strings as part of an np.where clause, like this?

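All the other answers copy your original approach of iterating over rows, which is inefficient outside of toy datasets. np.where is a vectorized operation, so it is much faster than a custom row-by-row function, and its logic is more direct. The only caveat is that string interpolation (f-strings) doesn't work element-wise here, so the multi-line concatenation syntax is a bit awkward.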

import pandas as pd
import numpy as np
from io import StringIO

data = StringIO("""
index  name   name1   name2
A      13     13      13
B      13     27      57
C      12     12      12
D      26     23       2
""")

# sep=r"\s+" replaces delim_whitespace=True, which is deprecated since pandas 2.2
df = pd.read_csv(data, sep=r"\s+", index_col="index")

results = np.where(
    df.nunique(axis=1) == 1,
    'The values in ' + df.index + ' in name, name1, name2 are the same\n',
    'The values in ' + df.index + ' are:\n' + \
    df["name"].astype(str)  + ' in name, ' + \
    df["name1"].astype(str) + ' in name1, ' + \
    df["name2"].astype(str) + ' in name2.\nCheck your data.\n'
)

print(*results, sep='\n')
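A variation on the same vectorized idea, sketched under the assumption that you would rather not hard-code the three column names: the equality check compares every column with the first one, the detail text is built column by column, and np.where then picks between the two message Series element-wise. The names same, detail, ok_msg, bad_msg and report are illustrative only. Joining the resulting array also gives the single string the question asks for.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"name": [13, 13, 12, 26], "name1": [13, 27, 12, 23], "name2": [13, 57, 12, 2]},
    index=["A", "B", "C", "D"],
)

# True where every column in the row equals the first column
same = df.eq(df.iloc[:, 0], axis=0).all(axis=1)

# Build "13 in name, 27 in name1, ..." one column at a time (vectorized per column)
detail = pd.Series("", index=df.index)
for i, col in enumerate(df.columns):
    sep = "" if i == 0 else ", "
    detail = detail + sep + df[col].astype(str) + " in " + col

idx = df.index.to_series()
ok_msg = "The values in " + idx + " are the same"
bad_msg = "The values in " + idx + " are: " + detail + ". Check your data."

# np.where chooses ok_msg or bad_msg for each row according to `same`
results = np.where(same, ok_msg, bad_msg)
report = "\n".join(results)
print(report)
```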

Answer 2 (score: 0)

import pandas as pd
df = pd.DataFrame({'name': {'A': 13, 'B': 13, 'C': 12, 'D': 26},
                   'name1': {'A': 13, 'B': 27, 'C': 12, 'D': 23},
                   'name2': {'A': 13, 'B': 57, 'C': 12, 'D': 2}})

Your function has so many errors that it's hard to know how to correct it.

You can use a pandas Series like a dictionary with string formatting.

In [25]: s = '{name:} in name, {name1:} in name1, {name2:} in name2'

In [26]: row = df.loc['A',:]

In [27]: print(s.format(**row))
13 in name, 13 in name1, 13 in name2

In [28]: for idx,row in df.iterrows():
    ...:     print(idx, s.format(**row))
    ...:     
A 13 in name, 13 in name1, 13 in name2
B 13 in name, 27 in name1, 57 in name2
C 12 in name, 12 in name1, 12 in name2
D 26 in name, 23 in name1, 2 in name2
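Combining the str.format approach above with the wording the question asks for gives a function that returns one string. This is a sketch: the template constants OK_TEMPLATE and BAD_TEMPLATE and the function name val_report are hypothetical. It relies on two facts: unpacking a row Series with ** supplies its labels as keyword arguments, and str.format ignores keyword arguments that the template doesn't use.

```python
import pandas as pd

df = pd.DataFrame({'name': {'A': 13, 'B': 13, 'C': 12, 'D': 26},
                   'name1': {'A': 13, 'B': 27, 'C': 12, 'D': 23},
                   'name2': {'A': 13, 'B': 57, 'C': 12, 'D': 2}})

# Hypothetical templates; field names match the column labels
OK_TEMPLATE = '- The values of {idx} in name, name1, name2 are correct.'
BAD_TEMPLATE = ('- The values in {idx} are:\n'
                '  {name} in name, {name1} in name1 and {name2} in name2.\n'
                '  Check your data before comparing.')

def val_report(df):
    lines = []
    for idx, row in df.iterrows():
        template = OK_TEMPLATE if row.nunique() == 1 else BAD_TEMPLATE
        # **row passes name=..., name1=..., name2=...; unused ones are ignored
        lines.append(template.format(idx=idx, **row))
    return '\n'.join(lines)

print(val_report(df))
```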

Using a formatted string literal (f-string):

In [29]: for idx,row in df.iterrows():
    ...:     print(idx, f'''{row['name']} in name, {row['name1']} in name1, {row['name2']} in name2''')
    ...:     
A 13 in name, 13 in name1, 13 in name2
B 13 in name, 27 in name1, 57 in name2
C 12 in name, 12 in name1, 12 in name2
D 26 in name, 23 in name1, 2 in name2
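A sketch of the same f-string output using DataFrame.itertuples() instead of iterrows(), an alternative this answer doesn't use. itertuples() yields namedtuples whose fields are Index followed by the column names, so each value is an attribute. The pandas documentation notes that iterrows() does not preserve dtypes across rows and that itertuples() is generally faster. Attribute access like this only works because the column names here are valid Python identifiers.

```python
import pandas as pd

df = pd.DataFrame({'name': {'A': 13, 'B': 13, 'C': 12, 'D': 26},
                   'name1': {'A': 13, 'B': 27, 'C': 12, 'D': 23},
                   'name2': {'A': 13, 'B': 57, 'C': 12, 'D': 2}})

# Each row is a namedtuple with fields Index, name, name1, name2
lines = [
    f'{row.Index} {row.name} in name, {row.name1} in name1, {row.name2} in name2'
    for row in df.itertuples()
]
print('\n'.join(lines))
```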