I'm writing some data tests. Super simple, nothing crazy.
Here is what my current directory looks like.
.
├── README.md
├── hive_tests
│   ├── __pycache__
│   ├── schema_checks_hive.py
│   ├── test_schema_checks_hive.py
│   └── yaml
│       └── job_output.address_stats.yaml
└── postgres
    ├── __pycache__
    ├── schema_checks_pg.py
    ├── test_schema_checks_pg.py
    └── yaml
When I cd into postgres and run pytest, all my tests pass.
When I cd into hive_tests and run pytest, I get an import error.
Here is my schema_checks_hive.py file.
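from pyhive import hive
import pandas as pd
import numpy as np
import os, sys
import yaml

def check_column_name_hive(schema, table):
    # Describe the table and return the column labels of the resulting DataFrame.
    # NOTE: `conn` is not defined in this snippet; it is presumably a pyhive connection created elsewhere.
    query = "DESCRIBE {0}.{1}".format(schema, table)
    df = pd.read_sql_query(query, conn)
    print(df.columns)
    return df.columns

# This call runs at import time, including when pytest imports this module.
check_column_name_hive('myschema', 'mytable')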
Here is my test_schema_checks_hive.py file, where the tests live.
import schema_checks_hive as sch
import pandas as pd
import yaml
import sys, os

def test_column_names_hive():
    for filename in os.listdir('yaml'):
        data = ""
        with open("yaml/{0}".format(filename), 'r') as stream:
            data = yaml.safe_load(stream)
            schema = data['schema']
            table = data['table']
            cols = data['columns']
            df = sch.check_column_name_hive(schema, table)
            assert len(cols) == len(df)
            assert cols == df.tolist()
Error message when running pytest:
ImportError while importing test module '/Usersdata/
tests/hive_tests/test_schema_checks_hive.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
test_schema_checks_hive.py:1: in <module>
import schema_checks_hive as sch
schema_checks_hive.py:1: in <module>
from pyhive import hive
E ModuleNotFoundError: No module named 'pyhive'
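The traceback says the interpreter running pytest cannot find pyhive, so one thing worth checking is which Python is in use and whether pyhive is visible to it. The snippet below is only a minimal sketch of that check; the helper name `module_available` is hypothetical and not part of pytest or pyhive.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if the top-level module `name` can be found by this interpreter."""
    # find_spec returns None when a top-level module is not on the import path.
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # The interpreter actually executing this script; compare it with the one behind `pytest`.
    print("interpreter:", sys.executable)
    print("pyhive importable:", module_available("pyhive"))
```

Running this from inside hive_tests, and then running the tests with `python -m pytest` using that same interpreter, should show whether pytest and pyhive live in different environments.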
I'd appreciate any help! Thanks so much.