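Over the past few days I have spent a lot of time learning how to structure data science projects so that they stay simple, reusable and Pythonic. Following this guideline, I created my_project. You can see its structure below.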
├── README.md
├── data
│   ├── processed          <-- data files
│   └── raw
├── notebooks
│   └── notebook_1
├── setup.py
│
├── settings.py            <-- settings file
└── src
    ├── __init__.py
    │
    └── data
        └── get_data.py    <-- script
I defined a function that loads data from ./data/processed. I want to use this function in other scripts and also in the Jupyter notebooks located in ./notebooks.
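    import random

    import pandas as pd

    def data_sample(code=None):
        # This relative path only resolves when the working directory is src/data.
        df = pd.read_parquet('../../data/processed/my_data')
        if not code:
            code = random.choice(df.code.unique())
        df = df[df.code == code].sort_values('Date')
        return df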
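Clearly, this function will not work anywhere unless I run it directly from the script in which it is defined.
My idea was to create settings.py, where I would declare:
    from os.path import join, dirname
    DATA_DIR = join(dirname(__file__), 'data', 'processed')
Now I can write: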
    import os
    import random

    import pandas as pd

    from my_project import settings

    def data_sample(code=None):
        # Build the path from the settings module instead of the working directory.
        file_path = os.path.join(settings.DATA_DIR, 'my_data')
        df = pd.read_parquet(file_path)
        if not code:
            code = random.choice(df.code.unique())
        df = df[df.code == code].sort_values('Date')
        return df
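Questions:
Is this a common way to refer to files? settings.DATA_DIR looks rather ugly.
Is this how settings.py is supposed to be used at all? And should it be placed in this directory? I have seen it under .samr/settings.py in a repo.
I understand there may be no single right answer; I am just trying to find a logical, elegant way to handle these things.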
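Answer 0 (score: 1)
This works as long as you are not committing large amounts of data and you keep a clear distinction between snapshots of the uncontrolled outside world and your own derived data, where (code + raw) == state. It is sometimes useful to treat raw as append-only-ish and to consider a symlink step such as raw/interesting_source/2018.csv.gz -> raw_appendonly/interesting_source/2018.csv.gz.20180401T12:34:01, or some similar pattern, to establish a "use the latest" input structure. Try to clearly separate the configuration settings (my_project/__init__.py, config.py, settings.py or whatever) that might need to change depending on the environment (imagine swapping the filesystem for a blob store or something similar). setup.py usually lives at the top level, my_project/setup.py, and anything related to the runnable code (not docs; not sure about examples) goes in my_project/my_project. Define _mydir = os.path.dirname(os.path.realpath(__file__)) in one place (config.py) and rely on it to avoid pain.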
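One way to combine the two ideas above (a single anchor path plus settings that can change per environment) is sketched below. The function name resolve_data_dir and the environment variable MY_PROJECT_DATA_DIR are assumptions made for illustration; they do not come from the answer.

```python
import os


def resolve_data_dir(module_file, env=None):
    """Return the data directory for the project.

    MY_PROJECT_DATA_DIR (a hypothetical variable name) overrides the default,
    which is <folder of module_file>/data/processed.
    """
    env = os.environ if env is None else env
    override = env.get("MY_PROJECT_DATA_DIR")
    if override:
        return override
    # Same idea as _mydir in the answer: anchor paths on the file's real location.
    mydir = os.path.dirname(os.path.realpath(module_file))
    return os.path.join(mydir, "data", "processed")


# In config.py this would typically be evaluated once:
# DATA_DIR = resolve_data_dir(__file__)
```

Passing the module file and the environment in as arguments keeps the function easy to test; the rest of the code base would only import DATA_DIR from config.py.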
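Answer 1 (score: 0)
No, you would only use settings.py if you were using Django. As for referring to the data directory this way, it depends on whether you want users to be able to change this value. The way you have set it up, changing the value requires editing the settings.py file. If you want users to have a default but also be able to change it easily when they call the function, just create the base path value inline and make it the default in def data_sample(..., datadir=filepath):.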
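A minimal sketch of that default-argument idea follows. The names DEFAULT_DATA_DIR and data_file are hypothetical, and the default location under the home directory is only an assumption chosen so that the snippet runs anywhere.

```python
import os

# Assumed default location; a real project would more likely derive this
# from the location of its own settings or config module.
DEFAULT_DATA_DIR = os.path.join(
    os.path.expanduser("~"), "my_project", "data", "processed"
)


def data_file(filename="my_data", datadir=DEFAULT_DATA_DIR):
    """Return the full path of *filename*, looking in *datadir* by default."""
    return os.path.join(datadir, filename)


# Callers get the default without any setup...
default_path = data_file()
# ...and can point at another folder without editing a settings file.
custom_path = data_file("my_data", datadir=os.path.join("tmp", "fixtures"))
```

data_sample could accept the same datadir keyword and pass data_file(...) on to pd.read_parquet.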
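Answer 2 (score: 0)
I maintain an economics data project based on DataDriven Cookiecutter, which I think is a great template.
Separating the data folders from the code seems an advantage to me: it lets you treat your work as a directed flow of transformations (a 'DAG'), starting from immutable initial data and ending with the final results.
Initially I looked at pkg_resources but decided against it (verbose syntax, and I lacked an understanding of packaging) in favour of my own helper functions/classes that navigate the directory tree.
Essentially, the helpers do two things.
1. Persist the project root folder and some other paths as constants: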
    from pathlib import Path

    # shorter version
    ROOT = Path(__file__).parents[3]

    # longer version
    def find_repo_root():
        """Returns root folder for repository.

        Current file is assumed to be:
            <repo_root>/src/kep/helper/<this file>.py
        """
        levels_up = 3
        return Path(__file__).parents[levels_up]

    ROOT = find_repo_root()
    DATA_FOLDER = ROOT / 'data'
    UNPACK_RAR_EXE = str(ROOT / 'bin' / 'UnRAR.exe')
    XL_PATH = str(ROOT / 'output' / 'kep.xlsx')
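This is similar to what you do with DATA_DIR. A possible weak point is that here I manually hardcode the location of the helper file relative to the project root. If the helper file is moved, this needs to be adjusted. But hey, this is the same way it is done in Django.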
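If the hardcoded number of levels becomes a nuisance, one alternative is to walk upwards until a marker file is found. This is not what the answer does; it is a sketch under the assumption that the repository root is the nearest ancestor containing setup.py, and the name find_root_by_marker is made up for illustration.

```python
from pathlib import Path


def find_root_by_marker(start, marker="setup.py"):
    """Return the closest folder at or above *start* that contains *marker*."""
    start = Path(start).resolve()
    # Check the starting path itself, then each ancestor in turn.
    for folder in [start, *start.parents]:
        if (folder / marker).exists():
            return folder
    raise FileNotFoundError(f"no {marker!r} found at or above {start}")


# Inside a helper module this could replace the fixed parents[3]:
# ROOT = find_root_by_marker(Path(__file__).parent)
```

The trade-off is a small filesystem search at import time in exchange for not having to update the constant when the helper file moves.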
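2. Provide access to specific data in the raw, interim and processed folders.
This can be a simple function that returns the full path for a file name in a folder, for example: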
    def interim(filename):
        """Return path for *filename* in the 'data/interim' folder."""
        return str(ROOT / 'data' / 'interim' / filename)
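In my project I have year-month subfolders in the interim and processed directories, and I address data by year, month and sometimes frequency. For this data structure I have the InterimCSV and ProcessedCSV classes, which refer to specific paths, for example: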
    from .helper import ProcessedCSV, InterimCSV

    # somewhere in code
    csv_text = InterimCSV(self.year, self.month).text()

    # later in code
    path = ProcessedCSV(2018, 4).path(freq='q')
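The code for the helpers is here. In addition, the classes create the subfolders if they do not exist (I want this for unit tests in a temporary directory), and there are methods for checking that a file exists and for reading its contents.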
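To make the description concrete, here is a rough sketch of what such a class might look like. It is not the code from the project: the class name MonthlyCSV, its constructor arguments and the zero-padded month folder are all assumptions.

```python
from pathlib import Path


class MonthlyCSV:
    """Hypothetical helper for <root>/data/<stage>/<year>/<month>/<name>.csv."""

    def __init__(self, root, stage, year, month):
        self.folder = Path(root) / "data" / stage / str(year) / f"{month:02d}"
        # Create the year/month subfolder on first use, which is convenient
        # when tests run against an empty temporary directory.
        self.folder.mkdir(parents=True, exist_ok=True)

    def path(self, name="data"):
        """Return the full path of the CSV file called *name*."""
        return self.folder / f"{name}.csv"

    def exists(self, name="data"):
        """Tell whether the CSV file is already on disk."""
        return self.path(name).exists()

    def text(self, name="data"):
        """Read the CSV file and return its contents as a string."""
        return self.path(name).read_text(encoding="utf-8")
```

Because the root folder is an argument, a test can pass a temporary directory while production code passes the repository root.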
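In your example you can easily keep the root directory fixed in settings.py, but I think you can go a step further in abstracting your data.
Currently data_sample() mixes file access with data transformation, which is not a good sign, and it also relies on a global name, another bad sign for a function. I suggest you consider the following: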
    import os
    import random

    import pandas as pd

    # keep this in settings.py
    def processed(filename):
        return os.path.join(DATA_DIR, filename)

    # this works on a dataframe - your argument is a dataframe,
    # and you return a dataframe
    def transform_sample(df: pd.DataFrame, code=None) -> pd.DataFrame:
        # FIXME: what is `code`?
        if not code:
            code = random.choice(df.code.unique())
        return df[df.code == code].sort_values('Date')

    # make a small but elegant pipeline of data transformation
    file_path = processed('my_data')
    df0 = pd.read_parquet(file_path)
    df = transform_sample(df0)
Answer 3 (score: 0)
You can use open() to open the file and keep it in a variable, then go on using that variable wherever you want to refer to the file:
    with open('Test.txt','r') as f:
or
    f=open('Test.txt','r')
and then use f to refer to the file.
If you want the file to be both readable and writable, you can use r+ instead of r.
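As a small illustration of the r+ mode mentioned above, the sketch below reads a file and then writes to it through the same handle. The temporary directory is an addition so that the example does not touch real files; the file name Test.txt comes from the answer.

```python
import os
import tempfile

# Work in a throwaway directory so the sketch leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    file_path = os.path.join(tmp, "Test.txt")
    with open(file_path, "w") as f:
        f.write("first line\n")

    # 'r+' opens an existing file for both reading and writing.
    with open(file_path, "r+") as f:
        content = f.read()        # reading moves the position to the end
        f.write("second line\n")  # so this write lands after the old content

    with open(file_path, "r") as f:
        lines = f.read().splitlines()
```

Note that r+ requires the file to exist already; it does not create it.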