Specifying train and test sets via the terminal

Time: 2018-03-15 12:22:47

Tags: python machine-learning terminal ipython jupyter-notebook

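I want to specify my train and test sets explicitly in the terminal when I run my .ipynb file from the terminal, rather than specifying them in the code. This is what I am doing right now.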

# FOR TRAINING DATA
import os
import pandas as pd

# LISTING OUT ALL FILES PRESENT IN FOLDER PATH
path = "C:/Users/****/****/Latest_Datasets/base_out"
files = os.listdir(path)

# READING ALL THE DATA FROM THE FOLDER PATH INTO ONE DATAFRAME
frames = []
for f in files:
    # os.listdir returns bare file names, so join each one with the folder path
    data = pd.read_csv(os.path.join(path, f), delimiter='\t',
                       usecols=['details','amount','category'], encoding='utf-8')
    frames.append(data)
# pd.concat replaces DataFrame.append, which was removed in pandas 2.0
df = pd.concat(frames, ignore_index=True)
df['index1'] = df.index
df = df[['index1','amount','details','category']]

# FOR TEST DATA

test_data = pd.read_csv('testfile.csv',
                        delimiter='\t', usecols=['xn_details','xn_amount','category'], encoding='utf-8')


x_train, y_train = (df.details, df.category)
# The test file's text column is loaded as 'xn_details', so test_data.details does not exist
x_test, y_test = (test_data.xn_details, test_data.category)

# After this I apply my model and get my classifications for my test.details

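I want to supply the training data and the test data as arguments in the terminal instead of specifying them in the script. How can I do that? Thanks in advance.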

1 answer:

Answer 0 (score: 0)

You can import the sys module and read the arguments passed on the command line from the sys.argv list.

import sys
# everything else remains the same
# ...
test_data = pd.read_csv(sys.argv[1],
                        delimiter='\t', usecols=['xn_details','xn_amount','category'], encoding='utf-8')

sys.argv[0]  # the script name, such as "test.py"
sys.argv[1]  # the first command-line argument: here, the CSV file that is handed to pd.read_csv()
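
If the script is started without that argument, sys.argv[1] raises an IndexError. A small guard can give a clearer message instead. This is only a sketch; the helper name get_test_file is hypothetical and not part of the original answer:

```python
def get_test_file(argv):
    """Return the test CSV path from an argv-style list.

    Hypothetical helper: argv[0] is the script name and argv[1] the CSV file.
    Exits with a usage hint when the argument is missing.
    """
    if len(argv) < 2:
        raise SystemExit("usage: python test.py <test_file.csv>")
    return argv[1]
```

In the script, after import sys, get_test_file(sys.argv) would then take the place of sys.argv[1] in the pd.read_csv call.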

So, on the command line, you would run the following (test.py is the name of your Python file):

C:\>python test.py testfile.csv
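
Since the question asks for both the training data and the test data, the same idea extends to two arguments. One common option is the standard-library argparse module, which names each argument and generates a usage message. A minimal sketch, assuming options called --train-dir and --test-file (illustrative names, not from the original answer):

```python
import argparse


def build_parser():
    """Build a parser with two required options (the option names are illustrative)."""
    parser = argparse.ArgumentParser(description="Train and classify from the terminal")
    parser.add_argument("--train-dir", required=True,
                        help="folder containing the training files")
    parser.add_argument("--test-file", required=True,
                        help="path to the test CSV file")
    return parser


def parse_paths(argv=None):
    """Parse argv (sys.argv[1:] when None) and return (train_dir, test_file)."""
    args = build_parser().parse_args(argv)
    return args.train_dir, args.test_file
```

In the script, train_dir, test_file = parse_paths() would replace the hard-coded path and 'testfile.csv', and the call might look like python test.py --train-dir base_out --test-file testfile.csv.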