Question

I am building an ML classifier. My dataset is split across six .jsonl files, each larger than 1.6 GB. First, I tried the following code:

import pandas as pd
data = pd.read_json("train_features_0.jsonl")

This gave me the error "trailingError".

So I used the "chunksize" and "lines" arguments of "read_json":

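import pandas as pd
# lines=True reads one JSON object per line; the chunksize value here is only an example
data = pd.read_json("train_features_0.jsonl", lines=True, chunksize=10000)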

This returned "pandas.io.json.json.JsonReader at 0x136bce302b0" instead of a DataFrame.
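For context, when "read_json" is called with lines=True and a chunksize, pandas returns a JsonReader, which is an iterator over DataFrame chunks rather than a DataFrame. A minimal sketch of looping over it across several files might look like the following. The helper name iter_chunks and the default chunk size are assumptions made for illustration, not part of the original attempt.

```python
import pandas as pd


def iter_chunks(paths, chunksize=10_000):
    """Yield DataFrame chunks from several JSON Lines files, one at a time.

    `iter_chunks` and the default `chunksize` are hypothetical choices
    for this sketch.
    """
    for path in paths:
        # With lines=True and chunksize set, read_json returns a JsonReader.
        # Iterating over it yields DataFrames of at most `chunksize` rows,
        # so the whole file never has to fit in memory at once.
        reader = pd.read_json(path, lines=True, chunksize=chunksize)
        for chunk in reader:
            yield chunk
```

Each yielded chunk is an ordinary DataFrame, so it can be inspected or processed before the next one is read.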

The dataset consists of: train_features_0.jsonl, train_features_1.jsonl, train_features_2.jsonl, train_features_3.jsonl, train_features_4.jsonl, train_features_5.jsonl.

So my question is: how can I use all of these .jsonl files to train my classifier?

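My other question is: how can I use only specific "name: value" pairs when training the classifier? That is, can I drop some name: value pairs to speed up training?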
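To make the second question concrete, here is a rough sketch of keeping only chosen name: value pairs while streaming the files line by line with the standard library. The function name iter_records and the keep_keys parameter are hypothetical, and which keys are worth keeping depends on the dataset.

```python
import json


def iter_records(paths, keep_keys):
    """Yield one dict per JSON line, restricted to the keys in `keep_keys`.

    `iter_records` and `keep_keys` are hypothetical names for this sketch.
    """
    for path in paths:
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue  # skip blank lines
                record = json.loads(line)
                # Drop every name: value pair that was not asked for;
                # keys missing from a record are silently skipped.
                yield {k: record[k] for k in keep_keys if k in record}
```

Because this is a generator, only one line is held in memory at a time, regardless of file size.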

Please bear with me, I am new to ML.

How to read multiple large .jsonl files in Python

0 answers: