我是一个初学者,所以当train.jsonl使用这样的格式时,无法在以下代码中指出错误的原因
{"claim": "But he said if people really want to know if they have CHIP they can get a blood test that costs a few MONEYc1", "evidence": "sentenceID100037", "label": "0"}
{"claim": "This is rather a courtly formulation and would doubtless trigger further eyerolling if uttered in", "evidence": "sentenceID100038", "label": "0"}
顶部执行没有问题并显示数据。
import pandas as pd
prefix = '/content/'
train_df = pd.read_json(prefix + 'train.jsonl', orient='records', lines=True)
train_df.head()
[See my Colab Notebook][https://colab.research.google.com/gist/lenyabloko/0e17ebe0f3a0e808779bc1fa95e9b24d/semeval2020-delex.ipynb]
我什至尝试了这个额外的技巧,它解释了有关0列的评论
prefix = '/content/'
train_df = pd.read_json(prefix + 'train_delex.jsonl', orient='columns')
train_df.to_csv(prefix+'train.tsv', sep='\t', index=False, header=False)
train_df = pd.read_csv(prefix + 'train.tsv', header=None)
train_df.head()
现在,我看到的是标记为“ 0”的列,而不是原始的三列{“ claim”:“ ...”,“ evidence”:“ ...”,“ label”:“ ...”}上面的JSONL文件(为什么?)
但是当我添加DataFrame代码时会导致错误
train_df = pd.DataFrame({
'id': train_df[1],
'text': train_df[0],
'labels':train_df[2]
})
鉴于名为“ 0”的列将不起作用。但是那列是从哪里来的??
KeyError Traceback (most recent call last)
2 frames
<ipython-input-16-0537eda6b397> in <module>()
6
7 train_df = pd.DataFrame({
----> 8 'id': train_df[1],
9 'text': train_df[0],
10 'labels':train_df[2]
/usr/local/lib/python3.6/dist-packages/pandas/core/frame.py in __getitem__(self, key)
2993 if self.columns.nlevels > 1:
2994 return self._getitem_multilevel(key)
-> 2995 indexer = self.columns.get_loc(key)
2996 if is_integer(indexer):
2997 indexer = [indexer]
/usr/local/lib/python3.6/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2897 return self._engine.get_loc(key)
2898 except KeyError:
-> 2899 return self._engine.get_loc(self._maybe_cast_indexer(key))
2900 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
2901 if indexer.ndim > 1 or indexer.size > 1:
答案 0 :(得分:0)
以下是对我有用的解决方案:
import pandas as pd
prefix = '/content/'
test_df = pd.read_json(prefix + 'test_delex.jsonl', orient='records', lines=True)
test_df.rename(columns={'claim': 'text', 'evidence': 'id', 'label':'labels'}, inplace=True)
cols = test_df.columns.tolist()
cols = cols[-1:] + cols[:-1]
cols = cols[-1:] + cols[:-1]
test_df = test_df[cols]
test_df.to_csv(prefix+'test.csv', sep=',', index=False, header=False)
test_df.head()
我更新了上面问题中链接的共享Colab笔记本