Question

I am a beginner, so I cannot pinpoint the cause of the error in the code below when train.jsonl uses this format:

{"claim": "But he said if people really want to know if they have CHIP they can get a blood test that costs a few MONEYc1", "evidence": "sentenceID100037", "label": "0"}
{"claim": "This is rather a courtly formulation and would doubtless trigger further eyerolling if uttered in", "evidence": "sentenceID100038", "label": "0"}

The top cell runs without problems and displays the data:

import pandas as pd

prefix = '/content/'
train_df = pd.read_json(prefix + 'train.jsonl', orient='records', lines=True)
train_df.head()
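To confirm that this format is parsed into three named columns, here is a minimal self-contained sketch. It assumes the two sample records are fed through an in-memory `io.StringIO` buffer instead of `/content/train.jsonl`, so it runs without the Colab file:

```python
import io
import pandas as pd

# The two sample records from train.jsonl, one JSON object per line
jsonl = (
    '{"claim": "But he said if people really want to know if they have CHIP they can get a blood test that costs a few MONEYc1", "evidence": "sentenceID100037", "label": "0"}\n'
    '{"claim": "This is rather a courtly formulation and would doubtless trigger further eyerolling if uttered in", "evidence": "sentenceID100038", "label": "0"}\n'
)

# lines=True makes pandas parse each line as a separate JSON record
df = pd.read_json(io.StringIO(jsonl), orient='records', lines=True)
print(df.columns.tolist())  # ['claim', 'evidence', 'label']
print(df.shape)             # (2, 3)
```

The column names come straight from the JSON keys, which is why this cell shows the data correctly.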

See my Colab Notebook: https://colab.research.google.com/gist/lenyabloko/0e17ebe0f3a0e808779bc1fa95e9b24d/semeval2020-delex.ipynb

I even tried this extra trick, which explains the comment about column 0:

prefix = '/content/'
train_df = pd.read_json(prefix + 'train_delex.jsonl', orient='columns')

train_df.to_csv(prefix+'train.tsv', sep='\t', index=False, header=False)
train_df = pd.read_csv(prefix + 'train.tsv', header=None)

train_df.head()

Now, instead of the original three columns {"claim": "...", "evidence": "...", "label": "..."} from the JSONL file above, I see a single column labeled "0" (why?)
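One likely cause is the separator mismatch in the round trip: the file is written with `sep='\t'` but read back with `read_csv`'s default `sep=','`. The sketch below reproduces this on made-up rows, assuming an in-memory string instead of `/content/train.tsv`. Because the sample rows contain no commas, each whole line ends up in one field, and with `header=None` that single column is named `0`. (A claim that did contain a comma would instead be split at the comma.)

```python
import io
import pandas as pd

# Made-up rows shaped like the JSONL records
df = pd.DataFrame({
    'claim': ['first claim text', 'second claim text'],
    'evidence': ['sentenceID100037', 'sentenceID100038'],
    'label': ['0', '0'],
})

# Write tab-separated values without a header row, as in the question
tsv = df.to_csv(sep='\t', index=False, header=False)

# Default sep=',' finds no commas, so each whole line lands in column 0
wrong = pd.read_csv(io.StringIO(tsv), header=None)
print(wrong.columns.tolist())  # [0]

# Reading with the same separator restores three positional columns
right = pd.read_csv(io.StringIO(tsv), sep='\t', header=None)
print(right.columns.tolist())  # [0, 1, 2]
```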

But when I add the DataFrame code, it raises an error:

train_df = pd.DataFrame({
    'id': train_df[1],
    'text': train_df[0],
    'labels':train_df[2]
})

Since the only column is named "0", this cannot work. But where does that column come from??

KeyError                                  Traceback (most recent call last)
2 frames
<ipython-input-16-0537eda6b397> in <module>()
      6 
      7 train_df = pd.DataFrame({
----> 8     'id': train_df[1],
      9     'text': train_df[0],
     10     'labels':train_df[2]

/usr/local/lib/python3.6/dist-packages/pandas/core/frame.py in __getitem__(self, key)
   2993             if self.columns.nlevels > 1:
   2994                 return self._getitem_multilevel(key)
-> 2995             indexer = self.columns.get_loc(key)
   2996             if is_integer(indexer):
   2997                 indexer = [indexer]

/usr/local/lib/python3.6/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2897                 return self._engine.get_loc(key)
   2898             except KeyError:
-> 2899                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2900         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
   2901         if indexer.ndim > 1 or indexer.size > 1:
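The traceback is what integer indexing on a one-column frame produces: `train_df[1]` looks up a column *labelled* `1`, and none exists. A minimal sketch of the failure, plus one alternative that builds the `id`/`text`/`labels` frame by column name from the `read_json(..., lines=True)` result, so it no longer depends on column positions. The sample values are made up for illustration:

```python
import pandas as pd

# A frame with a single column labelled 0, like the one read back from train.tsv
single = pd.DataFrame({0: ['first claim text\tsentenceID100037\t0']})

caught = None
try:
    single[1]  # looks up a column labelled 1, which does not exist
except KeyError as err:
    caught = err
print('KeyError:', caught)  # KeyError: 1

# Frame with the original JSON keys as column names (made-up values)
train_df = pd.DataFrame({
    'claim': ['first claim text'],
    'evidence': ['sentenceID100037'],
    'label': ['0'],
})

# Select by name instead of by position
model_df = pd.DataFrame({
    'id': train_df['evidence'],
    'text': train_df['claim'],
    'labels': train_df['label'],
})
print(model_df.columns.tolist())  # ['id', 'text', 'labels']
```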

Answer 1

Here is the solution that worked for me:


import pandas as pd

prefix = '/content/'
test_df = pd.read_json(prefix + 'test_delex.jsonl', orient='records', lines=True)

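# Rename the JSON keys to the column names used downstream
test_df.rename(columns={'claim': 'text', 'evidence': 'id', 'label': 'labels'}, inplace=True)

# Move the last column to the front, twice:
# ['text', 'id', 'labels'] -> ['labels', 'text', 'id'] -> ['id', 'labels', 'text']
cols = test_df.columns.tolist()
cols = cols[-1:] + cols[:-1]
cols = cols[-1:] + cols[:-1]
test_df = test_df[cols]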

test_df.to_csv(prefix+'test.csv', sep=',', index=False, header=False)
test_df.head()
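To see what the rename and the two list rotations produce, here is a sketch on two made-up records standing in for `test_delex.jsonl`, read from an in-memory buffer rather than `/content/`. Each rotation moves the last column to the front, so the final order is `id`, `labels`, `text`, and the header-less CSV has that column order:

```python
import io
import pandas as pd

# Made-up stand-ins for the records in test_delex.jsonl
jsonl = (
    '{"claim": "first claim text", "evidence": "sentenceID100037", "label": "0"}\n'
    '{"claim": "second claim text", "evidence": "sentenceID100038", "label": "0"}\n'
)
test_df = pd.read_json(io.StringIO(jsonl), orient='records', lines=True)
test_df.rename(columns={'claim': 'text', 'evidence': 'id', 'label': 'labels'}, inplace=True)

cols = test_df.columns.tolist()  # ['text', 'id', 'labels']
cols = cols[-1:] + cols[:-1]     # ['labels', 'text', 'id']
cols = cols[-1:] + cols[:-1]     # ['id', 'labels', 'text']
test_df = test_df[cols]
print(test_df.columns.tolist())  # ['id', 'labels', 'text']

# Without a path, to_csv returns the CSV text as a string
csv_text = test_df.to_csv(sep=',', index=False, header=False)
print(csv_text.splitlines()[0])  # sentenceID100037,0,first claim text
```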

I updated the shared Colab notebook linked in the question above.

Pandas DataFrame KeyError: 1
