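I'm trying to create a new column in Python pandas and keep getting an intermittent KeyError. This part of the script is very simple, so I'm not sure what is causing the error, since no two columns in the dataset share a name.
My goal is to create a new column that holds the translation of the ticket_contents column and append it to the DataFrame. Here is a sample of the data: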
25483 0 outstanding 0 Los-Angeles e-payment delayed Ticket 1/7/19 7:54
39363 0 outstanding 0 Los-Angeles e-payment delayed Ticket 1/7/19 7:54
83584 0 outstanding 6 Los-Angeles e-payment delayed Ticket 1/7/19 7:54
34537 0 outstanding 7 Los-Angeles e-payment lost Ticket 1/7/19 7:53
colnames = ['id', 'ln_id', 'status',
            'number_outstanding', 'country', 'subject', 'ticket_contents', 'subtopic',
            'date']
test_data = pandas.read_csv(test_data, names = colnames, encoding = 'utf-8')
test_data = pandas.DataFrame(test_data)
translated_description = []
from_lang = 'tl'
to_lang = 'en-us'
def test_translation(contents):
    translator = Translator(from_lang = from_lang, to_lang = to_lang)
    translation = translator.translate(contents)
    translated_description.append(translation)
    #print(translated_description)
for contents, row in test_data.iterrows():
    contents = test_data.ticket_contents.iloc[contents -1]
    test_translation(contents)
test_data['translated_descriptions'].copy = translated_description
Here is the error output:
KeyError Traceback (most recent call last)
<ipython-input-70-55e39cf5e328> in <module>()
16 test_translation(contents)
17
---> 18 test_data['translated_descriptions'].copy = translated_description
19
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/frame.pyc in __getitem__(self, key)
1962 return self._getitem_multilevel(key)
1963 else:
-> 1964 return self._getitem_column(key)
1965
1966 def _getitem_column(self, key):
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/frame.pyc in _getitem_column(self, key)
1969 # get column
1970 if self.columns.is_unique:
-> 1971 return self._get_item_cache(key)
1972
1973 # duplicate columns & possible reduce dimensionality
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/generic.pyc in _get_item_cache(self, item)
1643 res = cache.get(item)
1644 if res is None:
-> 1645 values = self._data.get(item)
1646 res = self._box_item_values(item, values)
1647 cache[item] = res
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/internals.pyc in get(self, item, fastpath)
3588
3589 if not isnull(item):
-> 3590 loc = self.items.get_loc(item)
3591 else:
3592 indexer = np.arange(len(self.items))[isnull(self.items)]
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/indexes/base.pyc in get_loc(self, key, method, tolerance)
2442 return self._engine.get_loc(key)
2443 except KeyError:
-> 2444 return self._engine.get_loc(self._maybe_cast_indexer(key))
2445
2446 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5280)()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5126)()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas/_libs/hashtable.c:20523)()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas/_libs/hashtable.c:20477)()
KeyError: u'translated_descriptions'
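Answer 0 (score: 0)
I agree with the comments that you should not iterate over the DataFrame. Compute all the values as a list, array, or Series first, then assign them in one go.
Your error, however, comes from this line: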
test_data['translated_descriptions'].copy = translated_description
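This tries to overwrite the copy attribute (a method) of the Series test_data['translated_descriptions']. That Series does not exist yet, so the lookup test_data['translated_descriptions'] raises the KeyError before anything is assigned.
To create a new column from your sequence of values, I would do the following: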
test_data = test_data.assign(translated_descriptions=translated_description_values)
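
The behaviour described in this answer can be checked on a tiny frame. This is only a sketch: the two-row DataFrame and the placeholder values in translated_description_values are made up for illustration and are not the asker's data.

```python
import pandas as pd

# Tiny made-up frame standing in for the question's data.
test_data = pd.DataFrame({"ticket_contents": ["delayed", "lost"]})

# Looking up a column that does not exist raises KeyError,
# so the assignment to .copy is never reached.
try:
    test_data["translated_descriptions"].copy = ["a", "b"]
    raised = False
except KeyError:
    raised = True

# Placeholder values standing in for real translations.
translated_description_values = ["DELAYED", "LOST"]

# assign returns a new DataFrame; the original frame is left unchanged.
result = test_data.assign(translated_descriptions=translated_description_values)

print(raised)
print(list(result.columns))
print(list(test_data.columns))
```

Because assign returns a new object, the result has to be bound to a name (here result, or test_data again as in the line above) for the new column to be kept.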
Answer 1 (score: 0)
The error occurs at:
test_data['translated_descriptions'].copy = translated_description
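Here is what that line actually contains:
test_data['translated_descriptions'].copy is a reference to the copy method of a column that does not exist.
... = translated_description is an attempt to assign a list to that reference.
If you want to create a new column, just write: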
test_data['translated_descriptions'] = translated_description
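
As a rough illustration of plain bracket assignment, the sketch below uses a made-up three-row frame and placeholder strings. It also shows one thing to watch for: the list must have the same length as the DataFrame, otherwise pandas raises a ValueError.

```python
import pandas as pd

# Made-up frame with three rows, for illustration only.
test_data = pd.DataFrame({"ticket_contents": ["delayed", "lost", "delayed"]})

# A list with one value per row creates the column in place.
test_data["translated_descriptions"] = ["x", "y", "z"]

# A list of the wrong length is rejected with a ValueError.
try:
    test_data["too_short"] = ["x", "y"]
    length_error = False
except ValueError:
    length_error = True

print(list(test_data["translated_descriptions"]))
print(length_error)
```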
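If you want to get rid of the error mentioned in the comments, use:
df2 = test_data.copy()
(this calls the copy method of the whole DataFrame, not of one of its columns). df2 is then the new DataFrame.
Some tips for improving the program: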
Define translator outside the translation function:
translator = Translator(from_lang = from_lang, to_lang = to_lang)
Then define the translation function as:
def test_translation(contents):
    return translator.translate(contents)
The new column can then be created simply with:
test_data['translated_descriptions'] = \
    test_data.ticket_contents.apply(test_translation)
with no intermediate list.
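
The apply pattern can be tried without the translation library. In the sketch below, fake_translate is a hypothetical stand-in for translator.translate (it only upper-cases the text), and the two-row frame is made up.

```python
import pandas as pd

def fake_translate(contents):
    """Hypothetical stand-in for translator.translate: upper-cases the text."""
    return contents.upper()

# Made-up frame for illustration only.
test_data = pd.DataFrame({"ticket_contents": ["delayed", "lost"]})

# apply calls the function once per value and returns a Series
# aligned with the original index, so no intermediate list is needed.
test_data["translated_descriptions"] = test_data["ticket_contents"].apply(fake_translate)

print(list(test_data["translated_descriptions"]))
```

Swapping fake_translate for the real test_translation function should give the same shape of result, one translated value per row.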
Also look at the following fragment of your program:
test_data = pandas.read_csv(test_data, names = colnames,
encoding = 'utf-8')
test_data = pandas.DataFrame(test_data)
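Note that read_csv already returns a DataFrame, and the second instruction builds another DataFrame from it and stores it under the same test_data variable. As a result, the previous DataFrame still exists somewhere but can no longer be reached. Conclusion: delete the second instruction. One DataFrame is enough.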
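
A quick way to confirm that the extra pandas.DataFrame(...) call is unnecessary is to inspect what read_csv returns. This sketch reads from an in-memory string instead of a file; the CSV text and the two column names are made up.

```python
import io
import pandas as pd

# Made-up CSV content, read from memory instead of a file.
csv_text = "25483,outstanding\n39363,outstanding\n"
colnames = ["id", "status"]

test_data = pd.read_csv(io.StringIO(csv_text), names=colnames)

# read_csv already hands back a DataFrame, so no extra wrapping is needed.
print(isinstance(test_data, pd.DataFrame))
print(test_data.shape)
```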