I found some text classification code in TensorFlow, and when I try to run this code: https://www.tensorflow.org/beta/tutorials/keras/feature_columns, I get an error.
I used the dataset from here: https://www.kaggle.com/kazanova/sentiment140
Traceback (most recent call last):
  File "text_clas.py", line 35, in <module>
    train_ds = df_to_dataset(train, batch_size=batch_size)
  File "text_clas.py", line 27, in df_to_dataset
    labels = dataframe.pop('target')
  File "/home/yildiz/.local/lib/python2.7/site-packages/pandas/core/generic.py", line 809, in pop
    result = self[item]
  File "/home/yildiz/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 2927, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/yildiz/.local/lib/python2.7/site-packages/pandas/core/indexes/base.py", line 2659, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'target'
When I print df.index.name, I get None. So is the dataset incorrect, or am I doing something wrong?
I changed dataframe.head() to print(dataframe.head()) and got the following output:
0 ... @switchfoot http://twitpic.com/2y1zl - Awww, that's a bummer. You shoulda got David Carr of Third Day to do it. ;D
0 0 ... is upset that he can't update his Facebook by ...
1 0 ... @Kenichan I dived many times for the ball. Man...
2 0 ... my whole body feels itchy and like its on fire
3 0 ... @nationwideclass no, it's not behaving at all....
4 0 ... @Kwesidei not the whole crew
[5 rows x 6 columns]
1023999 train examples
256000 validation examples
320000 test examples
Answer 0 (score: 1)
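You probably did not load the data correctly. Make sure the train DataFrame is not empty.
If you can, check whether 'target' is actually a column name (for example, run print(train.head()), or even print(dataframe.head()), right after reading the CSV file).
Also, I'm not sure what df.index.name is supposed to show. Did you mean df.index.values? (The index is probably unrelated to your problem, though.)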
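A quick way to run that check, as a minimal sketch: the two-row CSV below is made up to mimic a file without a header row, and sample_csv is a hypothetical name. With the default settings, pandas takes the first data row as the column names, so 'target' is not among them.

```python
import io

import pandas as pd

# Hypothetical two-row sample standing in for a CSV file without a header row.
sample_csv = io.StringIO(
    '"0","1001","NO_QUERY","alice","first tweet"\n'
    '"4","1002","NO_QUERY","bob","second tweet"\n'
)

df = pd.read_csv(sample_csv)

# With the default header handling, the first data row becomes the column names.
print(df.columns.tolist())
print('target' in df.columns)   # False here, which is why pop('target') fails
print(df.index.name)            # None unless an index name was set explicitly
```

If 'target' is missing from the printed column list, the problem is in how the file was read, not in df_to_dataset.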
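Edit:
OK, so it seems no column names were assigned to the DataFrame. You can set them with dataframe.columns = ['target', ...] # and pick the other names
Also, the first row of the file is being used as the header, so you should pass header=None when calling read_csv and then set the columns (if you don't, you lose the first row).
Note that df.index.name doesn't mean anything here (as I said before), so the fact that it prints None doesn't tell you anything.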
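A sketch of that fix, under stated assumptions: the column names are the ones listed in the Kaggle description of sentiment140 (target, ids, date, flag, user, text), and the rows in sample_csv are made up to stand in for the real file. pandas expects header=None (not a boolean) to say that the file has no header row; names= then assigns the labels in the same call. The real file may also need an explicit encoding argument.

```python
import io

import pandas as pd

# Assumed column names, taken from the Kaggle description of sentiment140.
column_names = ['target', 'ids', 'date', 'flag', 'user', 'text']

# Hypothetical rows standing in for the real header-less CSV file.
sample_csv = io.StringIO(
    '"0","1001","Mon Apr 06 22:19:45 PDT 2009","NO_QUERY","alice","first tweet"\n'
    '"4","1002","Mon Apr 06 22:19:49 PDT 2009","NO_QUERY","bob","second tweet"\n'
)

# header=None keeps the first row as data; names= assigns the column labels.
dataframe = pd.read_csv(sample_csv, header=None, names=column_names)

labels = dataframe.pop('target')   # works now that the column exists
print(labels.tolist())
print(dataframe.columns.tolist())
```

Passing the names directly to read_csv is equivalent to reading with header=None and then assigning dataframe.columns afterwards.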
Answer 1 (score: 0)
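I downloaded the same CSV file, loaded it, and ran the pop command. It works. Make sure you have loaded the correct DataFrame. Are you sure the DataFrame is named "dataframe"? Or is it "df"? In that case df.pop('target') should work.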
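For reference, a small sketch of what pop does, using a toy DataFrame with made-up values: it returns the named column and removes it from the DataFrame, and it raises KeyError when no column has that name.

```python
import pandas as pd

# Toy DataFrame with made-up values, only to illustrate DataFrame.pop.
df = pd.DataFrame({'target': [0, 4], 'text': ['first tweet', 'second tweet']})

labels = df.pop('target')        # returns the column and removes it from df
print(labels.tolist())
print(df.columns.tolist())

try:
    df.pop('target')             # the column is gone, so this raises KeyError
except KeyError as err:
    print('KeyError:', err)
```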
Answer 2 (score: 0)
OK, this is the code I used:
from __future__ import absolute_import, division, print_function
import tensorflow as tf
from tensorflow import keras
import numpy as np
import pandas as pd
from tensorflow import feature_column
from tensorflow.keras import layers
from sklearn.model_selection import train_test_split
encoding='utf-8-sig'
dataframe = pd.read_csv('~/Schreibtisch/TwitterBot/training_dataset_twitter.csv')
dataframe.head()
train, test = train_test_split(dataframe, test_size=0.2)
train, val = train_test_split(train, test_size=0.2)
print(len(train), ' train examples')
print(len(val), ' validation examples')
print(len(test), ' test examples')
# A utility method to create a tf.data dataset from a Pandas Dataframe
def df_to_dataset(dataframe, shuffle=True, batch_size=32):
    dataframe = dataframe.copy()
    labels = dataframe.pop('target')
    ds = tf.data.Dataset.from_tensor_slices((dict(dataframe), labels))
    if shuffle:
        ds = ds.shuffle(buffer_size=len(dataframe))
    ds = ds.batch(batch_size)
    return ds
batch_size = 5 # A small batch size is used for demonstration purposes
train_ds = df_to_dataset(train, batch_size=batch_size)
val_ds = df_to_dataset(val, shuffle=False, batch_size=batch_size)
test_ds = df_to_dataset(test, shuffle=False, batch_size=batch_size)
for feature_batch, label_batch in train_ds.take(1):
    print('Every feature:', list(feature_batch.keys()))
    print('A batch of ages:', feature_batch['age'])
    print('A batch of targets:', label_batch)