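I'm very new to deep learning and computer vision, and I want to do a face recognition project. For that, I downloaded some images from the Internet and, with the help of this article in the TensorFlow documentation, converted them into a TensorFlow dataset. Now I want to convert that dataset into a pandas DataFrame so that I can save it as a CSV file. I've tried many things but couldn't get it to work. Can someone help me? Below is the code I used to create the dataset, followed by some of the broken code I tried.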
import tensorflow as tf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
filenames = tf.constant(['al.jpg', 'al2.jpg', 'al3.jpg', 'al4.jpeg','al5.jpeg', 'al6.jpeg','al7.jpg','al8.jpeg', '5.jpg', 'hrit8.jpeg', 'Hrithik-Roshan.jpg', 'Hrithik.jpg', 'hriti1.jpeg', 'hriti2.jpg', 'hriti3.jpeg', 'hritik4.jpeg', 'hritik5.jpg', 'hritk9.jpeg', 'index.jpeg', 'sah.jpeg', 'sah1.jpeg', 'sah3.jpeg', 'sah4.jpg', 'sah5.jpg','sah6.jpg','sah7.jpg'])
labels = tf.constant([1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 2, 2, 2, 2, 2])
dataset = tf.data.Dataset.from_tensor_slices((filenames, labels))
def _parse_function(filename, label):
    image_string = tf.read_file(filename)
    image_decoded = tf.image.decode_jpeg(image_string, channels=3)
    image_resized = tf.image.resize_images(image_decoded, [28, 28])
    return image_resized, label
dataset = dataset.map(_parse_function)
dataset = dataset.shuffle(buffer_size=100)
dataset = dataset.batch(26)
iterator = dataset.make_one_shot_iterator()
image,labels = iterator.get_next()
sess = tf.Session()
print(sess.run([image, labels]))
At first, I simply tried df = pd.DataFrame(dataset)
and got the following error:
ValueError Traceback (most recent call last)
<ipython-input-15-d5503ae4603d> in <module>()
----> 1 df = pd.DataFrame((dataset))
~/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
402 dtype=values.dtype, copy=False)
403 else:
--> 404 raise ValueError('DataFrame constructor not properly called!')
405
406 NDFrame.__init__(self, mgr, fastpath=True)
ValueError: DataFrame constructor not properly called!
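The constructor rejects dataset because a tf.data.Dataset is not one of the input types pd.DataFrame recognizes here (such as a dict, a list, or a NumPy array). Once a batch has been pulled out as NumPy arrays, for example the [image, labels] pair that print(sess.run([image, labels])) shows in the question's code, building a DataFrame is straightforward. A minimal sketch, assuming one batch of images shaped (N, 28, 28, 3) and labels shaped (N,); the random arrays and the helper name batch_to_dataframe are placeholders for illustration, not part of the original code:

```python
import numpy as np
import pandas as pd


def batch_to_dataframe(images, labels):
    """Hypothetical helper: flatten each image into one row of pixel columns."""
    images = np.asarray(images)
    labels = np.asarray(labels)
    n = images.shape[0]
    # (N, 28, 28, 3) -> (N, 2352): one row per image, one column per pixel value
    flat = images.reshape(n, -1)
    df = pd.DataFrame(flat, columns=[f"px{i}" for i in range(flat.shape[1])])
    df["label"] = labels
    return df


# Stand-in for the NumPy output of sess.run([image, labels]) in the question
rng = np.random.default_rng(0)
images = rng.random((26, 28, 28, 3), dtype=np.float32)
labels = np.array([1] * 8 + [0] * 10 + [1] + [2] * 7, dtype=np.int32)

df = batch_to_dataframe(images, labels)
print(df.shape)  # (26, 2353)
```

Flattening keeps every cell a plain number, which is what a CSV file can store; the image can be recovered later by reshaping a row back to (28, 28, 3).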
After that, I came across this article and realized my mistake: in TensorFlow, everything exists only inside a session. So I tried the following code:
with tf.Session() as sess:
    df = pd.DataFrame(sess.run(dataset))
Forgive me if this is a very silly mistake; I wrote the code above by analogy with print(sess.run(dataset)), but got an even bigger error:
TypeError: Fetch argument <BatchDataset shapes: ((?, 28, 28, 3), (?,)), types: (tf.float32, tf.int32)> has invalid type <class 'tensorflow.python.data.ops.dataset_ops.BatchDataset'>, must be a string or Tensor. (Can not convert a BatchDataset into a Tensor or Operation.)
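The second error comes from the TensorFlow side: sess.run can only fetch tensors or operations, and a BatchDataset is neither. What can be fetched are the tensors returned by iterator.get_next(), which is what the question's print(sess.run([image, labels])) line already does. If the dataset yields more than one batch, each batch has to be fetched in turn: in TensorFlow 1.x by calling sess.run in a loop until it raises tf.errors.OutOfRangeError, and in TensorFlow 2.x by iterating dataset.as_numpy_iterator(). The sketch below assumes those NumPy (images, labels) pairs have already been collected and stacks them into one DataFrame; the helper name batches_to_dataframe is hypothetical, and a plain list of random arrays stands in for the real batches:

```python
import numpy as np
import pandas as pd


def batches_to_dataframe(batches):
    """Hypothetical helper: stack (images, labels) NumPy batches into one DataFrame."""
    image_parts, label_parts = [], []
    for images, labels in batches:
        images = np.asarray(images)
        # Flatten each image in the batch to a single row of pixel values
        image_parts.append(images.reshape(images.shape[0], -1))
        label_parts.append(np.asarray(labels))
    flat = np.concatenate(image_parts, axis=0)
    df = pd.DataFrame(flat, columns=[f"px{i}" for i in range(flat.shape[1])])
    df["label"] = np.concatenate(label_parts, axis=0)
    return df


# Stand-in for batches fetched one at a time from the iterator
rng = np.random.default_rng(1)
batches = [
    (rng.random((10, 28, 28, 3)), rng.integers(0, 3, size=10)),
    (rng.random((10, 28, 28, 3)), rng.integers(0, 3, size=10)),
    (rng.random((6, 28, 28, 3)), rng.integers(0, 3, size=6)),
]

df = batches_to_dataframe(batches)
print(df.shape)  # (26, 2353)
```

Concatenating NumPy arrays once at the end avoids appending rows to a DataFrame one at a time, which copies the frame on every append.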
Answer 0 (score: 1)
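I think you can use map like this. I'm assuming you want to add a NumPy array to a DataFrame, as explained here. However, you would have to append the rows one at a time, and also work out how a whole array fits into a single column of the DataFrame.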
import tensorflow as tf
import pandas as pd

filenames = tf.constant(['C:/Machine Learning/sunflower/50987813_7484bfbcdf.jpg'])
labels = tf.constant([1])
dataset = tf.data.Dataset.from_tensor_slices((filenames, labels))
sess = tf.Session()

def convert_to_dataframe(image, label):
    # Runs as ordinary Python through tf.py_func, so image and label are NumPy arrays here
    print(pd.DataFrame.from_records(image))
    return image, label

def _parse_function(filename, label):
    image_string = tf.read_file(filename)
    image_decoded = tf.image.decode_jpeg(image_string, channels=3)
    image_resized = tf.image.resize_images(image_decoded, [28, 28])
    return image_resized, label

dataset = dataset.map(_parse_function)
# After _parse_function, each element is (image, label) rather than (filename, label)
dataset = dataset.map(lambda image, label: tf.py_func(convert_to_dataframe,
                                                      [image, label],
                                                      [tf.float32, tf.int32]))
dataset = dataset.shuffle(buffer_size=100)
dataset = dataset.batch(26)
iterator = dataset.make_one_shot_iterator()
image, labels = iterator.get_next()
sess.run([image, labels])
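On the answer's last point about fitting a whole array into one column: a DataFrame column can hold one NumPy array per cell if its dtype is object, but such a column does not survive a CSV round trip, because to_csv writes each array as its printed text and NumPy abbreviates large arrays with "...". A minimal sketch comparing that layout with one column per pixel, using random 28x28x3 arrays as stand-ins for decoded images (the data and variable names are illustrative only):

```python
import io

import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
images = [rng.random((28, 28, 3)) for _ in range(3)]
labels = [1, 0, 2]

# Layout 1: one object column whose cells are whole (28, 28, 3) arrays.
# Fill it element by element so NumPy doesn't merge the list into one 4-D array.
image_col = np.empty(len(images), dtype=object)
for i, img in enumerate(images):
    image_col[i] = img
nested = pd.DataFrame({"image": image_col, "label": labels})
print(nested["image"].iloc[0].shape)  # (28, 28, 3)

# Saving to CSV writes each array as its printed text, which NumPy
# abbreviates with "..." for large arrays, so the pixel values are lost
buf = io.StringIO()
nested.to_csv(buf, index=False)
buf.seek(0)
restored = pd.read_csv(buf)
print(type(restored["image"].iloc[0]))  # <class 'str'>

# Layout 2: one column per pixel keeps plain numbers that CSV can store
flat = pd.DataFrame(np.stack(images).reshape(len(images), -1))
flat["label"] = labels
print(flat.shape)  # (3, 2353)
```

The object-column layout is fine for in-memory work, but when the goal is a CSV file, the flattened layout is the one that keeps the data.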
Answer 1 (score: 0)
A simple approach is to save the dataset to an ordinary CSV file and then read that CSV file directly into a pandas DataFrame.
import pandas as pd
import tensorflow_datasets as tfds

# Construct a tf.data.Dataset
ds = tfds.load('civil_comments/CivilCommentsCovert', split='train')
# Convert the dataset into a DataFrame (tfds returns a pandas DataFrame subclass)
df = tfds.as_dataframe(ds)
# Save the DataFrame to a CSV file; index=False avoids an extra unnamed index column
df.to_csv("/.../.../Desktop/covert_toxicity.csv", index=False)
# Read the CSV file as usual; this gives the DataFrame you need
file_path = "/.../.../Desktop/covert_toxicity.csv"
df = pd.read_csv(file_path, header=0, sep=",")
df
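The same save-then-reload idea applies to the image dataset from the question, as long as every cell is a plain number. A minimal sketch, assuming the images have already been flattened into pixel columns; random arrays stand in for real images and an in-memory buffer replaces the file path. Passing index=False keeps to_csv from writing the row index as an extra unnamed column, and the pixel columns can be reshaped back to (N, 28, 28, 3) after reading:

```python
import io

import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
images = rng.random((26, 28, 28, 3))
labels = rng.integers(0, 3, size=26)

# One row per image: 2352 pixel columns followed by the label
df = pd.DataFrame(images.reshape(len(images), -1))
df.columns = [f"px{i}" for i in range(df.shape[1])]
df["label"] = labels

# index=False: don't write the RangeIndex as an extra column
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
loaded = pd.read_csv(buf)

# Split the label back out and restore the original image shape
restored_labels = loaded.pop("label").to_numpy()
restored_images = loaded.to_numpy().reshape(-1, 28, 28, 3)
print(restored_images.shape)  # (26, 28, 28, 3)
```

Floats written as text may differ from the originals in the last bit depending on the parser settings, so compare restored pixels with a tolerance rather than exact equality.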