Question

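I am learning how to use TensorFlow and would like to use it to predict differential gene expression. I started with code for linear regression. Eventually I want to predict, from the other statistics in the dataset, whether a gene's 'logFC' value is above or below a certain threshold, but for now I am focused on just getting the code to run and understanding it. However, my code fails with "IndexError: list index out of range":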

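import numpy as np
import pandas as pd
import tensorflow as tf  # TensorFlow 1.x API (tf.train, tf.contrib)

data = pd.read_csv("geo2r.csv")
# Drop identifier/text columns that are not numeric features.
data = data.drop(columns=["ID", "Gene.title", "Gene.symbol"])

# A single input feature and the regression target.
my_feature = data[["P.Value"]]
feature_columns = [tf.feature_column.numeric_column("P.Value")]
targets = data["logFC"]

# Plain gradient descent, with gradients clipped to a global norm of 5.0.
my_optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.0000001)
my_optimizer = tf.contrib.estimator.clip_gradients_by_norm(my_optimizer, 5.0)

linear_regressor = tf.estimator.LinearRegressor(
    feature_columns=feature_columns,
    optimizer=my_optimizer
)

def my_input_fn(features, targets, batch_size=1, shuffle=True, num_epochs=None):
    # Convert the DataFrame into a dict of NumPy arrays keyed by column name;
    # the keys must match the names used in feature_columns ("P.Value").
    features = {key: np.array(value) for key, value in dict(features).items()}

    # Batch the data and repeat it num_epochs times (indefinitely if None).
    ds = tf.data.Dataset.from_tensor_slices((features, targets))
    ds = ds.batch(batch_size).repeat(num_epochs)
    if shuffle:
        ds = ds.shuffle(buffer_size=10000)

    # Return the next batch of features and labels.
    features, labels = ds.make_one_shot_iterator().get_next()
    return features, labels

# Pass my_feature (defined above); the name `features` is not defined at this point.
_ = linear_regressor.train(
    input_fn=lambda: my_input_fn(my_feature, targets),
    steps=100
)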

IndexError: list index out of range
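The optimizer set-up above amounts, roughly, to mini-batch gradient descent on squared error, with the gradients rescaled whenever their global norm exceeds 5.0 (`clip_gradients_by_norm` clips by global norm). The NumPy sketch below re-implements that idea outside TensorFlow to show how the learning rate affects training. It is an illustration only, not the estimator's actual code: it assumes a summed squared-error loss, which may differ from `LinearRegressor`'s default loss reduction, and the helper names `clip_by_global_norm_np` and `fit_linear` are hypothetical.

```python
import numpy as np


def clip_by_global_norm_np(grads, clip_norm):
    """Rescale all gradients together if their joint L2 norm exceeds clip_norm."""
    global_norm = np.sqrt(sum(np.sum(np.square(g)) for g in grads))
    if global_norm > clip_norm:
        grads = [g * (clip_norm / global_norm) for g in grads]
    return list(grads)


def fit_linear(x, y, learning_rate, steps, batch_size=1, clip_norm=5.0, seed=0):
    """Mini-batch gradient descent for y ~ w * x + b on a summed squared-error loss."""
    rng = np.random.default_rng(seed)
    w, b = 0.0, 0.0
    for _ in range(steps):
        idx = rng.integers(0, len(x), size=batch_size)  # random mini-batch (with replacement)
        xb, yb = x[idx], y[idx]
        err = (w * xb + b) - yb
        # Gradients of sum((w * x + b - y) ** 2) with respect to w and b.
        grad_w = np.sum(2.0 * err * xb)
        grad_b = np.sum(2.0 * err)
        grad_w, grad_b = clip_by_global_norm_np([grad_w, grad_b], clip_norm)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return float(w), float(b)


# Synthetic, noise-free data y = 3x + 1 (not the GEO2R data).
x = np.linspace(0.0, 1.0, 200)
y = 3.0 * x + 1.0

# A moderate learning rate recovers the line.
w, b = fit_linear(x, y, learning_rate=0.01, steps=5000, batch_size=10)
print(w, b)  # close to 3.0 and 1.0

# The settings from the question: learning rate 1e-7, 100 steps, batch size 1.
w_tiny, b_tiny = fit_linear(x, y, learning_rate=1e-7, steps=100, batch_size=1)
print(w_tiny, b_tiny)  # both still within 5e-5 of the starting value 0.0
```

Because a clipped step can move the parameters by at most `learning_rate × 5.0`, 100 steps at a learning rate of 0.0000001 move them by at most 5e-5 in total, so the model barely leaves its initial values whatever the data look like.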

The dataset looks like this and contains more than 50,000 rows:

I have also put the code I want to use and the full data on GitHub: https://github.com/640008915/Learning-Tensorflow
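The eventual goal described in the question, predicting whether a gene's `logFC` is above or below a threshold, turns the task into binary classification. A minimal pandas sketch of building such labels, assuming a hypothetical cutoff of 1.0 and a small made-up frame standing in for `geo2r.csv` (the helper `make_binary_labels` is also hypothetical):

```python
import pandas as pd

# Hypothetical cutoff; the right value depends on the biological question.
LOGFC_THRESHOLD = 1.0


def make_binary_labels(df: pd.DataFrame, threshold: float = LOGFC_THRESHOLD) -> pd.Series:
    """Return 1 where logFC is above the threshold, else 0."""
    return (df["logFC"] > threshold).astype(int)


# Made-up stand-in for geo2r.csv, for illustration only.
toy = pd.DataFrame({"P.Value": [1e-8, 0.03, 0.5], "logFC": [2.3, -1.7, 0.1]})
labels = make_binary_labels(toy)
print(labels.tolist())  # [1, 0, 0]
```

With labels like these, `tf.estimator.LinearClassifier` would be the natural counterpart to `LinearRegressor`. If both up- and down-regulation should count, comparing `df["logFC"].abs()` against the threshold is one option.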

Are my P.Value numbers too small to be handled? Any help or pointers in the right direction would be greatly appreciated.
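On whether the P.Value numbers are too small: `tf.feature_column.numeric_column` uses `dtype=tf.float32` by default. float32 can represent positive values down to about 1.4e-45 (as subnormals); anything smaller rounds to 0.0. Whether `geo2r.csv` actually contains p-values that small is not shown in the question. Separately, -log10(p) is a widely used scale for p-values in expression analysis (it is the y-axis of a volcano plot) and spreads tiny values out into moderate numbers. A NumPy-only sketch with made-up p-values; the helper `neg_log10_p` and its `floor` value are hypothetical:

```python
import numpy as np

# Made-up p-values; the real column is P.Value from geo2r.csv.
p_values = np.array([0.5, 1e-5, 1e-30, 1e-40, 1e-50], dtype=np.float64)

# What the values become when cast to float32, numeric_column's default dtype.
p32 = p_values.astype(np.float32)
print(p32 == 0.0)  # only the last entry (1e-50) has underflowed to zero


def neg_log10_p(p, floor=1e-300):
    """Transform p-values to -log10(p), clipping at `floor` so log10(0) never yields inf."""
    return -np.log10(np.clip(p, floor, 1.0))


scores = neg_log10_p(p_values)
print(scores)  # approximately 0.301, 5, 30, 40, 50
```

Applying the transform in float64 (for example in pandas, before the data reach the input function) keeps very small p-values distinct, and the resulting scores fit comfortably in float32.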

How to solve a linear regression problem with TensorFlow?
