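I want to train a simple neural network in PyTorch on my own dataset. The data is imported from an Excel file and stored in a DataFrame named df.
One of its columns is named "Target", and it is the target variable for the network. How can I use this DataFrame as input to a PyTorch neural network?
I tried this, but it doesn't work: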
target = pd.DataFrame(data = df['Target'])
train = data_utils.TensorDataset(df, target)
train_loader = data_utils.DataLoader(train, batch_size = 10, shuffle = True)
Answer 0 (score: 5)
You can use the following functions to convert any DataFrame or pandas Series into a PyTorch tensor:
import pandas as pd
import torch

# determine the supported device
def get_device():
    if torch.cuda.is_available():
        device = torch.device('cuda:0')
    else:
        device = torch.device('cpu')  # don't have GPU
    return device

# convert a df to tensor to be used in pytorch
def df_to_tensor(df):
    device = get_device()
    return torch.from_numpy(df.values).float().to(device)

df_tensor = df_to_tensor(df)
series_tensor = df_to_tensor(series)
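As a minimal sketch of how a helper like this could be used end to end, the example below builds a small made-up DataFrame (the x1 and x2 columns and all values are illustrative assumptions, not from the question), separates the Target column, and wraps both tensors in a TensorDataset. It uses torch.tensor instead of torch.from_numpy so that the data is copied, and it stays on the CPU to keep the example simple.

```python
import pandas as pd
import torch
import torch.utils.data as data_utils

# Illustrative data; the column names other than "Target" are assumptions.
df = pd.DataFrame({
    "x1": [0.1, 0.2, 0.3, 0.4],
    "x2": [1.0, 2.0, 3.0, 4.0],
    "Target": [0.0, 1.0, 0.0, 1.0],
})

def df_to_tensor(frame, device=torch.device("cpu")):
    # to_numpy() gives the underlying NumPy array; the data is copied as float32.
    return torch.tensor(frame.to_numpy(), dtype=torch.float32).to(device)

features = df_to_tensor(df.drop("Target", axis=1))  # 4 rows, 2 feature columns
target = df_to_tensor(df["Target"])                 # 4 labels

train = data_utils.TensorDataset(features, target)
train_loader = data_utils.DataLoader(train, batch_size=2, shuffle=True)

for batch_features, batch_target in train_loader:
    print(batch_features.shape, batch_target.shape)
```

If the tensors are moved to a GPU instead, the model has to be moved to the same device before training.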
Answer 1 (score: 3)
Maybe try this and see whether it solves your problem (based on your example code)?
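import numpy as np
import torch
import torch.utils.data as data_utils
# assumes `train` is your DataFrame (df in the question) and batch_size is already set, e.g. 10
train_target = torch.tensor(train['Target'].values.astype(np.float32))
train = torch.tensor(train.drop('Target', axis = 1).values.astype(np.float32))
train_tensor = data_utils.TensorDataset(train, train_target)
train_loader = data_utils.DataLoader(dataset = train_tensor, batch_size = batch_size, shuffle = True)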
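To show how a loader built this way might be consumed, here is a rough sketch of a training loop. Everything in it beyond the conversion step is an assumption made for illustration: the data is synthetic, the model is a single linear layer, and the optimizer settings are arbitrary. The target is given an extra dimension with unsqueeze(1) so that its shape matches the model output.

```python
import numpy as np
import pandas as pd
import torch
import torch.utils.data as data_utils

torch.manual_seed(0)

# Made-up regression data: Target = 2 * x1 - x2.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(40, 2)), columns=["x1", "x2"])
df["Target"] = 2 * df["x1"] - df["x2"]

# Same conversion as above, with the target reshaped to (N, 1).
target = torch.tensor(df["Target"].values.astype(np.float32)).unsqueeze(1)
features = torch.tensor(df.drop("Target", axis=1).values.astype(np.float32))
loader = data_utils.DataLoader(
    data_utils.TensorDataset(features, target), batch_size=10, shuffle=True
)

model = torch.nn.Linear(2, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.MSELoss()

losses = []  # mean loss per epoch
for epoch in range(20):
    epoch_loss = 0.0
    for batch_features, batch_target in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(batch_features), batch_target)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
    losses.append(epoch_loss / len(loader))
```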
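Answer 2 (score: 3)
You can take the df.values attribute (a NumPy array), convert it to tensors, and pass those to the TensorDataset constructor (TensorDataset expects tensors, not NumPy arrays):
import torch
import torch.utils.data as data_utils
# Creating tensors from the NumPy arrays (float32 is an assumption; it matches the default dtype of PyTorch layers)
target = torch.tensor(df['Target'].values, dtype=torch.float32)
features = torch.tensor(df.drop('Target', axis=1).values, dtype=torch.float32)
# Passing to DataLoader
train = data_utils.TensorDataset(features, target)
train_loader = data_utils.DataLoader(train, batch_size=10, shuffle=True)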
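Note: your features (df) also contain the target variable (df['Target']), i.e. your network is "cheating" because it can see the target in its input. You need to remove this column from the feature set.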
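To make the note concrete, here is a small sketch on made-up data (the x1 and x2 column names are assumptions) that checks the Target column is absent from the features before the tensors are built.

```python
import pandas as pd
import torch

df = pd.DataFrame({
    "x1": [0.5, 1.5, 2.5],
    "x2": [3.0, 4.0, 5.0],
    "Target": [1.0, 0.0, 1.0],
})

# drop() returns a new frame; df itself keeps the Target column.
feature_frame = df.drop("Target", axis=1)
assert "Target" not in feature_frame.columns

features = torch.tensor(feature_frame.to_numpy(), dtype=torch.float32)
target = torch.tensor(df["Target"].to_numpy(), dtype=torch.float32)

# The feature tensor has one column fewer than the original frame,
# so the network no longer sees the label in its input.
print(df.shape, tuple(features.shape), tuple(target.shape))
```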
Answer 3 (score: 1)
Simply convert pandas DataFrame -> NumPy array -> PyTorch tensor. Here is an example:
import pandas as pd
import numpy as np
import torch
import torch.utils.data as data_utils
df = pd.read_csv('train.csv')
target = pd.DataFrame(df['target'])
del df['target']
train = data_utils.TensorDataset(torch.Tensor(np.array(df)), torch.Tensor(np.array(target)))
train_loader = data_utils.DataLoader(train, batch_size = 10, shuffle = True)
Hopefully this helps you create your own dataset with PyTorch (it is compatible with the latest version of PyTorch).
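If you would rather write your own dataset class than use TensorDataset, one possible shape is sketched below. The class name DataFrameDataset and its target_column parameter are hypothetical, and the data is made up for illustration. Only torch.utils.data.Dataset and DataLoader are PyTorch APIs here.

```python
import pandas as pd
import torch
from torch.utils.data import Dataset, DataLoader

class DataFrameDataset(Dataset):
    """Hypothetical wrapper that serves (features, target) pairs from a DataFrame."""

    def __init__(self, frame, target_column):
        # Convert once up front so that __getitem__ is a cheap tensor index.
        self.features = torch.tensor(
            frame.drop(target_column, axis=1).to_numpy(), dtype=torch.float32
        )
        self.target = torch.tensor(
            frame[target_column].to_numpy(), dtype=torch.float32
        )

    def __len__(self):
        return len(self.target)

    def __getitem__(self, index):
        return self.features[index], self.target[index]

# Made-up data standing in for the contents of train.csv.
df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [5, 4, 3, 2, 1], "target": [0, 1, 0, 1, 0]})
dataset = DataFrameDataset(df, target_column="target")
loader = DataLoader(dataset, batch_size=2, shuffle=False)
first_features, first_target = next(iter(loader))
```

A custom class like this is mainly useful when each row needs extra processing on access. For plain numeric tables, TensorDataset does the same job with less code.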
Answer 4 (score: 0)
# This works for me
import torch
import torch.utils.data as data_utils
target = torch.tensor(df['Targets'].values)
features = torch.tensor(df.drop('Targets', axis = 1).values)
train = data_utils.TensorDataset(features, target)
train_loader = data_utils.DataLoader(train, batch_size=10, shuffle=True)
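One thing to watch with this approach: torch.tensor keeps the NumPy dtype, so float columns arrive as float64, while the weights of layers such as torch.nn.Linear default to float32. The sketch below uses made-up data (the x1 and x2 columns are assumptions) to show the dtype and one way to cast before feeding a layer.

```python
import pandas as pd
import torch

df = pd.DataFrame({"x1": [0.1, 0.2], "x2": [0.3, 0.4], "Targets": [1.0, 0.0]})

# pandas stores floats as float64, and torch.tensor preserves that dtype.
raw = torch.tensor(df.drop("Targets", axis=1).values)
print(raw.dtype)

# Cast to float32 so the input matches the layer's weight dtype.
features = raw.float()
layer = torch.nn.Linear(2, 1)
output = layer(features)  # passing `raw` instead would raise a dtype mismatch error
```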