I have the following dataset:
Chr Position Name AD
1 866511 A 13,21
1 881627 A 28,33
2 1599812 B 67,25
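I need to split the AD column into three columns: [REF, ALT1, ALT2].
When every row of AD has only two values, I still need the ALT2 column, filled with NaN values.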
The following code works if AD contains rows with three values:
df['REF'], df['ALT1'], df['ALT2'] = df['AD'].str.split(',', 2).str
However, in some cases the dataset contains only two values per row in column AD, and when I run the same line I get the following error:
ValueError: not enough values to unpack (expected 3, got 2)
In that case I still want to keep the third column, ALT2, and fill it with NaN values. Any suggestions? Thanks to anyone willing to help.
Answer 0 (score: 2)
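Use `add` to append a trailing `','` before splitting, so that every row splits into at least three fields, then keep the first three: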
df['REF'], df['ALT1'], df['ALT2'] = zip(*df.AD.add(',').str.split(',').str[:3])
df
Chr Position Name AD REF ALT1 ALT2
0 1 866511 A 13,21 13 21
1 1 881627 A 28,33,31 28 33 31
2 2 1599812 B 67,25 67 25
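One detail of the `add` approach: for rows with only two values, ALT2 ends up as an empty string (the blank cells in the output above), not NaN as the question asks. A minimal sketch of converting those to NaN afterwards, assuming every AD value holds two or three comma-separated numbers (with a single value, `zip` would truncate and the unpacking would fail):

```python
import numpy as np
import pandas as pd

# Sample data from the question, with one three-value row as in the output above
df = pd.DataFrame({
    "Chr": [1, 1, 2],
    "Position": [866511, 881627, 1599812],
    "Name": ["A", "A", "B"],
    "AD": ["13,21", "28,33,31", "67,25"],
})

# Append ',' so every row splits into at least three fields, then keep the first three
parts = df["AD"].add(",").str.split(",").str[:3]
df["REF"], df["ALT1"], df["ALT2"] = zip(*parts)

# The padded third field is an empty string, not NaN; convert it explicitly
df["ALT2"] = df["ALT2"].replace("", np.nan)
print(df)
```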
Answer 1 (score: 1)
You can set the `expand` parameter to `True` and then do the following:
df['REF'], df['ALT1'], df['ALT2'] = df.AD.str.split(',', n=2, expand=True).values.T
Before running this, I added a row with three values to the AD column:
df.loc[3,:] = [3,5432,'C', '32,45,65']
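The `expand=True` approach above produces an ALT2 column only because the added row has three values: with `expand=True`, the number of columns equals the longest split, so if every row has two values the transposed array has two rows and the unpacking raises the same ValueError as in the question. A sketch that does not depend on the data, padding each split list with NaN in plain Python. It assumes each AD value has at most three comma-separated numbers and is never missing:

```python
import numpy as np
import pandas as pd

# Sample data from the question: no row has three values
df = pd.DataFrame({
    "Chr": [1, 1, 2],
    "Position": [866511, 881627, 1599812],
    "Name": ["A", "A", "B"],
    "AD": ["13,21", "28,33", "67,25"],
})

# Pad every split list to exactly three entries with NaN
padded = [vals + [np.nan] * (3 - len(vals)) for vals in df["AD"].str.split(",")]

# Transpose the rows into three columns and unpack, as in the answer above
df["REF"], df["ALT1"], df["ALT2"] = zip(*padded)
print(df)
```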
Answer 2 (score: 0)
You can use `rename` and `concat`:
df = pd.concat((df, df['AD'].str.split(',', expand=True)
.rename(columns={0:'REF',1:'ALT1',2:'ALT2'})
), axis=1)
Output:
Chr Position Name AD REF ALT1
0 1 866511 A 13,21 13 21
1 1 881627 A 28,33 28 33
2 2 1599812 B 67,25 67 25
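The output above has no ALT2 column: with only two-value rows, `expand=True` returns columns 0 and 1, and `rename` silently skips the missing key 2. A minimal sketch of one way to guarantee all three columns, assuming the same sample data, is to `reindex` by the target names after renaming, which adds ALT2 filled with NaN:

```python
import pandas as pd

# Sample data from the question: every row has two values
df = pd.DataFrame({
    "Chr": [1, 1, 2],
    "Position": [866511, 881627, 1599812],
    "Name": ["A", "A", "B"],
    "AD": ["13,21", "28,33", "67,25"],
})

split_cols = (
    df["AD"].str.split(",", expand=True)
    # rename() ignores labels that are absent, so ALT2 is not created here
    .rename(columns={0: "REF", 1: "ALT1", 2: "ALT2"})
    # reindex() adds any missing target column, filled with NaN
    .reindex(columns=["REF", "ALT1", "ALT2"])
)
df = pd.concat((df, split_cols), axis=1)
print(df)
```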