Splitting comma-separated values into multiple columns

Time: 2019-06-25 16:18:53

Tags: python pandas split

I have the following dataset:

Chr     Position       Name      AD                                 
1       866511          A       13,21
1       881627          A       28,33
2       1599812         B       67,25 

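I need to split the AD column into three columns [REF, ALT1, ALT2]. When every row of AD has only two values, I still need the ALT2 column, filled with NaN values.

If AD contains rows with three values, the following code works: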

df['REF'], df['ALT1'], df['ALT2'] = df['AD'].str.split(',', 2).str

However, in some cases every row of the dataset has only two values in column AD, and when I run the same line I get the following error message:

ValueError: not enough values to unpack (expected 3, got 2)

In that case I still want to keep the third column, ALT2, and fill it with NaN values. Any suggestions? Thanks to anyone willing to help.

3 answers:

Answer 0: (score: 2)

You can append an extra ',' to every value with add, so that each row splits into at least three parts, and then keep only the first three:

df['REF'], df['ALT1'], df['ALT2'] = zip(*df.AD.add(',').str.split(',').str[:3])

df

   Chr  Position Name        AD REF ALT1 ALT2
0    1    866511    A     13,21  13   21     
1    1    881627    A  28,33,31  28   33   31
2    2   1599812    B     67,25  67   25     
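Note that the output above leaves ALT2 as an empty string, not NaN, for the two-value rows. If NaN is required, as the question asks, one possible follow-up is to mask the empty strings. This is a minimal sketch, assuming a frame rebuilt from the sample shown in this answer; the mask step is an addition and not part of the original answer.

```python
import pandas as pd

# Assumed reconstruction of the sample data used in this answer.
df = pd.DataFrame({
    "Chr": [1, 1, 2],
    "Position": [866511, 881627, 1599812],
    "Name": ["A", "A", "B"],
    "AD": ["13,21", "28,33,31", "67,25"],
})

# Same approach as the answer: pad with ',' and keep the first three parts.
df["REF"], df["ALT1"], df["ALT2"] = zip(*df["AD"].add(",").str.split(",").str[:3])

# Series.mask replaces values where the condition is True with NaN by default.
df["ALT2"] = df["ALT2"].mask(df["ALT2"] == "")
```

The split values stay as strings; only the padded empty entries become NaN.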

Answer 1: (score: 1)

You can set the expand parameter to True and then do the following:

df['REF'], df['ALT1'], df['ALT2'] = df.AD.str.split(',', n=2, expand=True).values.T

I tested this after adding a row with three values to the AD column:

df.loc[3,:] = [3,5432,'C', '32,45,65']
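This approach relies on at least one row having three values; if every row has only two, the split produces two columns and the three-way unpacking fails again. A possible alternative that does not depend on the data, sketched here on an assumed copy of the question's sample, is to pick each position with .str.get, which returns NaN when a row has no element at that position.

```python
import pandas as pd

# Assumed reconstruction of the question's sample data (two values per row).
df = pd.DataFrame({
    "Chr": [1, 1, 2],
    "Position": [866511, 881627, 1599812],
    "Name": ["A", "A", "B"],
    "AD": ["13,21", "28,33", "67,25"],
})

# Split once into lists, then pull out each position by index.
parts = df["AD"].str.split(",")
for i, name in enumerate(["REF", "ALT1", "ALT2"]):
    # .str.get(i) yields NaN for rows whose list has no element at position i.
    df[name] = parts.str.get(i)
```

With this sample, ALT2 exists and is NaN in every row, which matches what the question asks for.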

Answer 2: (score: 0)

You can do this with rename and concat:

df = pd.concat((df, df['AD'].str.split(',', expand=True)
                            .rename(columns={0:'REF',1:'ALT1',2:'ALT2'})
               ), axis=1)

Output:

   Chr  Position Name     AD REF ALT1
0    1    866511    A  13,21  13   21
1    1    881627    A  28,33  28   33
2    2   1599812    B  67,25  67   25
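The output above has no ALT2 column, because expand=True creates only as many columns as the longest row needs. One possible way to guarantee three columns is to reindex the split result to positions 0 to 2 before renaming. This is a sketch on an assumed copy of the question's sample data; the reindex step is an addition and not part of the original answer.

```python
import pandas as pd

# Assumed reconstruction of the question's sample data (two values per row).
df = pd.DataFrame({
    "Chr": [1, 1, 2],
    "Position": [866511, 881627, 1599812],
    "Name": ["A", "A", "B"],
    "AD": ["13,21", "28,33", "67,25"],
})

parts = (
    df["AD"].str.split(",", expand=True)
    # Force exactly columns 0, 1, 2: missing ones are added as NaN,
    # and any column beyond the third would be dropped.
    .reindex(columns=range(3))
    .rename(columns={0: "REF", 1: "ALT1", 2: "ALT2"})
)
df = pd.concat((df, parts), axis=1)
```

Because reindex also drops any fourth or later column, rows with more than three values would lose their extra fields under this sketch.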