Question

>import pandas as pd

>e = {0: pd.Series(['NHL_toronto_maple-leafs_Canada', 'NHL_boston_bruins_US', 'NHL_detroit_red-wings', 'NHL_montreal'])}

>df = pd.DataFrame(e)

>df

    0
0   NHL_toronto_maple-leafs_Canada
1   NHL_boston_bruins_US
2   NHL_detroit_red-wings
3   NHL_montreal

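I would like to:

1) split the above dataframe (series) by '_'

2) drop the 'NHL' string

3) recombine the remaining text by '_'

4) attach the result of #3 to the original dataframe as a second column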

To do this, I tried the following:

>df2 = df.iloc[:, 0].str.split('_').apply(pd.Series).iloc[:,1:]

>df2

    1   2   3
0   toronto maple-leafs Canada
1   boston  bruins  US
2   detroit red-wings   NaN
3   montreal    NaN NaN

I then tried the following, as suggested in "combine columns in Pandas":

>df2['4'] = df2.iloc[:,0] + "_" + df2.iloc[:,1] + "_" + df2.iloc[:,2]

>df2

    1   2   3   4
0   toronto maple-leafs Canada  toronto_maple-leafs_Canada
1   boston  bruins  US  boston_bruins_US
2   detroit red-wings   NaN NaN
3   montreal    NaN NaN NaN

However, as you can see, whenever the combination involves a cell that is NaN, the final result is also NaN. This is not what I want.
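The NaN behaviour can be reproduced in isolation. The following is a minimal sketch; the series names `a` and `b` are made up for illustration and are not part of the data above:

```python
import numpy as np
import pandas as pd

# Two hypothetical columns; the second row of `b` is missing.
a = pd.Series(["detroit", "montreal"])
b = pd.Series(["red-wings", np.nan])

# Element-wise string concatenation: any row with a missing operand
# comes out as missing in the result.
joined = a + "_" + b
print(joined)
```

The first row is joined as expected, while the second row is missing because one of its operands is missing.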

Column 4 should look like this:

toronto_maple-leafs_Canada
boston_bruins_US
detroit_red-wings
montreal

Also, is there an efficient way to perform this kind of operation? My actual dataset is very large.
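For reference, here is a hedged sketch of one way the four steps might be done with the vectorized `.str` methods, avoiding `apply(pd.Series)` and the intermediate columns. It assumes every value contains at least one '_' (a value with no '_' would come out as NaN in the first form), and the name `stepwise` is only illustrative. Whether it is fast enough for a very large dataset has not been measured here.

```python
import pandas as pd

df = pd.DataFrame({0: ['NHL_toronto_maple-leafs_Canada',
                       'NHL_boston_bruins_US',
                       'NHL_detroit_red-wings',
                       'NHL_montreal']})

# Split only once, at the first '_', and keep everything after it.
# This drops the leading 'NHL' without having to rejoin the pieces.
df[1] = df[0].str.split('_', n=1).str[1]

# A more literal version of steps 1 to 3: split on every '_',
# drop the first piece, then join the remaining pieces with '_'.
stepwise = df[0].str.split('_').str[1:].str.join('_')

print(df)
```

Both forms handle rows with fewer pieces, because no NaN-padded columns are ever created.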

Split a string column by "_", drop the leading text, and recombine the strings by "_" in pandas

0 answers: