How to extract multiple ints from a string and append them to a DataFrame in pandas?

Posted: 2016-02-18 18:28:43

Tags: python dataframe

My DataFrame df looks like this:

Pairing        Result
1001_1234_1235 1
1001_1233_1236 0
...

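For each row in the Pairing column, I want to extract the last 2 integers and put them into new columns. That is, I want df to end up looking like this: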

Pairing        Result  First Second
1001_1234_1235 1       1234  1235
1001_1233_1236 0       1233  1236
...

Does anyone know how to do this?

3 answers:

Answer 0 (score: 2)

You can do this easily with pandas string operations:

import pandas as pd

df = pd.DataFrame({
    'Pairing': ['1001_1234_1235', '1001_1233_1236'],
    'Result': [1, 0],
})

# split at '_', each result will become a new column
df2 = df['Pairing'].str.split('_', expand=True)

# convert to numbers
df2 = df2.astype(int)

# rename columns to something useful
df2.columns = ['Pairing{}'.format(col) for col in df2.columns]

# add the columns back to the old DataFrame
df = df.join(df2)

This results in:

          Pairing  Result  Pairing0  Pairing1  Pairing2
0  1001_1234_1235       1      1001      1234      1235
1  1001_1233_1236       0      1001      1233      1236
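The code above keeps all three parts as Pairing0 to Pairing2, while the question asks only for the last two as First and Second. A minimal variation, assuming every Pairing value has the same number of underscore-separated parts (as in the sample data), keeps just the last two split columns and renames them:

```python
import pandas as pd

df = pd.DataFrame({
    'Pairing': ['1001_1234_1235', '1001_1233_1236'],
    'Result': [1, 0],
})

# Split on '_' into one column per part, then keep only the last two parts
parts = df['Pairing'].str.split('_', expand=True).iloc[:, -2:]

# Give the kept columns the names used in the question and convert to int
parts = parts.set_axis(['First', 'Second'], axis=1).astype(int)

# Attach the new columns to the original DataFrame (aligned on the index)
df = df.join(parts)

print(df)
```

If rows can have different numbers of parts, `str.split(expand=True)` pads the shorter ones with missing values, and the `astype(int)` step would then fail.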

For more examples, see the pandas documentation, "Working with text data":

http://pandas.pydata.org/pandas-docs/stable/text.html

Answer 1 (score: 0)

If you have a single string such as

pairing = '1001_1234_1235'
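For a single string like this, plain Python is enough. A minimal sketch using `str.split` and slicing, assuming the last two underscore-separated parts are always integers:

```python
pairing = '1001_1234_1235'

# Split on underscores, keep the last two pieces, and convert each to int
first, second = (int(x) for x in pairing.split('_')[-2:])

print(first, second)  # prints "1234 1235"
```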

Answer 2 (score: 0)

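import pandas as pd
import numpy as np

# Sample data; replace with your own DataFrame
df = pd.DataFrame({
    'Pairing': ['1001_1234_1235', '1001_1233_1236'],
    'Result': [1, 0],
})

# Create empty columns for the new int columns
# (np.nan, not np.NaN, which was removed in NumPy 2.0)
df['First'] = np.nan
df['Second'] = np.nan

# For each element in Pairing, keyed by its index label
for idx, pairing in df['Pairing'].items():
    # split pairing into list based on underscores, get last two ints only
    ints = [int(x) for x in pairing.split('_')[-2:]]
    # use .loc: chained assignment like df['First'][i] = ... may not write back
    df.loc[idx, 'First'] = ints[0]
    df.loc[idx, 'Second'] = ints[1]

# the columns started as NaN (float), so cast them back to int
df[['First', 'Second']] = df[['First', 'Second']].astype(int)

print(df)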

df should now look like this:

Pairing          Result  First  Second
1001_1234_1235   1       1234   1235
1001_1233_1236   0       1233   1236
...
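The row-by-row loop can also be replaced by a single vectorized call. A sketch using `Series.str.extract`, assuming each value ends in two underscore-separated runs of digits (a row that does not match would produce NaN, and `astype(int)` would then fail):

```python
import pandas as pd

df = pd.DataFrame({
    'Pairing': ['1001_1234_1235', '1001_1233_1236'],
    'Result': [1, 0],
})

# Named groups become column names: the last two digit runs before the end of the string
parts = df['Pairing'].str.extract(r'_(?P<First>\d+)_(?P<Second>\d+)$')

# extract returns strings; convert to int and attach to the original frame
df = df.join(parts.astype(int))

print(df)
```

Using named groups means the new columns come out already called First and Second, so no separate renaming step is needed.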