Question

I have a pandas DataFrame df:

  name    time
1  a      1 year 2 months
2  b      4 years 1 month
3  c      3 years 1 month

I want to end up with:

  name    years   months
1  a      1       2
2  b      4       1
3  c      3       1

I can get as far as:

  name    time
1  a      [1, 2]
2  b      [4, 1]
3  c      [3, 1]

But I can't figure out how to split the lists into columns.

Answer 1

import pandas as pd

df = pd.DataFrame({'name': ['a', 'b', 'c'],
                   'time': ['1 year 2 months', '4 years 1 month', '3 years 1 month']})

# Split the time column and take the first and third elements to extract the values.
df[['years', 'months']] = df.time.str.split(expand=True).iloc[:, [0, 2]].astype(int)

>>> df
   name             time  years months
0     a  1 year 2 months      1      2
1     b  4 years 1 month      4      1
2     c  3 years 1 month      3      1

When you are ready to drop the original column, you can use del df['time'].
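
As a minimal sketch of that last step, assuming the same df as above: df.drop(columns='time') returns a new frame without the column, whereas del modifies df in place. The name result is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'name': ['a', 'b', 'c'],
                   'time': ['1 year 2 months', '4 years 1 month', '3 years 1 month']})
df[['years', 'months']] = df.time.str.split(expand=True).iloc[:, [0, 2]].astype(int)

# drop() leaves df untouched and returns a copy without the 'time' column.
result = df.drop(columns='time')
```

Use del df['time'] instead if you do not need to keep the original frame around.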

Answer 2

You can use str.findall to find the numbers in the time column, then use str.join and str.split to get the result:

In [240]: df.time.str.findall(r'\d+').str.join('_').str.split('_', expand=True)
Out[240]:
   0  1
0  1  2
1  4  1
2  3  1

# r'\d+' matches whole numbers, so multi-digit values such as '10 years' stay intact.
df[['years', 'months']] = df.time.str.findall(r'\d+').str.join('_').str.split('_', expand=True)

In [245]: df
Out[245]:
  name             time years months
0    a  1 year 2 months     1      2
1    b  4 years 1 month     4      1
2    c  3 years 1 month     3      1
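
Note that str.split returns strings, so the years and months columns above hold text rather than integers. A minimal sketch of the conversion, assuming every row contains exactly two numbers (the name parts is only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'name': ['a', 'b', 'c'],
                   'time': ['1 year 2 months', '4 years 1 month', '3 years 1 month']})

# findall returns a list of digit strings per row; join/split expands it into columns.
# r'\d+' is used here so that multi-digit numbers stay whole.
parts = df.time.str.findall(r'\d+').str.join('_').str.split('_', expand=True)

# The pieces are still strings, so cast before assigning.
df[['years', 'months']] = parts.astype(int)
```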

It is slightly faster than Alexander's solution and, I think, more general. From timeit:

In [6]: %timeit df.time.str.split(expand=True).iloc[:, [0, 2]]
1000 loops, best of 3: 1.6 ms per loop

In [8]: %timeit df.time.str.findall('\d').str.join('_').str.split('_', expand=True)
1000 loops, best of 3: 1.43 ms per loop
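
For the literal case in the question, where time already holds lists such as [1, 2], a minimal sketch is to build a new frame from the lists and assign it back. This assumes every list has exactly two elements:

```python
import pandas as pd

df = pd.DataFrame({'name': ['a', 'b', 'c'],
                   'time': [[1, 2], [4, 1], [3, 1]]})

# tolist() gives a list of lists, which the DataFrame constructor expands into columns.
# Passing index=df.index keeps the rows aligned with the original frame.
df[['years', 'months']] = pd.DataFrame(df['time'].tolist(), index=df.index)
df = df.drop(columns='time')
```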
