I have a DataFrame:
Name Section
1 James P3
2 Sam 2.5C
3 Billy T35
4 Sarah A85
5 Felix 5I
How can I split the numeric part into a separate column called Section_Number, and the letter part into a column called Section_Letter? Desired result:
Name Section Section_Number Section_Letter
1 James P3 3 P
2 Sam 2.5C 2.5 C
3 Billy T35 35 T
4 Sarah A85 85 A
5 Felix 5I 5 I
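For anyone who wants to run the answers below, here is a minimal sketch that rebuilds the sample frame. It assumes the index starts at 1 as in the printed table and uses 5I for Felix, matching the input table.

```python
import pandas as pd

# Rebuild the sample frame from the question; the index starts at 1 as printed
df = pd.DataFrame(
    {
        "Name": ["James", "Sam", "Billy", "Sarah", "Felix"],
        "Section": ["P3", "2.5C", "T35", "A85", "5I"],
    },
    index=range(1, 6),
)
print(df)
```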
Answer 0 (score: 4)
Use str.extract with the pattern [A-Z]+, together with str.replace, for all-uppercase strings:
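# Remove the uppercase run to keep the number (regex=True is required on pandas >= 2.0)
df['Section_Number'] = df['Section'].str.replace('[A-Z]+', '', regex=True)
# Extract the uppercase run; expand=False returns a Series for a single group
df['Section_Letter'] = df['Section'].str.extract('([A-Z]+)', expand=False)
print(df)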
Name Section Section_Number Section_Letter
1 James P3 3 P
2 Sam 2.5C 2.5 C
3 Billy T35 35 T
4 Sarah A85 85 A
5 Felix 5I 5 I
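Both columns produced this way hold strings, not numbers. A self-contained sketch, assuming you want Section_Number as a float, adds a pd.to_numeric conversion after the split. Note that since pandas 2.0 str.replace treats the pattern literally unless regex=True is passed.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["James", "Sam", "Billy", "Sarah", "Felix"],
        "Section": ["P3", "2.5C", "T35", "A85", "5I"],
    },
    index=range(1, 6),
)

# Drop the uppercase run to keep the number; regex=True makes the pattern a regex
df["Section_Number"] = df["Section"].str.replace(r"[A-Z]+", "", regex=True)
# One capture group with expand=False gives a Series, not a one-column DataFrame
df["Section_Letter"] = df["Section"].str.extract(r"([A-Z]+)", expand=False)
# Convert the numeric strings ("3", "2.5", ...) into floats
df["Section_Number"] = pd.to_numeric(df["Section_Number"])
print(df.dtypes)
```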
To also select lowercase letters:
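# [A-Za-z]+ matches runs of uppercase and lowercase letters
df['Section_Number'] = df['Section'].str.replace('[A-Za-z]+', '', regex=True)
# Extract the first letter run as a Series
df['Section_Letter'] = df['Section'].str.extract('([A-Za-z]+)', expand=False)
print(df)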
Name Section Section_Number Section_Letter
1 James P3 3 P
2 Sam 2.5C 2.5 C
3 Billy T35 35 T
4 Sarah A85 85 A
5 Felix 5I 5 I
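The two pandas calls apply their regexes to each string in the column. A plain-Python sketch of the same per-string logic with the standard re module, using a hypothetical helper named split_section, shows what happens to each value, including one with no letter at all:

```python
import re


def split_section(s):
    """Hypothetical helper mirroring str.replace / str.extract on one string."""
    # Like str.replace('[A-Za-z]+', '', regex=True): remove every letter run
    number = re.sub(r"[A-Za-z]+", "", s)
    # Like str.extract('([A-Za-z]+)'): first letter run, or None when there is none
    match = re.search(r"[A-Za-z]+", s)
    letter = match.group(0) if match else None
    return number, letter


for s in ["P3", "2.5C", "t35", "7"]:
    print(s, split_section(s))
```

In pandas, the missing letter in the last case would show up as NaN rather than None.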
Answer 1 (score: 1)
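It will undoubtedly be slower, but for completeness you can use str.extractall to capture named groups matching the pattern, combine the matches per row, and join the result back to the DataFrame: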
new = df.join(
df.Section.str.extractall(r'(?i)(?P<Section_Letter>[A-Z]+)|(?P<Section_Number>[\d.]+)')
.groupby(level=0).first()
)
Result:
Name Section Section_Letter Section_Number
1 James P3 P 3
2 Sam 2.5C C 2.5
3 Billy T35 T 35
4 Sarah A85 A 85
5 Felix 5I I 5
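To see why groupby(level=0).first() is needed, a small sketch on two of the sample values prints the intermediate result. str.extractall returns one row per match, indexed by (original index, match number), with NaN in whichever named group did not take part in that match; first() then keeps the first non-null value of each column per original row.

```python
import pandas as pd

s = pd.Series(["P3", "2.5C"], index=[1, 2])
pattern = r"(?i)(?P<Section_Letter>[A-Z]+)|(?P<Section_Number>[\d.]+)"

# One row per match; the second index level counts matches within each string
matches = s.str.extractall(pattern)
print(matches)

# Collapse back to one row per original index, skipping the NaN cells
collapsed = matches.groupby(level=0).first()
print(collapsed)
```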
Answer 2 (score: 1)
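If, as in your example, every value contains exactly one letter, you can sort the characters so the letter comes last and then slice: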
def get_vals(x):
    # Letters (isalpha() is True) sort after everything else
    return ''.join(sorted(x, key=str.isalpha))
# apply ordering
vals = df['Section'].apply(get_vals)
# split numbers from letter
df['num'] = vals.str[:-1].astype(float)
df['letter'] = vals.str[-1]
print(df)
Name Section num letter
1 James P3 3.0 P
2 Sam 2.5C 2.5 C
3 Billy T35 35.0 T
4 Sarah A85 85.0 A
5 Felix 5I 5.0 I
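The trick relies on sorted() being stable: characters with the same key (False for digits and '.', True for letters) keep their original relative order, so "2.5" is not scrambled. A pure-Python sketch also shows the limitation mentioned above; with a hypothetical two-letter value such as "AB12", the [:-1] slice keeps one letter in the number.

```python
def get_vals(x):
    # Stable sort: non-letters keep their order, letters move to the end
    return "".join(sorted(x, key=str.isalpha))


for s in ["P3", "2.5C", "T35", "AB12"]:
    v = get_vals(s)
    print(s, "->", v, "number:", v[:-1], "letter:", v[-1])
```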
Answer 3 (score: 0)
We can use itertools.groupby to group consecutive alphabetic and non-alphabetic characters:
from itertools import groupby
[sorted([''.join(x) for _, x in groupby(s, key=str.isalpha)]) for s in df.Section]
[['3', 'P'], ['2.5', 'C'], ['35', 'T'], ['85', 'A'], ['5', 'I']]
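Sorting the joined strings puts the number first only because '.' and the digits have lower character codes than letters. A sketch that keeps the groupby key and sorts on it instead (False before True) does not depend on character codes:

```python
from itertools import groupby


def split_runs(s):
    # groupby yields (key, iterator) for each run of characters with the same isalpha() result
    runs = [(is_alpha, "".join(chars)) for is_alpha, chars in groupby(s, key=str.isalpha)]
    # Order by the key only: non-letter run (False) first, letter run (True) second
    number, letter = [text for _, text in sorted(runs, key=lambda r: r[0])]
    return number, letter


print([split_runs(s) for s in ["P3", "2.5C", "T35", "A85", "5I"]])
```

This still assumes each value has exactly one numeric run and one letter run; otherwise the two-name unpacking raises ValueError.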
We can then turn this into new columns:
from itertools import groupby
N, L = zip(
*[sorted([''.join(x) for _, x in groupby(s, key=str.isalpha)]) for s in df.Section]
)
df.assign(Section_Number=N, Section_Letter=L)
Name Section Section_Number Section_Letter
1 James P3 3 P
2 Sam 2.5C 2.5 C
3 Billy T35 35 T
4 Sarah A85 85 A
5 Felix 5I 5 I
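Note that DataFrame.assign returns a new DataFrame and leaves the original untouched, so the result has to be stored (for example back into df). A self-contained sketch of the full groupby approach, using the column names from the question:

```python
from itertools import groupby

import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["James", "Sam", "Billy", "Sarah", "Felix"],
        "Section": ["P3", "2.5C", "T35", "A85", "5I"],
    },
    index=range(1, 6),
)

# Split each value into its runs and sort so the numeric run comes first
pairs = [sorted("".join(g) for _, g in groupby(s, key=str.isalpha)) for s in df["Section"]]
N, L = zip(*pairs)

# assign() returns a copy; df itself still has only Name and Section
out = df.assign(Section_Number=list(N), Section_Letter=list(L))
print(out)
```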