The class_name column contains both the course name and the cohort number. I want to split it into two columns (name, cohort number).
FROM:
| class_name |
| introduction to programming 1th |
| introduction to programming 2th |
| introduction to programming 3th |
| introduction to programming 4th |
| algorithms and data structure 1th |
| algorithms and data structure 2th |
| object-oriented programming |
| database systems |
(I know it should read 1st, 2nd, 3rd, but the strings are in my language, where the same character follows every number.)
TO:
| class_name | class_cohort |
| introduction to programming | 1 |
| introduction to programming | 2 |
| introduction to programming | 3 |
| introduction to programming | 4 |
| algorithms and data structure | 1 |
| algorithms and data structure | 2 |
| object-oriented programming | 1 |
| database systems | 1 |
Here is the code I have been working on:
import pandas as pd
course_count = 100
df = pd.read_csv("course.csv", nrows=course_count)
cols_interest=['class_name', 'class_department', 'class_type', 'student_target', 'student_enrolled']
df = df[cols_interest]
df.insert(1, 'class_cohort', 0)
# this is how I extract the numbers
df['class_name'].str.extract(r'(\d)').head()
# but I cannot figure out a way to copy those values into column 'class_cohort' which I filled with 0's.
# once I figure that out, I plan to discard the last digits
df['class_name'] = df['class_name'].map(lambda x: str(x)[:-1])
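I briefly looked at an approach where I would put a comma before 1th, 2th, 3th and then split the column using the comma as the delimiter, but I could not find a way to replace "\s1th" with ",1th" for every digit.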
Answer 0 (score: 1)
# Positional slicing: assumes every name ends with " <digit>th" (single digit)
df['class_cohort'] = df['class_name'].str[-3:-2]
df['class_name'] = df['class_name'].str[:-4]
print(df)
class_name class_cohort
0 cs101 1
1 cs101 2
2 cs101 3
3 cs101 4
4 algorithms 1
5 algorithms 2
Or use str.extract:
# expand=False returns a Series, which can be assigned directly to a column
df['class_cohort'] = df['class_name'].str.extract(r'(\d)', expand=False)
df['class_name'] = df['class_name'].str[:-4]
print(df)
class_name class_cohort
0 introduction to programming 1
1 introduction to programming 2
2 introduction to programming 3
3 introduction to programming 4
4 algorithms and data structure 1
5 algorithms and data structure 2
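Both snippets above assume every class_name ends in a single digit followed by "th". The question's data also contains rows with no suffix (for example "object-oriented programming"), which the desired output maps to cohort 1. Below is a minimal sketch that handles those rows. It assumes the suffix is always digits followed by "th" and that a missing suffix means cohort 1. The sample DataFrame and the group names name and cohort are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data mirroring the table in the question.
df = pd.DataFrame({
    "class_name": [
        "introduction to programming 1th",
        "introduction to programming 2th",
        "algorithms and data structure 12th",
        "object-oriented programming",
        "database systems",
    ]
})

# Capture the course name and an optional trailing "<digits>th" suffix.
# The lazy (.*?) keeps the suffix out of the name when it is present.
pattern = r"^(?P<name>.*?)(?:\s+(?P<cohort>\d+)th)?$"
parts = df["class_name"].str.extract(pattern)

df["class_name"] = parts["name"]
# Assumption: rows without a suffix belong to cohort 1.
df["class_cohort"] = parts["cohort"].fillna("1").astype(int)

print(df)
```

Using \d+ rather than \d also covers two-digit cohorts, should they occur, and names without a suffix are left untouched instead of losing their last four characters.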