Question

The class_name column contains both the course name and the cohort number. I want to split the column into two columns (name, cohort number).

FROM:

| class_name |

| introduction to programming 1th |
| introduction to programming 2th |
| introduction to programming 3th |
| introduction to programming 4th |
| algorithms and data structure 1th |
| algorithms and data structure 2th |
| object-oriented programming |
| database systems |

(I know it should read 1st, 2nd, 3rd, but the strings are in my language, where the same character is repeated after every number.)

TO:

| class_name | class_cohort |

| introduction to programming | 1 |
| introduction to programming | 2 |
| introduction to programming | 3 |
| introduction to programming | 4 |
| algorithms and data structure | 1 |
| algorithms and data structure | 2 |
| object-oriented programming | 1 |
| database systems | 1 |

Here is the code I have been working on:

import pandas as pd

course_count = 100
df = pd.read_csv("course.csv", nrows=course_count)

cols_interest=['class_name', 'class_department', 'class_type', 'student_target', 'student_enrolled']

df = df[cols_interest]
df.insert(1, 'class_cohort', 0)

# this is how I extract the numbers
df['class_name'].str.extract(r'(\d)').head()

# but I cannot figure out a way to copy those values into column 'class_cohort' which I filled with 0's.

# once I figure that out, I plan to discard the last digits
df['class_name'] = df['class_name'].map(lambda x: str(x)[:-1])

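I briefly looked into a solution where I would put a comma before 1th, 2th, 3th and then split the column using the comma as the delimiter, but I could not find a way to replace "\s1th" -> ",1th" for all the numbers.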

Answer 1

You can use indexing by positions:

df['class_cohort'] = df['class_name'].str[-3:-2]
df['class_name'] = df['class_name'].str[:-4]
print(df)
   class_name class_cohort
0       cs101            1
1       cs101            2
2       cs101            3
3       cs101            4
4  algorithms            1
5  algorithms            2

Or use str.extract:

# expand=False returns a Series for a single capture group
df['class_cohort'] = df['class_name'].str.extract(r'(\d)', expand=False)
df['class_name'] = df['class_name'].str[:-4]
print(df)
                      class_name class_cohort
0    introduction to programming            1
1    introduction to programming            2
2    introduction to programming            3
3    introduction to programming            4
4  algorithms and data structure            1
5  algorithms and data structure            2
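
Both snippets above assume that every name ends in a single digit followed by "th". The question's data also has rows with no suffix at all (for example "database systems"), which str[:-4] would truncate. A minimal sketch of one way to handle that, assuming a missing suffix means cohort 1 as in the question's target table; the group names "name" and "cohort" are made up here for illustration:

```python
import pandas as pd

# A few rows taken from the question, including names with no cohort suffix.
df = pd.DataFrame({
    "class_name": [
        "introduction to programming 1th",
        "introduction to programming 2th",
        "algorithms and data structure 1th",
        "object-oriented programming",
        "database systems",
    ]
})

# "name" is matched lazily; the optional tail is whitespace, digits, then "th".
pattern = r"^(?P<name>.*?)(?:\s+(?P<cohort>\d+)th)?$"
parts = df["class_name"].str.extract(pattern)

df["class_name"] = parts["name"]
# Rows without a suffix give NaN for "cohort"; assume those are cohort 1.
df["class_cohort"] = parts["cohort"].fillna("1").astype(int)

print(df)
```

Because the pattern uses \d+ rather than \d, it should also cope with cohort numbers of more than one digit, which slicing by fixed positions would not.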

Split a column with pandas and fill another column with the extracted values
