I start with the following DataFrame:
0
0 Alabama[edit]
1 Auburn (Auburn University)[1]
2 Florence (University of North Alabama)
3 Jacksonville (Jacksonville State University)[2]
4 Livingston (University of West Alabama)[2]
Then I cleaned it up:
State RegionName
0 Alabama
1 Auburn
2 Florence
3 Jacksonville
4 Livingston
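I'm not sure how to move Auburn, Florence, Jacksonville and Livingston into RegionName, since they are regions of Alabama. I also need to do the same for the rest of the data: each of the 500+ regions has to be assigned to its own state (50 states).
Here is how the data is laid out (I have added the type of each row on the left):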
State-->Alaska
Region->Fairbanks
State-->Arizona
Region->Flagstaff
Region->Tempe
Region->Tucson
Expected result:
State RegionName
0 Alabama Auburn
1 Alabama Florence
2 Alabama Jacksonville
3 Alabama Livingston
Answer 0 (score: 0)
Here is what I would do, starting from the original data:
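# State rows end in "[edit]": capture the name, then forward-fill it onto the region rows below
df['State'] = df[0].str.extract(r'(.*)\[edit\]', expand=False).ffill()
# Region rows look like "Town (University)"; state rows do not match and become NaN
df['RegionName'] = df[0].str.extract(r'(.*) \(', expand=False)
df = df.dropna(subset=['RegionName'])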
Output:
   0                                                State    RegionName
1  Auburn (Auburn University)[1]                    Alabama  Auburn
2  Florence (University of North Alabama)           Alabama  Florence
3  Jacksonville (Jacksonville State University)[2]  Alabama  Jacksonville
4  Livingston (University of West Alabama)[2]       Alabama  Livingston
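To check that the forward-fill carries over to more than one state, here is a minimal self-contained sketch of the same approach. The Alabama rows are taken from the question; the Alaska and Arizona rows follow the layout described there, but the text in parentheses is a placeholder I made up, and the names `raw` and `result` are only for illustration.

```python
import pandas as pd

# Alabama rows come from the question. The Alaska/Arizona rows are
# hypothetical: only the state and region names appear in the question,
# the "(placeholder)" part is invented for this sketch.
raw = [
    "Alabama[edit]",
    "Auburn (Auburn University)[1]",
    "Florence (University of North Alabama)",
    "Jacksonville (Jacksonville State University)[2]",
    "Livingston (University of West Alabama)[2]",
    "Alaska[edit]",
    "Fairbanks (placeholder)",
    "Arizona[edit]",
    "Flagstaff (placeholder)",
    "Tempe (placeholder)",
    "Tucson (placeholder)",
]
df = pd.DataFrame(raw)  # a single column labelled 0

# State name on "[edit]" rows, NaN elsewhere, then filled downwards
df["State"] = df[0].str.extract(r"(.*)\[edit\]", expand=False).ffill()
# Text before " (" on region rows, NaN on state rows
df["RegionName"] = df[0].str.extract(r"(.*) \(", expand=False)

# Drop the state header rows and keep only the two wanted columns
result = (
    df.dropna(subset=["RegionName"])[["State", "RegionName"]]
    .reset_index(drop=True)
)
print(result)
```

Note that this relies on every region row containing " (": a region written without a parenthesised part would get NaN in RegionName and be dropped along with the state rows, so it is worth checking the row count against the source.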