Question

I am trying to split a column by type: I want to show the numbers separately from the text.

I tried doing it without a loop, but the shapes did not match, so I turned to a for loop. However, that just puts the last number into every row.

Python input:

import re
import pandas as pd

newdf = pd.DataFrame()
newdf['name'] = ('leon', 'eurika', 'monica', 'wian')
newdf['surname'] = ('swart38', '39swart', '11swart', 'swart10')
a = newdf.shape[0]

newdf['age'] = ""
for i in range(0, a):
    newdf['age'] = re.sub(r'\D', "", str(newdf.iloc[i, 1]))

print(newdf)

I want the age column to show 38, 39, 11, 10. Instead, every row gets "10", which is the value from the last row.

Output:

     name  surname age
0    leon  swart38  10
1  eurika  39swart  10
2  monica  11swart  10
3    wian  swart10  10

Answer 1

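This happens because on every iteration of the for loop you assign a new value to the whole newdf['age'] column. A single string is broadcast to every row, so after the loop the column holds the value from the last assignment, '10'.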
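A minimal sketch of that broadcasting, using a hypothetical frame df and hard-coded values in place of the re.sub calls:

```python
import pandas as pd

df = pd.DataFrame({'surname': ['swart38', '39swart', '11swart', 'swart10']})

# Assigning one scalar string to a column broadcasts it to every row,
# so each pass of the loop overwrites the entire column.
for value in ['38', '39', '11', '10']:
    df['age'] = value

print(df['age'].tolist())  # ['10', '10', '10', '10']
```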

You can fix it by indexing, so that each iteration writes a single cell:

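a = newdf.shape[0]
newdf['age'] = ""
for i in range(0, a):
    # .at writes one cell by row label; the chained form newdf['age'][i] = ...
    # is chained assignment and may not update newdf (e.g. under Copy-on-Write)
    newdf.at[i, 'age'] = re.sub(r'\D', "", str(newdf.iloc[i, 1]))
    #    ^^^^^^^^^^^^^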
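Another option, not part of the original answer: compute every value first with a list comprehension, then assign the whole list once. The list has exactly one entry per row, so its length matches the frame. Whether this is what caused the shape mismatch mentioned in the question is an assumption.

```python
import re
import pandas as pd

newdf = pd.DataFrame({
    'name': ['leon', 'eurika', 'monica', 'wian'],
    'surname': ['swart38', '39swart', '11swart', 'swart10'],
})

# One re.sub result per row, then a single column assignment.
newdf['age'] = [re.sub(r'\D', '', s) for s in newdf['surname']]

print(newdf['age'].tolist())  # ['38', '39', '11', '10']
```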

Or use pandas.Series.str.extract:

newdf['age'] = newdf['surname'].str.extract(r'(\d+)', expand=False)
print(newdf)

Output:

     name  surname age
0    leon  swart38  38
1  eurika  39swart  39
2  monica  11swart  11
3    wian  swart10  10
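Since the question asks to separate the numbers from the text, here is a sketch that puts each part in its own column. The column name text and the conversion to int are assumptions added for illustration:

```python
import pandas as pd

newdf = pd.DataFrame({
    'name': ['leon', 'eurika', 'monica', 'wian'],
    'surname': ['swart38', '39swart', '11swart', 'swart10'],
})

# The digit run goes to 'age'; expand=False returns a Series, not a DataFrame.
newdf['age'] = newdf['surname'].str.extract(r'(\d+)', expand=False).astype(int)

# Everything except the digits goes to a separate (hypothetical) 'text' column.
newdf['text'] = newdf['surname'].str.replace(r'\d+', '', regex=True)

print(newdf)
```

If some rows contain no digits, str.extract returns NaN there and astype(int) raises; pd.to_numeric would give a float column with NaN in those rows instead.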

Answer 2

Try Series.str.replace:

newdf['age'] = newdf['surname'].str.replace(r'\D+', '', regex=True)
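In pandas 2.0 and later, str.replace defaults to regex=False, so the pattern is only treated as a regular expression when regex=True is passed. A small comparison on a hypothetical Series named surname:

```python
import pandas as pd

surname = pd.Series(['swart38', '39swart', '11swart', 'swart10'])

# regex=True: \D+ means "one or more non-digit characters".
digits = surname.str.replace(r'\D+', '', regex=True)

# regex=False: the literal text "\D+" is searched for; it never occurs here,
# so every value comes back unchanged.
unchanged = surname.str.replace(r'\D+', '', regex=False)

print(digits.tolist())     # ['38', '39', '11', '10']
print(unchanged.tolist())  # ['swart38', '39swart', '11swart', 'swart10']
```

The result is still a column of strings; digits.astype(int) converts it when every row contains a number.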

Split a pandas column into text and numbers
