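I am learning data science, but I have a problem with my DataFrame. Could you help me? My DataFrame has 4 columns: 'Price', 'Location', 'House with', and 'Description'. In 'Price' and 'House with', several rows contain NaN or nothing at all. I would like to create a function that scans the 'Description' column, picks out a key (a word such as $40, swimming pool, or garden), and transfers that key into the 'Price' or 'House with' column. Example: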
import pandas as pd
import numpy as np
Df2 = {
    'Price': ['90', 'NaN', 'NaN', ' '],
    'Location': ['NaN', 'Argentina', 'NaN', 'EEUU'],
    'House with': ['Swimming pool', 'Garden', 'NaN', 'NaN'],
    'Description': ['This house in Brazil cost $90 and have swimming pool',
                    'his house in Argentina cost $50 and have Garden',
                    'This house in Chile cost $70 and have Garden',
                    'This house in EEuu cost $80 and have swimming pool']}
df3 = pd.DataFrame(Df2)
df3
I would like it to look like this:
Df2 = {
    'Price': ['90', '50', '70', '80'],
    'Location': ['Brazil', 'Argentina', 'Chile', 'EEUU'],
    'House with': ['Swimming pool', 'Garden', 'Garden', 'swimming pool'],
    'Description': ['This house in Brazil cost $90 and have swimming pool',
                    'his house in Argentina cost $50 and have Garden',
                    'This house in Chile cost $70 and have Garden',
                    'This house in EEuu cost $80 and have swimming pool']}
Answer 0 (score: 1)
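You can use str.extract to pull regex capture groups out of the strings. Given the strings in the 'Description' column (df here is the DataFrame the question calls df3):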
df['Price'] = df['Description'].str.extract(r'\$(\d+)')[0]
df['Location'] = df['Description'].str.extract(r'house in ([A-Za-z]+)')[0]
df['House with'] = df['Description'].str.extract(r'have ([A-Za-z]+)')[0]
df
  Price Location  House with Description
0 90    Brazil    swimming   This house in Brazil cost $90 and have swimming pool
1 50    Argentina Garden     his house in Argentina cost $50 and have Garden
2 70    Chile     Garden     This house in Chile cost $70 and have Garden
3 80    EEuu      swimming   This house in EEuu cost $80 and have swimming pool
Or, equivalently, with expand=False so that str.extract returns a Series directly:
df['Price'] = df['Description'].str.extract(r'\$(\d+)',expand=False)
df['Location'] = df['Description'].str.extract(r'house in ([A-Za-z]+)',expand=False)
df['House with'] = df['Description'].str.extract(r'have ([A-Za-z]+)', expand=False)
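Note that the assignments above overwrite every row, including values that were already present, and that 'have ([A-Za-z]+)' stops at the first word, so 'swimming pool' comes back as 'swimming'. If the goal is to fill only the missing cells, as the question describes, something along these lines might work. This is a minimal sketch, not a definitive solution: the helper name fill_missing_from_description is made up for illustration, it assumes that missing values are stored as the literal string 'NaN' or as blank strings (as in the sample data), and the 'have\s+(.+)$' pattern assumes the feature is always the last thing in the description.

```python
import pandas as pd

df = pd.DataFrame({
    'Price': ['90', 'NaN', 'NaN', ' '],
    'Location': ['NaN', 'Argentina', 'NaN', 'EEUU'],
    'House with': ['Swimming pool', 'Garden', 'NaN', 'NaN'],
    'Description': ['This house in Brazil cost $90 and have swimming pool',
                    'his house in Argentina cost $50 and have Garden',
                    'This house in Chile cost $70 and have Garden',
                    'This house in EEuu cost $80 and have swimming pool'],
})


def fill_missing_from_description(frame, column, pattern):
    """Hypothetical helper: return `column` with missing cells filled from
    the first capture group of `pattern` found in 'Description'."""
    extracted = frame['Description'].str.extract(pattern, expand=False)
    # Assumption: a cell counts as missing if it is blank or the text 'NaN'.
    is_missing = frame[column].str.fullmatch(r'\s*(?:NaN)?\s*', na=False)
    # mask() turns those cells into real NaN; fillna() then uses the extracted text.
    return frame[column].mask(is_missing).fillna(extracted)


# One regex per target column; each has exactly one capture group.
patterns = {
    'Price': r'\$(\d+)',
    'Location': r'house in ([A-Za-z]+)',
    'House with': r'have\s+(.+)$',  # assumes the feature ends the sentence
}

result = df.copy()
for column, pattern in patterns.items():
    result[column] = fill_missing_from_description(result, column, pattern)

print(result[['Price', 'Location', 'House with']])
```

With the sample data this keeps the existing 'Swimming pool', 'Garden', 'Argentina' and 'EEUU' entries untouched and fills only the gaps. Real descriptions are rarely this regular, so the patterns would likely need adjusting.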