Can I search for keywords in one DataFrame column and copy them into other columns of the same DataFrame?

Asked: 2018-09-01 13:16:35

Tags: python database pandas numpy dataframe

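I am learning data science and have a problem with my DataFrame. Could you help me? My DataFrame has 4 columns: "Price", "Location", "House with", and "Description". In "Price" and "House with", several rows are NaN or empty. I would like to write a function that scans the "Description" column, picks out a keyword (for example $40, swimming pool, or garden), and copies that keyword into the "Price" or "House with" column. Example: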

import pandas as pd
import numpy as np
Df2= {
    'Price': ['90','NaN','NaN',' '],
    'Location': ['NaN','Argentina','NaN','EEUU'],
    'House with': ['Swimming pool', 'Garden','NaN', 'NaN'],
    'Description': ['This house in Brazil cost $90 and       have swimming pool', 'his house in Argentina cost $50 and        have Garden','This house in Chile cost $70 and have Garden', 'This house in EEuu cost $80 and        have swimming pool']}

df3 = pd.DataFrame(Df2)
df3

I would like the result to look like this:

Df2= {
        'Price': ['90','50','70','80'],
        'Location': ['Brazil','Argentina','Chile','EEUU'],
        'House with': ['Swimming pool', 'Garden','Garden', 'swimming pool'],
        'Description': ['This house in Brazil cost $90 and       have swimming pool', 'his house in Argentina cost $50 and        have Garden','This house in Chile cost $70 and have Garden', 'This house in EEuu cost $80 and        have swimming pool']}

1 answer:

Answer 0 (score: 1)

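You can pull the values out of the strings with str.extract, which returns the text matched by each regex capture group. If the Description column holds strings like the ones in the question (df here is the DataFrame, called df3 in the question):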

df['Price'] = df['Description'].str.extract(r'\$(\d+)')[0]
df['Location'] = df['Description'].str.extract(r'house in ([A-Za-z]+)')[0]
df['House with'] = df['Description'].str.extract(r'have ([A-Za-z]+)')[0]
df

  Price   Location House with  Description
0    90     Brazil   swimming  This house in Brazil cost $90 and       have swimming pool
1    50  Argentina     Garden  his house in Argentina cost $50 and        have Garden
2    70      Chile     Garden  This house in Chile cost $70 and have Garden
3    80       EEuu   swimming  This house in EEuu cost $80 and        have swimming pool

# Equivalent: expand=False returns a Series directly, so the trailing [0] is not needed
df['Price'] = df['Description'].str.extract(r'\$(\d+)', expand=False)
df['Location'] = df['Description'].str.extract(r'house in ([A-Za-z]+)', expand=False)
df['House with'] = df['Description'].str.extract(r'have ([A-Za-z]+)', expand=False)
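Note that the snippets above overwrite every row and capture only the first word after "have" (hence swimming rather than swimming pool). The question asks to fill only the missing cells, so here is a minimal sketch of that variant. It assumes that the literal string 'NaN' and blank strings mark missing values, as in the question's sample data. The names patterns and cleaned are made up for this illustration, and the widened "House with" pattern is an assumption that the feature is whatever letters and spaces follow "have".

```python
import numpy as np
import pandas as pd

# Sample data from the question; 'NaN' and ' ' are plain strings here.
df = pd.DataFrame({
    'Price': ['90', 'NaN', 'NaN', ' '],
    'Location': ['NaN', 'Argentina', 'NaN', 'EEUU'],
    'House with': ['Swimming pool', 'Garden', 'NaN', 'NaN'],
    'Description': [
        'This house in Brazil cost $90 and       have swimming pool',
        'his house in Argentina cost $50 and        have Garden',
        'This house in Chile cost $70 and have Garden',
        'This house in EEuu cost $80 and        have swimming pool',
    ],
})

# Hypothetical mapping: target column -> regex with exactly one capture group.
patterns = {
    'Price': r'\$(\d+)',
    'Location': r'house in ([A-Za-z]+)',
    # Letters and spaces after "have", so "swimming pool" is kept whole.
    'House with': r'have\s+([A-Za-z ]+)',
}

# Assumption: cells that are blank or the literal text 'NaN' count as missing.
cleaned = df.replace(r'^\s*(?:NaN)?\s*$', np.nan, regex=True)

for column, pattern in patterns.items():
    extracted = (
        cleaned['Description']
        .str.extract(pattern, expand=False)
        .str.strip()
    )
    # fillna only touches missing cells; existing values are left alone.
    cleaned[column] = cleaned[column].fillna(extracted)

print(cleaned[['Price', 'Location', 'House with']])
```

Because fillna only fills gaps, values that were already present (such as 'EEUU' in the last row) are kept as they were rather than replaced by the spelling found in the description.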