How do I extract just the part of a string that I need?

Time: 2019-01-06 14:03:15

Tags: python regex pandas

I need to extract only one part of each row in a table column. The part can be 0 to 4 characters long:

  

"address":"124"

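I know this can be done with the extract / findall functions. But it turns out that they only set a mask (pattern), and only the part of the line that fits the mask is returned. As I said, the code varies in length, so this approach does not work. Please tell me how to set up the selection mask correctly.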

Example row from the table column:

  

{'latitude': '37.80505999961946', 'human_address': '{"address":"0","city":"Oakland","state":"Ca","zip":""}', 'needs_recoding': False, 'longitude': '-122.27301999967312'}

df['latitude_1'] = df['Location 1'].str.extract('(\"\d\d\d\d)', expand=True)

2 answers:

Answer 0: (score: 0)

I hope this helps you.

import pandas as pd
dic = {'latitude': '37.80505999961946', 'human_address': '{"address":"1234","city":"Oakland","state":"Ca","zip":""}', 'needs_recoding': False, 'longitude': '-122.27301999967312'}, {'latitude': '37.80505999961946', 'human_address': '{"address":"0","city":"Oakland","state":"Ca","zip":""}', 'needs_recoding': False, 'longitude': '-122.27301999967312'}
df = pd.DataFrame(list(dic))
df


                                       human_address           latitude            longitude  needs_recoding
0  {"address":"1234","city":"Oakland","state":"Ca...  37.80505999961946  -122.27301999967312           False
1  {"address":"0","city":"Oakland","state":"Ca","...  37.80505999961946  -122.27301999967312           False


import re
# Raw string so \d reaches the regex engine as-is; \d{0,4} matches zero to four digits
df.human_address.apply(lambda s: re.search(r'"address":"\d{0,4}"', s).group())


0    "address":"1234"
1       "address":"0"
Name: human_address, dtype: object
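
One caveat with the approach above: re.search returns None when a row has no address field, and calling .group() on it then raises AttributeError. A minimal sketch of a guarded version follows; it also captures only the digits rather than the whole "address":"..." fragment. The helper name extract_address and the third sample row are assumptions made for illustration, not part of the original data.

```python
import re

# Hypothetical helper, not part of the original answer.
# The parentheses capture only the digits (zero to four of them).
ADDRESS_RE = re.compile(r'"address":"(\d{0,4})"')

def extract_address(s):
    """Return the digits of the address field, or None if the field is absent."""
    match = ADDRESS_RE.search(s)
    return match.group(1) if match else None

rows = [
    '{"address":"1234","city":"Oakland","state":"Ca","zip":""}',
    '{"address":"0","city":"Oakland","state":"Ca","zip":""}',
    '{"city":"Oakland","state":"Ca","zip":""}',  # made-up row with no address field
]
results = [extract_address(r) for r in rows]
print(results)  # ['1234', '0', None]
```

With the DataFrame above, this could be applied as df.human_address.apply(extract_address).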

Answer 1: (score: 0)

You can indeed use pandas str.extract; you just need to adjust the regex pattern.

Below is the DataFrame from @Anana Mital.

>>> df
                                       human_address           latitude            longitude  needs_recoding
0  {"address":"1234","city":"Oakland","state":"Ca...  37.80505999961946  -122.27301999967312           False
1  {"address":"0","city":"Oakland","state":"Ca","...  37.80505999961946  -122.27301999967312           False

Here is how to get the result using str.extract:

>>> df.human_address.str.extract('(\"address\":\"\d{0,4}\")')
                  0
0  "address":"1234"
1     "address":"0"

Or, equivalently, using a raw string:

>>> df.human_address.str.extract(r'("address":"\d{0,4}")')
                  0
0  "address":"1234"
1     "address":"0"
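
If only the digits are wanted, as the question seems to ask, the capture group can be narrowed so that it wraps just \d{0,4}. The following is a minimal sketch under that assumption; the new column name address is made up for illustration, and the DataFrame is rebuilt here with only the human_address column so the snippet runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    "human_address": [
        '{"address":"1234","city":"Oakland","state":"Ca","zip":""}',
        '{"address":"0","city":"Oakland","state":"Ca","zip":""}',
    ]
})

# The parentheses wrap only \d{0,4}, so only the digits are captured.
# expand=False returns a Series instead of a one-column DataFrame.
df["address"] = df["human_address"].str.extract(r'"address":"(\d{0,4})"', expand=False)
print(df["address"].tolist())  # ['1234', '0']
```

The extracted values are strings; rows where the pattern does not match come back as NaN.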