Partial matching of a list of strings in a Python pandas DataFrame, returning all matched partial strings

Time: 2020-07-10 05:00:08

Tags: python pandas

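Hi everyone, I am trying to match partial strings in a DataFrame column and return the matched strings (capitalization matters). I don't have much programming experience; I am just starting to learn.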

    import re

    import pandas as pd

    # Two-letter abbreviations of the 50 US states (uppercase, matched case-sensitively)
    state_abbrv = ["AL","AK","AZ","AR","CA","CO","CT","DE","FL","GA","HI","ID","IL","IN","IA","KS","KY","LA",
                   "ME","MD","MA","MI","MN","MS","MO","MT","NE","NV","NH","NJ","NM","NY","NC","ND","OH","OK",
                   "OR","PA","RI","SC","SD","TN","TX","UT","VT","VA","WA","WV","WI","WY"]

    d = {"Index": [1, 2, 3, 4, 5, 6, 7],
         "Description": ["BROOKLYN NY", "M1ANY", "NYNY", "DO", "nyNY", "CWARD NY", "HOWARD BEACH NY"]}
    df = pd.DataFrame(data=d)

    # Regex alternation "AL|AK|AZ|..." that matches any abbreviation
    statesjoin = '|'.join(state_abbrv)

    # Cut each description into consecutive 2-character chunks, join them with commas,
    # then find the abbreviations in that comma-separated string
    pairs = df["Description"].apply(lambda x: ','.join(re.findall('..', x)))
    df = df.assign(State=pairs.str.findall(statesjoin))

    print(df)

Current result (wrong):

    Index      Description     State
        1      BROOKLYN NY        []
        2            M1ANY        []
        3             NYNY  [NY, NY]
        4               DO        []
        5             nyNY      [NY]
        6         CWARD NY  [AR, NY]
        7  HOWARD BEACH NY      [WA]
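The chunking step explains this result: re.findall('..', x) cuts each string into fixed, non-overlapping pairs, so an abbreviation that straddles a pair boundary is never seen, and an odd trailing character is silently dropped. A minimal check with the standard re module (the helper name two_char_chunks is hypothetical, used only for illustration):

```python
import re


def two_char_chunks(text):
    """Hypothetical helper: consecutive, non-overlapping 2-character chunks,
    exactly what re.findall('..', text) produces. A trailing odd character is dropped."""
    return re.findall('..', text)


for text in ["CWARD NY", "HOWARD BEACH NY", "M1ANY"]:
    print(text, "->", two_char_chunks(text))
# CWARD NY -> ['CW', 'AR', 'D ', 'NY']
# HOWARD BEACH NY -> ['HO', 'WA', 'RD', ' B', 'EA', 'CH', ' N']
# M1ANY -> ['M1', 'AN']
```

In "CWARD NY" the WA is split across CW|AR, and in "M1ANY" the NY is split across AN and the dropped Y, so neither can be found after joining the chunks with commas.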

Correct result:

    Index      Description         State
        1      BROOKLYN NY          [NY]
        2            M1ANY          [NY]
        3             NYNY      [NY, NY]
        4               DO            []
        5             nyNY          [NY]
        6         CWARD NY  [WA, AR, NY]
        7  HOWARD BEACH NY  [WA, AR, NY]

1 answer:

Answer 0 (score: 0)

Use a list comprehension that tests each value of the list with the in operator:

    df = df.assign(State=df["Description"].apply(lambda x: [y for y in state_abbrv if y in x]))
    print(df)

       Index      Description         State
    0      1      BROOKLYN NY      [NY, OK]
    1      2            M1ANY          [NY]
    2      3             NYNY          [NY]
    3      4               DO            []
    4      5             nyNY          [NY]
    5      6         CWARD NY  [AR, NY, WA]
    6      7  HOWARD BEACH NY  [AR, NY, WA]
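The list comprehension tests each abbreviation once, so a state that occurs twice (NYNY) is reported once, and the result follows the order of state_abbrv rather than the order in the text. If left-to-right order with repeats is wanted, one alternative, assuming every abbreviation is exactly two characters long, is to slide a 2-character window over the string and keep the windows found in a set. This is a sketch, not the answerer's code; the helper name states_by_position is hypothetical:

```python
import pandas as pd

# Two-letter abbreviations of the 50 US states, as in the question
state_abbrv = ["AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "FL", "GA", "HI", "ID", "IL",
               "IN", "IA", "KS", "KY", "LA", "ME", "MD", "MA", "MI", "MN", "MS", "MO", "MT",
               "NE", "NV", "NH", "NJ", "NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI",
               "SC", "SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI", "WY"]

# A set makes each membership test O(1) on average
state_set = set(state_abbrv)


def states_by_position(text):
    """Hypothetical helper: every 2-character window of text that is a state abbreviation,
    left to right, keeping repeats. Windows overlap; matching is case-sensitive.
    Assumes all abbreviations are exactly 2 characters long."""
    return [text[i:i + 2] for i in range(len(text) - 1) if text[i:i + 2] in state_set]


d = {"Index": [1, 2, 3, 4, 5, 6, 7],
     "Description": ["BROOKLYN NY", "M1ANY", "NYNY", "DO", "nyNY", "CWARD NY", "HOWARD BEACH NY"]}
df = pd.DataFrame(data=d)
df = df.assign(State=df["Description"].apply(states_by_position))
print(df)
```

The fixed window width is what keeps this simple; if abbreviations of different lengths were ever mixed in, a regex-based approach would be the more natural fit.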

Your solution does not return overlapping strings, so AR is missing here (str.findall only returns non-overlapping matches):

    statesjoin = '|'.join(state_abbrv)
    df = df.assign(State=df["Description"].str.findall(statesjoin))
    print(df)

       Index      Description     State
    0      1      BROOKLYN NY  [OK, NY]
    1      2            M1ANY      [NY]
    2      3             NYNY  [NY, NY]
    3      4               DO        []
    4      5             nyNY      [NY]
    5      6         CWARD NY  [WA, NY]
    6      7  HOWARD BEACH NY  [WA, NY]
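One way to keep str.findall and still get overlapping matches is to wrap the alternation in a zero-width lookahead with a capturing group, (?=(AL|AK|...)). Because the lookahead consumes no characters, the regex engine tries a match at every position, so WA and AR in CWARD are both returned, in the order they appear and with repeats; with exactly one group, findall returns the captured text. This is a swapped-in technique, sketched under the assumption of the same state_abbrv list and sample data as the question:

```python
import pandas as pd

# Two-letter abbreviations of the 50 US states, as in the question
state_abbrv = ["AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "FL", "GA", "HI", "ID", "IL",
               "IN", "IA", "KS", "KY", "LA", "ME", "MD", "MA", "MI", "MN", "MS", "MO", "MT",
               "NE", "NV", "NH", "NJ", "NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI",
               "SC", "SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI", "WY"]

# (?=(...)) matches at every position where an abbreviation starts without consuming
# characters, so matches may overlap; the single capturing group is what findall returns.
# Matching stays case-sensitive, so "ny" in "nyNY" is not reported.
pattern = "(?=(" + "|".join(state_abbrv) + "))"

d = {"Index": [1, 2, 3, 4, 5, 6, 7],
     "Description": ["BROOKLYN NY", "M1ANY", "NYNY", "DO", "nyNY", "CWARD NY", "HOWARD BEACH NY"]}
df = pd.DataFrame(data=d)
df = df.assign(State=df["Description"].str.findall(pattern))
print(df)
```

Note that BROOKLYN itself contains OK, so any plain substring search reports it for row 1; excluding it would need extra rules (for example word boundaries) that the question does not specify.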