Question

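I am trying to loop through a column of a dataframe and build a new column from substrings of that column whenever they match an entry in a dictionary. More specifically, if a row in the address column contains a state found in a dictionary of state names and abbreviations, I want to append the state abbreviation to a list that will become the new column.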

The following code works for exact matches, but it does not scan each row for substrings:

import pandas as pd

df = pd.DataFrame((['Austin, Texas',
               'Texas',
               'Seattle, Washington',
               ',,, Texas',
               'Olympia, WA']), columns = ['Place'])

states = {'Texas': 'TX',
      'Washington': 'WA'}

place = df['Place']

results = []

for x in place:
    if x in states:
        results.append(x)
    else:
        results.append(None)

df['State'] = results
df

Thanks!

Answer 1

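A nested conditional list comprehension will do the trick. You need to split each cell on commas and use strip to remove the surrounding whitespace.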

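Also, a place such as New York, New York (city, state) could cause problems because both parts match, so I left the results in a list and then took the last match.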

df['results'] = [[state.strip() for state in cell.split(',') 
                  if state.strip() in states] 
                 for cell in df.Place]

df['results2'] = df.results.apply(lambda s: s[-1] if s else '')

>>> df
                 Place       results    results2
0        Austin, Texas       [Texas]       Texas
1                Texas       [Texas]       Texas
2  Seattle, Washington  [Washington]  Washington
3            ,,, Texas       [Texas]       Texas
4          Olympia, WA            []
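
Note that the output above holds full state names, while the question asks for abbreviations, and 'Olympia, WA' is left unmatched. A minimal sketch of one way to extend the idea follows. The helper name to_abbreviation and the abbreviations set are hypothetical, introduced here only for illustration, and the sketch assumes the state is always a comma-separated token in the cell.

```python
import pandas as pd

df = pd.DataFrame({'Place': ['Austin, Texas',
                             'Texas',
                             'Seattle, Washington',
                             ',,, Texas',
                             'Olympia, WA']})

states = {'Texas': 'TX',
          'Washington': 'WA'}

# Hypothetical helper set: lets a cell that already holds an abbreviation match.
abbreviations = set(states.values())


def to_abbreviation(cell):
    """Return the abbreviation of the last state-like token in cell, or None."""
    matches = []
    for token in cell.split(','):
        token = token.strip()
        if token in states:
            # Full state name: look up its abbreviation.
            matches.append(states[token])
        elif token in abbreviations:
            # Already an abbreviation: keep it unchanged.
            matches.append(token)
    # Take the last match so that "New York, New York" resolves to the state part.
    return matches[-1] if matches else None


df['State'] = df['Place'].apply(to_abbreviation)
print(df)
```

With the sample data, every row gets an abbreviation, including 'Olympia, WA'. Rows with no recognizable state get None, which matches the behavior of the loop in the question.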

Loop through a dataframe to extract substrings that match a dictionary
