Question

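I am currently mapping dict keys to a separate column based on the values in another column. I use the values in Code to match the values in the dictionary and copy the matching key into a new column, so the new column contains the numbers 1, 2, 3.

This works fine except when there are multiple Code values at the same timestamp. I only want one mapped value for each unique timestamp.

If there are multiple values at the same timestamp but the mapped number would be the same (A, B), just take the first value. I can use .drop_duplicates for that.

However, if the mapped numbers differ at the same timestamp, I want to drop the 2 and keep the 1. .drop_duplicates only does this when the 1 comes before the 2, not the other way around.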

import pandas as pd
from fuzzywuzzy import process

df = pd.DataFrame({   
        'Time' : ['2019-08-02 09:50:10.1','2019-08-02 09:50:10.1','2019-08-02 09:50:10.2','2019-08-02 09:50:10.3','2019-08-02 09:50:10.4','2019-08-02 09:50:10.4','2019-08-02 09:50:10.5','2019-08-02 09:50:10.5','2019-08-02 09:50:10.6','2019-08-02 09:50:10.6'],
        'Code' : ['A','C','X','Y','A','B','X','A','Z','L'],                                   
        })

# Dictionary that contains how to map numbers
hdict = {'1' : ['A', 'B'],

    '2' : ['X','Y','Z'],

    '3' : ['D']}

def hColumn(df):

    # Construct a dataframe from the helper dictionary
    df1 = pd.DataFrame([*hdict.values()], index = hdict.keys()).T.melt().dropna()   
    # Get relevant matches using the library.
    m = df['Code'].apply(lambda x: process.extract(x, df1.value)[0])
    # Concat the matches with original df
    df2 = pd.concat([df, m[m.apply(lambda x: x[1]>80)].apply(lambda x: x[0])], axis=1)
    df2.columns = [*df.columns, 'matches']
    # After merge it with df1
    df2 = df2.merge(df1, left_on='matches', right_on='value', how='left')
    # Drop columns that are not required and rename.
    df2 = df2.drop(columns=['matches','value']).rename(columns={'variable':'H'})
    # Drop unwanted rows
    df2 = df2.mask(df2['H'].isna())
    df2 = df2.dropna(subset = ['H'])

    return df2

df = hColumn(df)

Expected output:

                    Time Code  H
0  2019-08-02 09:50:10.1    A  1
1  2019-08-02 09:50:10.2    X  2
2  2019-08-02 09:50:10.3    Y  2
3  2019-08-02 09:50:10.4    A  1
4  2019-08-02 09:50:10.5    A  1
5  2019-08-02 09:50:10.6    Z  2

Answer 1

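If I were you, I would invert your dictionary, which greatly simplifies everything that follows. My solution is below; let me know if it helps, and feel free to ask questions: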

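A minimal sketch of what inverting the dictionary could look like. This is an illustration of the idea, not the answerer's original code; the names `inverse` and `pick_lowest_key` are made up here. It assumes exact matches on Code (so it replaces the fuzzy matching from the question) and that the dictionary keys can be converted to integers, with a lower number meaning higher priority.

```python
import pandas as pd

df = pd.DataFrame({
    'Time': ['2019-08-02 09:50:10.1', '2019-08-02 09:50:10.1',
             '2019-08-02 09:50:10.2', '2019-08-02 09:50:10.3',
             '2019-08-02 09:50:10.4', '2019-08-02 09:50:10.4',
             '2019-08-02 09:50:10.5', '2019-08-02 09:50:10.5',
             '2019-08-02 09:50:10.6', '2019-08-02 09:50:10.6'],
    'Code': ['A', 'C', 'X', 'Y', 'A', 'B', 'X', 'A', 'Z', 'L'],
})

hdict = {'1': ['A', 'B'], '2': ['X', 'Y', 'Z'], '3': ['D']}

# Invert the dictionary so each code points to its key, e.g. 'A' -> 1.
# Assumption: keys are numeric strings, so int() gives a sortable priority.
inverse = {code: int(key) for key, codes in hdict.items() for code in codes}


def pick_lowest_key(frame):
    """Keep one row per Time: the first row holding the smallest mapped key."""
    mapped = (frame.assign(H=frame['Code'].map(inverse))
                   .dropna(subset=['H'])      # drop codes absent from hdict
                   .astype({'H': int}))
    # idxmin gives, per timestamp, the index label of the first minimal H.
    keep = mapped.groupby('Time')['H'].idxmin()
    return mapped.loc[keep].reset_index(drop=True)


result = pick_lowest_key(df)
print(result)
```

With the sample data, timestamp 09:50:10.5 keeps A (1) rather than X (2), and 09:50:10.4 keeps A because it is the first of the two rows that map to 1.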

Answer 2

Use DataFrame.drop_duplicates:

df = df.drop_duplicates('Time')

Another possible solution matches the values with Series.map:

#swap key values in dict
#http://stackoverflow.com/a/31674731/2901002
d = {k: oldk for oldk, oldv in hdict.items() for k in oldv}
df["H"] = df['Code'].map(d)
df = df.dropna(subset=['H']).drop_duplicates('Time')
print (df)
                    Time Code  H
0  2019-08-02 09:50:10.1    A  1
2  2019-08-02 09:50:10.2    X  2
3  2019-08-02 09:50:10.3    Y  2
4  2019-08-02 09:50:10.4    A  1
6  2019-08-02 09:50:10.5    X  2
8  2019-08-02 09:50:10.6    Z  2
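In the output above, timestamp 09:50:10.5 keeps X (2) because that row comes before A (1), which is the case the question wants to avoid. One possible adjustment, sketched here under the assumption that the keys sort in the desired priority order (the strings '1' < '2' < '3'), is to sort by H with a stable sort before dropping duplicates, then restore the original row order:

```python
import pandas as pd

df = pd.DataFrame({
    'Time': ['2019-08-02 09:50:10.1', '2019-08-02 09:50:10.1',
             '2019-08-02 09:50:10.2', '2019-08-02 09:50:10.3',
             '2019-08-02 09:50:10.4', '2019-08-02 09:50:10.4',
             '2019-08-02 09:50:10.5', '2019-08-02 09:50:10.5',
             '2019-08-02 09:50:10.6', '2019-08-02 09:50:10.6'],
    'Code': ['A', 'C', 'X', 'Y', 'A', 'B', 'X', 'A', 'Z', 'L'],
})
hdict = {'1': ['A', 'B'], '2': ['X', 'Y', 'Z'], '3': ['D']}

# Swap keys and values, as in the answer above.
d = {k: oldk for oldk, oldv in hdict.items() for k in oldv}
df['H'] = df['Code'].map(d)

df = (df.dropna(subset=['H'])
        # Stable sort: rows with H == '1' come first, ties keep original order.
        .sort_values('H', kind='mergesort')
        # The first row seen per timestamp is now the one with the lowest key.
        .drop_duplicates('Time')
        # Put the surviving rows back in their original order.
        .sort_index())
print(df)
```

The surviving rows are the original indices 0, 2, 3, 4, 7 and 8, so 09:50:10.5 now keeps A (1). If the keys ever go beyond single digits, converting H to integers before sorting would be needed, since string comparison puts '10' before '2'.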

Answer 3

from flashtext import KeywordProcessor
kp = KeywordProcessor()
kp.add_keywords_from_dict(hdict)

df['H'] = df['Code'].apply(lambda x : kp.extract_keywords(x))

# Take the first matched key, or None when no keyword was found
df['H'] = df['H'].apply(lambda x: x[0] if x else None)
df.dropna(inplace = True)
df

Map dictionary keys to a pandas DataFrame
