Check string values in a column of one df against a column in another df

Time: 2021-03-25 04:14:53

Tags: python pandas dataframe

Suppose I have two pandas DataFrames that look like this:

import pandas as pd

data_set_1 = [['A big string of words', 30], ['Random data point', 60], ['Big string of words', 50]]
data_set_2 = [['string of', 30], ['Character value', 40], ['Big swords', 90]]

first_df = pd.DataFrame(data_set_1, columns = ['Word_set', 'Numbers'])
second_df = pd.DataFrame(data_set_2, columns = ['Words', 'Numbers'])

Is there a way to compare the Words column of the second DF against the Word_set column of the first DF? Ideally, any matching values would be saved to a new DF.

Example output:

Output:

Column 1                                  Column 2
-----------                               ------------
'A big string of words', 'string of'      30
'Big string of words', 'Big swords'

2 answers:

Answer 0 (score: 1)

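The idea is to check, at each row index, whether any word of the second DataFrame's string appears in the first DataFrame's string, using any(x in first_df['Word_set'][i] for x in j.split()), and then to join the matched pair to build the final result.
Please see this code: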

import pandas as pd

data_set_1 = [['A big string of words', 30], ['Random data point', 60], ['Big string of words', 50]]
data_set_2 = [['string of', 30], ['Character value', 40], ['Big swords', 90]]

first_df = pd.DataFrame(data_set_1, columns = ['Word_set', 'Numbers'])
second_df = pd.DataFrame(data_set_2, columns = ['Words', 'Numbers'])

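# Compare position by position: row i of first_df against row i of second_df.
col1 = []
for i, j in zip(range(len(first_df)), second_df['Words']):
    # Keep the pair if any word of j occurs as a substring of the Word_set string.
    if any(x in first_df['Word_set'][i] for x in j.split()):
        col1.append(', '.join([first_df['Word_set'][i], j]))

# Numbers that are equal at the same row position in both frames.
col2 = list(first_df['Numbers'][first_df['Numbers'] == second_df['Numbers']])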

df = pd.DataFrame(
    data= [col1, col2],
    index=['Column 1', 'Column 2']
).T

print(df)

Output:

                           Column 1 Column 2
0  A big string of words, string of       30
1   Big string of words, Big swords     None

Answer 1 (score: 1)

import pandas as pd

new_list = []

data_set_1 = [['A big string of words', 30], ['Random data point', 60], ['Big string of words', 50]]
data_set_2 = [['string of', 30], ['Character value', 40], ['Big swords', 90]]

first_df = pd.DataFrame(data_set_1, columns = ['Word_set', 'Numbers'])
second_df = pd.DataFrame(data_set_2, columns = ['Words', 'Numbers'])

for index_01, (row_data_set_1_position_01, row_data_set_1_position_02) in first_df.iterrows():
    #print(row_data_set_1_position_01)
    #print(row_data_set_1_position_02)

    for index_02, (row_data_set_2_position_01, row_data_set_2_position_02) in second_df.iterrows():

        # The word strings are exactly equal
        if row_data_set_1_position_01 == row_data_set_2_position_01:
            new_list.append([row_data_set_1_position_01, row_data_set_1_position_02,row_data_set_2_position_01,row_data_set_2_position_02,"Word"])

        # The numbers are exactly equal
        if row_data_set_1_position_02 == row_data_set_2_position_02:
            new_list.append([row_data_set_1_position_01, row_data_set_1_position_02,row_data_set_2_position_01,row_data_set_2_position_02,"Code"])

new_dataframe = pd.DataFrame(new_list, columns = ['Words', 'Numbers','Words', 'Numbers',"Similar"])
print(new_dataframe)
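
Because both tests in the nested loops are exact equality checks, the same pairs can probably be found with inner merges instead of iterrows. The sketch below is an alternative formulation rather than the answer's own method. The names same_numbers and same_words are made up for illustration.

```python
import pandas as pd

first_df = pd.DataFrame(
    [['A big string of words', 30], ['Random data point', 60], ['Big string of words', 50]],
    columns=['Word_set', 'Numbers'],
)
second_df = pd.DataFrame(
    [['string of', 30], ['Character value', 40], ['Big swords', 90]],
    columns=['Words', 'Numbers'],
)

# Rows whose Numbers value is identical in both frames (the "Code" case).
same_numbers = first_df.merge(second_df, on='Numbers', how='inner')

# Rows whose strings are identical (the "Word" case); empty for this sample data.
same_words = first_df.merge(
    second_df,
    left_on='Word_set',
    right_on='Words',
    how='inner',
    suffixes=('_first', '_second'),
)

print(same_numbers)
print(same_words)
```

Exact equality finds no word matches in the sample data, so partial matches such as 'string of' still need a substring test.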