Suppose I have two pandas DataFrames that look like this:
data_set_1 = [['A big string of words', 30], ['Random data point', 60], ['Big string of words', 50]]
data_set_2 = [['string of', 30], ['Character value', 40], ['Big swords', 90]]
first_df = pd.DataFrame(data_set_1, columns = ['Word_set', 'Numbers'])
second_df = pd.DataFrame(data_set_2, columns = ['Words', 'Numbers'])
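Is there a way to compare the Words column of the second DataFrame against the Word_set column of the first DataFrame? Ideally, any matching values would be saved to a new DataFrame.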
Example output:
Column 1 Column 2
----------- ------------
'A big string of words', 'string of' 30
'Big string of words', 'Big swords'
Answer 0 (score: 1)
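The idea is to walk through both DataFrames index by index and look for matching strings. The check any(x in first_df['Word_set'][i] for x in j.split()) is true when at least one word of the second DataFrame's entry occurs (as a substring) in the first DataFrame's entry at the same index. Each matched pair is then joined into a single string to build the final result.
Take a look at this code: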
import pandas as pd
data_set_1 = [['A big string of words', 30], ['Random data point', 60], ['Big string of words', 50]]
data_set_2 = [['string of', 30], ['Character value', 40], ['Big swords', 90]]
first_df = pd.DataFrame(data_set_1, columns = ['Word_set', 'Numbers'])
second_df = pd.DataFrame(data_set_2, columns = ['Words', 'Numbers'])
col1 = []
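# Compare the rows position by position (row i of first_df with row i of second_df)
for i, j in zip(range(3),second_df['Words']):
    # Keep the pair if any word of j occurs in the first DataFrame's string
    if any(x in first_df['Word_set'][i] for x in j.split()):
        col1.append(', '.join([first_df['Word_set'][i], j]))
# Numbers that are equal at the same position in both DataFrames
col2 = list(first_df['Numbers'][first_df['Numbers'] == second_df['Numbers']])
# col1 and col2 may differ in length; the shorter one is padded with None
df = pd.DataFrame(
    data= [col1, col2],
    index=['Column 1', 'Column 2']
).T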
print(df)
Output:
Column 1 Column 2
0 A big string of words, string of 30
1 Big string of words, Big swords None
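Note that the loop above only compares rows that share the same position. If the goal is to compare every entry of Words against every entry of Word_set, the same word-overlap check can be applied to all pairs. The following is a minimal sketch under that assumption; the helper name shares_word and the result name matches are made up for illustration and are not part of the answer above.

```python
import pandas as pd

def shares_word(text: str, phrase: str) -> bool:
    """Hypothetical helper: True if any whitespace-separated word of
    `phrase` occurs as a (case-sensitive) substring of `text`."""
    return any(word in text for word in phrase.split())

data_set_1 = [['A big string of words', 30], ['Random data point', 60], ['Big string of words', 50]]
data_set_2 = [['string of', 30], ['Character value', 40], ['Big swords', 90]]
first_df = pd.DataFrame(data_set_1, columns=['Word_set', 'Numbers'])
second_df = pd.DataFrame(data_set_2, columns=['Words', 'Numbers'])

# Test every Word_set entry against every Words entry, not just same-index rows
pairs = [
    (text, phrase)
    for text in first_df['Word_set']
    for phrase in second_df['Words']
    if shares_word(text, phrase)
]

matches = pd.DataFrame(pairs, columns=['Word_set', 'Words'])
print(matches)
```

Because the check is a case-sensitive substring test, 'Big swords' pairs with 'Big string of words' but not with 'A big string of words'. Lower-casing both strings first would be one way to loosen that, if that is what is wanted.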
Answer 1 (score: 1)
import pandas as pd
new_list = []
data_set_1 = [['A big string of words', 30], ['Random data point', 60], ['Big string of words', 50]]
data_set_2 = [['string of', 30], ['Character value', 40], ['Big swords', 90]]
first_df = pd.DataFrame(data_set_1, columns = ['Word_set', 'Numbers'])
second_df = pd.DataFrame(data_set_2, columns = ['Words', 'Numbers'])
# Compare every row of first_df with every row of second_df
for index_01, (row_data_set_1_position_01, row_data_set_1_position_02) in first_df.iterrows():
    #print(row_data_set_1_position_01)
    #print(row_data_set_1_position_02)
    for index_02, (row_data_set_2_position_01, row_data_set_2_position_02) in second_df.iterrows():
        # Same words (exact string equality)
        if row_data_set_1_position_01 == row_data_set_2_position_01:
            new_list.append([row_data_set_1_position_01, row_data_set_1_position_02,row_data_set_2_position_01,row_data_set_2_position_02,"Word"])
        # or same number
        if row_data_set_1_position_02 == row_data_set_2_position_02:
            new_list.append([row_data_set_1_position_01, row_data_set_1_position_02,row_data_set_2_position_01,row_data_set_2_position_02,"Code"])
new_dataframe = pd.DataFrame(new_list, columns = ['Words', 'Numbers','Words', 'Numbers',"Similar"])
print(new_dataframe)
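The nested iterrows() loops above visit every combination of rows from the two DataFrames. As a possible alternative, the same all-pairs comparison can be sketched with a cross join. This assumes pandas 1.2 or newer, where merge accepts how='cross'. The suffixes and the names all_pairs, same_word, same_number and result are chosen here for illustration only.

```python
import pandas as pd

data_set_1 = [['A big string of words', 30], ['Random data point', 60], ['Big string of words', 50]]
data_set_2 = [['string of', 30], ['Character value', 40], ['Big swords', 90]]
first_df = pd.DataFrame(data_set_1, columns=['Word_set', 'Numbers'])
second_df = pd.DataFrame(data_set_2, columns=['Words', 'Numbers'])

# Cartesian product of the rows; the shared 'Numbers' column gets suffixes
all_pairs = first_df.merge(second_df, how='cross', suffixes=('_first', '_second'))

# Rows whose strings are exactly equal, labelled like the loop version
same_word = all_pairs[all_pairs['Word_set'] == all_pairs['Words']].assign(Similar='Word')
# Rows whose numbers are equal
same_number = all_pairs[all_pairs['Numbers_first'] == all_pairs['Numbers_second']].assign(Similar='Code')

result = pd.concat([same_word, same_number], ignore_index=True)
print(result)
```

With the sample data, no strings are exactly equal, so only the pair sharing the number 30 is kept. Unlike the loop version, this sketch lists all "Word" matches before all "Code" matches, and it avoids duplicate column names in the result.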