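I have a tab-delimited CSV file:
I only need to look at the first two columns and find, for example, whether the pair A-B appears again in the document as B-A; if B-A does appear, print A-B. The same applies to the other pairs.
For the example given, the output would be: A-B & C-D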
dic = {}
import sys
import os
import pandas as pd
import numpy as np
import csv
colnames = ['col1', 'col2', 'col3', 'col4', 'col5']
data = pd.read_csv('koko.csv', names=colnames, delimiter='\t')
col1 = data.col1.tolist()
col2 = data.col2.tolist()
dataset = list(zip(col1, col2))
for a, b in dataset:
    if (a, b) and (b, a) in dataset:
        dic[a] = b
print(dic)
output = {'A': 'B', 'B': 'A', 'D': 'C', 'C':'D'}
How can I avoid the duplicated (that is, swapped) results in the dictionary?
Answer 0 (score: 0)
Does this work?
import pandas as pd
import numpy as np
col_1 = ['A', 'B', 'C', 'B', 'D']
col_2 = ['B', 'C', 'D', 'A', 'C']
df = pd.DataFrame(np.column_stack([col_1,col_2]), columns = ['Col1', 'Col2'])
df['combined'] = list(zip(df['Col1'], df['Col2']))
final_set = set(tuple(sorted(t)) for t in df['combined'])
final_set looks like this:
{('C', 'D'), ('A', 'B'), ('B', 'C')}
Because the second row contains B-C, the output includes more than just A-B and C-D.
Answer 1 (score: 0)
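The following should work.
Example df used:
import pandas as pd
df = pd.DataFrame({'Col1' : ['A','C','D','B','D','A'], 'Col2' : ['B','D','C','A','C','B']})
This is the function I used:
# Sort each row so that A-B and B-A become the same pair.
# result_type='broadcast' keeps the DataFrame shape and column names;
# without it, recent pandas versions return a Series of lists here.
temp = df[['Col1','Col2']].apply(lambda row: sorted(row), axis=1, result_type='broadcast')
print(temp[['Col1','Col2']].drop_duplicates())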
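Note that sorting each row and dropping duplicates keeps every distinct pair, including pairs that never occur in reverse order. The following is a minimal sketch in plain Python, assuming the goal is to report a pair only when both A-B and B-A are present. The helper name `reciprocal_pairs` is hypothetical, and skipping self-pairs such as A-A is an assumption, not something stated in the question. The sample rows are the ones from Answer 0.

```python
def reciprocal_pairs(pairs):
    """Return each unordered pair once, but only if it occurs in both orders."""
    seen = set(pairs)   # exact (a, b) tuples present in the data
    emitted = set()     # unordered pairs already reported
    result = []
    for a, b in pairs:
        key = frozenset((a, b))
        # Assumption: a pair like ('A', 'A') is not treated as reciprocal.
        if a != b and (b, a) in seen and key not in emitted:
            emitted.add(key)
            result.append(tuple(sorted((a, b))))
    return result

# Sample rows taken from Answer 0: A-B, B-C, C-D, B-A, D-C
rows = [('A', 'B'), ('B', 'C'), ('C', 'D'), ('B', 'A'), ('D', 'C')]
print(reciprocal_pairs(rows))  # [('A', 'B'), ('C', 'D')]
```

With a DataFrame, the input could be built as list(zip(df['Col1'], df['Col2'])). The set lookup keeps the check roughly linear in the number of rows, whereas testing membership in a list, as in the question, scans the list for every row.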
Useful links:
checking if a string is in alphabetical order in python
Difference between map, applymap and apply methods in Pandas
Answer 2 (score: 0)
Here is one way.
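import pandas as pd
df = pd.DataFrame({'Col1' : ['A','C','D','B','D','A','E'],
                   'Col2' : ['B','D','C','A','C','B','F']})
# result_type='broadcast' keeps a DataFrame after the row-wise sort.
# The lambda in .loc checks duplicates on the sorted frame, not the original df.
df = df.drop_duplicates()\
       .apply(sorted, axis=1, result_type='broadcast')\
       .loc[lambda d: d.duplicated(subset=['Col1', 'Col2'], keep=False)]\
       .drop_duplicates()
#   Col1 Col2
# 0    A    B
# 1    C    D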
Explanation
The steps are as follows: