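I have an Excel file that I have already processed for data analysis.
Now I need to get the result. I am trying to get it by iterating over the pandas columns and rows with for and if conditions, but I am not getting the desired output.
I used hyphens (-) in the Excel file so that I can apply some conditions.
Excel file: Input_File
Required output:
A -> B -> C -> E -> I
F -> G -> L
H -> J -> K
A1 -> B1
C1 -> A1
Z -> X
Note: the output should be saved as plain text in a text file; no graph or visualization is needed.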
import pandas as pd

df = pd.read_excel('Test.xlsx')
df = df.fillna('-')  # fillna returns a new DataFrame, so reassign it

# The code below produces Z -> X
for index, row in df.iterrows():
    if row['Start_Name'] != '-':
        if row['End_Name'] != '-':
            print(row['Start_Name'] + ' -> ' + row['End_Name'])

# The code below produces A -> B / F -> G / H -> J / C1 -> A1
for index, row in df.iterrows():
    if row['Start_Name'] != '-':
        if row['Mid_Name_1'] == '-':
            if row['Mid_Name_2'] != '-':
                print(row['Start_Name'] + ' -> ' + row['Mid_Name_2'])

# The code below produces B -> C / C -> E
for index, row in df.iterrows():
    if row['Mid_Name_1'] != '-':
        if row['Mid_Name_2'] != '-':
            print(row['Mid_Name_1'] + ' -> ' + row['Mid_Name_2'])
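In every row of the sample data (Name_relations.csv, listed at the end), exactly two cells hold a name and the other two hold the '-' placeholder. Assuming that holds for the real file too, the three nested-if loops can be collapsed into a single rule: keep the non-placeholder cells of a row, in column order, as a (from, to) pair. A minimal stdlib-only sketch, with plain lists standing in for the DataFrame rows; ROWS and row_to_edge are hypothetical names:

```python
# Sample rows copied from Name_relations.csv, in column order
# (Start_Name, Mid_Name_1, Mid_Name_2, End_Name); '-' marks an empty cell.
ROWS = [
    ["A", "-", "B", "-"],
    ["-", "B", "C", "-"],
    ["-", "C", "E", "-"],
    ["F", "-", "G", "-"],
    ["H", "-", "J", "-"],
    ["-", "E", "-", "I"],
    ["-", "J", "-", "K"],
    ["-", "G", "-", "L"],
    ["-", "A1", "-", "B1"],
    ["C1", "-", "A1", "-"],
    ["Z", "-", "-", "X"],
]


def row_to_edge(row, placeholder="-"):
    """Return the (from, to) pair formed by the non-placeholder cells of a row."""
    names = [cell for cell in row if cell != placeholder]
    # Assumption: every row contains exactly two real names.
    if len(names) != 2:
        raise ValueError(f"expected exactly two names, got {names!r}")
    return names[0], names[1]


edges = [row_to_edge(row) for row in ROWS]
for start, end in edges:
    print(f"{start} -> {end}")
```

This yields all eleven pairs, including E -> I, J -> K, G -> L and A1 -> B1, which none of the three loops above print. The pairs still have to be chained together; inside the question's iterrows() loop, row[row != '-'].tolist() should give the same two-name list.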
Answer (score: 1)
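Setup:
- fronts: a dictionary mapping the name that starts a sequence to that sequence's position in sequences.
- backs: a dictionary mapping the name that ends a sequence to that sequence's position in sequences.
- sequences: a list of all merged relations, each stored as a deque of names.
- position_counter: the position the next newly created sequence will take (i.e. the number of sequences created so far).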
from collections import deque
import pandas as pd
data = pd.read_csv("Name_relations.csv")
fronts = dict()
backs = dict()
sequences = []
position_counter = 0
Extract all: for each row, select the values that match the regular-expression pattern.
selector = data.apply(lambda row: row.str.extractall(r"([\w\d]+)"), axis=1)
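To make it concrete what this extraction keeps from a row, here is a plain-Python illustration with the standard re module. It is not the pandas extractall call itself (which returns one DataFrame per row, with the matches in column 0, hence relation[0] in the loop below), and names_in_row is a hypothetical helper:

```python
import re

# \w already matches digits, so r"\w+" finds the same names as r"[\w\d]+";
# the '-' placeholder contains no word characters and is skipped.
NAME_PATTERN = re.compile(r"\w+")


def names_in_row(cells):
    """Collect every regex match from the row's cells, in column order."""
    found = []
    for cell in cells:
        found.extend(NAME_PATTERN.findall(cell))
    return found


print(names_in_row(["-", "A1", "-", "B1"]))  # ['A1', 'B1']
print(names_in_row(["Z", "-", "-", "X"]))    # ['Z', 'X']
```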
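For each relation in selector:
- Extract its two elements (front and back) and put them into a deque.
- Check whether the front of the new relation can be attached to any previous sequence, i.e. whether front in backs.keys().
- If so:
  - look up the position of that sequence;
  - take llist2, the sequence stored at that position;
  - remove its last element, which duplicates front, and put what is left in front of the new deque;
  - store the merged deque back in sequences and register its new end in backs;
  - delete the previous sequence's dangling ends from fronts and backs.
- The same can be done the other way round, with back in fronts.keys() (the commented-out binf branch).
- If no existing sequence matches the new relation yet, append it to sequences and register it in fronts and backs.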
for relation in selector:
    # Each row yields exactly two names: the front and the back of the relation
    front, back = relation[0]
    llist = deque((front, back))
    finb = front in backs.keys()  # does an existing sequence end with this front?
    # binf = back in fronts.keys()
    if finb:
        position = backs[front]
        llist2 = sequences[position]
        back_llist2 = llist2.pop()  # drop the duplicated joining name
        llist = llist2 + llist
        sequences[position] = llist
        backs[llist[-1]] = position
        if front in fronts.keys():
            del fronts[front]
        if back_llist2 in backs.keys():
            del backs[back_llist2]
    # if binf:
    #     position = fronts[back]
    #     llist2 = sequences[position]
    #     front_llist2 = llist2.popleft()
    #     llist = llist + llist2
    #     sequences[position] = llist
    #     fronts[llist[0]] = position
    #     if back in backs.keys():
    #         del backs[back]
    #     if front_llist2 in fronts.keys():
    #         del fronts[front_llist2]
    # if not (finb or binf):
    if not finb:  # (equivalent to 'else:')
        sequences.append(llist)
        fronts[front] = position_counter
        backs[back] = position_counter
        position_counter += 1

for s in sequences:
    print(' -> '.join(str(el) for el in s))
Output:
A -> B -> C -> E -> I
F -> G -> L
H -> J -> K
A1 -> B1
C1 -> A1
Z -> X
#if binf is active:
# A -> B -> C -> E -> I
# F -> G -> L
# H -> J -> K
# C1 -> A1 -> B1
# Z -> X
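The loop above attaches a new relation only to the end of an existing sequence; attaching at the start is left in the commented-out binf branch. A minimal sketch, not the answer's code, that attaches at either end and also joins two existing chains when one relation links them. It uses plain lists and the hypothetical names merge_chains, starts and ends, assumes every name has at most one successor and one predecessor (no branching), and raises on a cycle:

```python
def merge_chains(edges):
    """Merge (from, to) pairs into chains, joining at either end.

    Assumes each name has at most one successor and one predecessor.
    """
    chains = {}   # chain id -> list of names; dict keeps creation order
    starts = {}   # first name of a chain -> chain id
    ends = {}     # last name of a chain  -> chain id
    next_id = 0
    for a, b in edges:
        left_id = ends.pop(a, None)     # chain that currently ends with a
        right_id = starts.pop(b, None)  # chain that currently starts with b
        if left_id is not None and left_id == right_id:
            raise ValueError(f"{a} -> {b} would close a cycle")
        if left_id is None and right_id is None:
            # No match at either end: start a new chain.
            chains[next_id] = [a, b]
            starts[a] = next_id
            ends[b] = next_id
            next_id += 1
        elif right_id is None:
            # Append b to the chain that ends with a.
            chains[left_id].append(b)
            ends[b] = left_id
        elif left_id is None:
            # Prepend a to the chain that starts with b.
            chains[right_id].insert(0, a)
            starts[a] = right_id
        else:
            # a -> b links two chains: move the right one onto the left one.
            chains[left_id].extend(chains.pop(right_id))
            ends[chains[left_id][-1]] = left_id
    return list(chains.values())


EDGES = [("A", "B"), ("B", "C"), ("C", "E"), ("F", "G"), ("H", "J"),
         ("E", "I"), ("J", "K"), ("G", "L"), ("A1", "B1"), ("C1", "A1"),
         ("Z", "X")]

for chain in merge_chains(EDGES):
    print(" -> ".join(chain))
```

On the sample relations this prints the same five lines as the "if binf is active" output above, including C1 -> A1 -> B1. A relation such as B -> C arriving after A -> B and C -> D is joined into A -> B -> C -> D. list.insert(0, ...) is O(n); a deque with appendleft would avoid that for long chains.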
Name_relations.csv
Start_Name,Mid_Name_1,Mid_Name_2,End_Name
A,-,B,-
-,B,C,-
-,C,E,-
F,-,G,-
H,-,J,-
-,E,-,I
-,J,-,K
-,G,-,L
-,A1,-,B1
C1,-,A1,-
Z,-,-,X
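The question asks for the result to be saved as plain text rather than only printed. A minimal sketch using pathlib; write_chains and the file name output.txt are hypothetical, and a temporary directory is used here only so the example leaves no file behind:

```python
import tempfile
from pathlib import Path


def write_chains(chains, path):
    """Write one 'X -> Y -> Z' line per chain to a plain text file."""
    lines = [" -> ".join(str(name) for name in chain) for chain in chains]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


# Example chains; with the answer's code these would be the deques in sequences.
chains = [["A", "B", "C", "E", "I"], ["F", "G", "L"], ["Z", "X"]]

with tempfile.TemporaryDirectory() as tmp:
    out_file = Path(tmp) / "output.txt"  # hypothetical output file name
    write_chains(chains, out_file)
    text = out_file.read_text(encoding="utf-8")

print(text, end="")
```

Because str(name) is applied to each element, the deques in sequences can be passed to write_chains unchanged.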