Pandas loop does not produce enough rows

Time: 2017-06-16 15:21:47

Tags: python loops pandas

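I'm having a problem with an iterator that doesn't iterate the way I expect. I'm trying to look up every item of df1 in df2: the row of df2 to search is given by the value in the 'Start' row of df1 for that item's column. I then want to return the name of the matching column. For example, for df1[2, 0] it should search row 'C' of df2 and return 'C', which is the column that contains the matching value (5).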

df1:

                 0        1        2        
0                1        3        6     
1                4        4        3     
2                5        6        2    
Start            C        A        B                

df2:

                 A        B        C               
 A               6        3        4           
 B               2        3        6    
 C               4        1        5     
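
For reproducibility, the two frames can be rebuilt roughly as follows. This is only a sketch: the post shows printed output, so the dtypes, the integer column labels of df1, and the mixed index containing 'Start' are assumptions.

```python
import pandas as pd

# Assumed construction: only the printed frames are given in the post,
# so dtypes and label types here are a guess.
df1 = pd.DataFrame(
    [[1, 3, 6], [4, 4, 3], [5, 6, 2], ["C", "A", "B"]],
    index=[0, 1, 2, "Start"],
)
df2 = pd.DataFrame(
    [[6, 3, 4], [2, 3, 6], [4, 1, 5]],
    index=["A", "B", "C"],
    columns=["A", "B", "C"],
)

# The example from the text: df1[2, 0] is 5, the 'Start' value of
# column 0 is 'C', and row 'C' of df2 holds 5 in column 'C'.
start = df1.loc["Start", 0]
print(start, df2.loc[start].tolist())
```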

So far I have:

for i, row in df1.iterrows():
    for ii in range(0,len(df1.columns)): 
        col = df1.columns[ii]          
        result = pd.DataFrame(df2.loc[df1.loc['Start']].eq(col).idxmin(1)) 

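This gives me a Series (C, B, C), which only covers the matches for row 0 of df1. The ideal output would be a 3x3 DataFrame corresponding to df1 without the 'Start' row: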

                 0        1        2        
0                C        B        C     
1                A        C        B     
2                ...   

Any pointers are much appreciated!

2 answers:

Answer 0: (score: 0)

If I understand the question correctly, the output you gave is incorrect. It should be:

  0 1 2
0 B B C
1 A C B
2 C A A

I'm not very fluent in pandas, but I managed to get a version working:

def find_key_by_value(dic, value):
    for k, v in dic.items():
        if v == value:
            return k

data = {0:[], 1:[], 2: []}        
index = [0, 1, 2]       

for i, row in df1.iterrows():
    if i != 'Start': # Avoid calculating last line
        for ii in range(0,len(df1.columns)): 
            col = df1.columns[ii]

            to_match = row[ii] # number to match
            to_start = df1.loc['Start'][ii] # row under Start label

            # this is where my lack of pandas knowledge appears
            df2_row_keys = df2.loc[to_start].to_dict()
            result = find_key_by_value(df2_row_keys, to_match)
            data[ii].insert(i, result)

# data = {0: ['B', 'A', 'C'], 1: ['B', 'C', 'A'], 2: ['C', 'B', 'A']}
result = pd.DataFrame(data=data, index=index)
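
The dictionary lookup above can also be wrapped in one function that builds a value-to-column mapping once per df1 column. This is a sketch, not a drop-in replacement: `lookup_columns` is a hypothetical helper name, and the frames are rebuilt here under the assumption that they match the printed ones in the question.

```python
import pandas as pd

# Assumed reconstruction of the frames printed in the question.
df1 = pd.DataFrame(
    [[1, 3, 6], [4, 4, 3], [5, 6, 2], ["C", "A", "B"]],
    index=[0, 1, 2, "Start"],
)
df2 = pd.DataFrame(
    [[6, 3, 4], [2, 3, 6], [4, 1, 5]],
    index=["A", "B", "C"],
    columns=["A", "B", "C"],
)


def lookup_columns(df1, df2, start_label="Start"):
    """Hypothetical helper: map each df1 value to the df2 column that holds it."""
    values = df1.drop(index=start_label)
    out = {}
    for col in values.columns:
        # Row of df2 named by the 'Start' entry of this df1 column.
        search_row = df2.loc[df1.at[start_label, col]]
        # Reverse mapping value -> column label; setdefault keeps the first
        # match, which mirrors find_key_by_value.
        reverse = {}
        for label, value in search_row.items():
            reverse.setdefault(value, label)
        # Values without a match become None.
        out[col] = [reverse.get(v) for v in values[col]]
    return pd.DataFrame(out, index=values.index)


result = lookup_columns(df1, df2)
print(result)
```

Building the reverse mapping once per column avoids rescanning the df2 row for every cell, which mostly matters when df1 has many rows.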

Answer 1: (score: 0)

The way I would suggest is:

import pandas as pd

result = []
for y, row in df1.iterrows():
    if y == 'Start': # Skip the row named 'Start'
        continue
    result.append([]) # Make a new row in the result
    for x, item in row.items(): # 'iteritems' was removed in pandas 2.0
        start = df1.loc['Start', x] # The same column, but in the start row
        search_row = df2.loc[start] # The row to look for a match in
        occurrences = search_row.where(search_row == item) # NaN where there is no match
        # 'idxmax' returns the label of one occurrence ('argmax' now returns a position).
        result[y].append(occurrences.idxmax())
print(pd.DataFrame(result))

This gives the output:

   0  1  2
0  B  B  C
1  A  C  B
2  C  A  A
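
If the explicit loops become a bottleneck, the same lookup can probably be done in one step with NumPy broadcasting. The sketch below is an alternative technique, not what either answer proposes. It assumes the frames match the printed ones in the question and that every non-'Start' value of df1 is an integer.

```python
import pandas as pd

# Assumed reconstruction of the frames printed in the question.
df1 = pd.DataFrame(
    [[1, 3, 6], [4, 4, 3], [5, 6, 2], ["C", "A", "B"]],
    index=[0, 1, 2, "Start"],
)
df2 = pd.DataFrame(
    [[6, 3, 4], [2, 3, 6], [4, 1, 5]],
    index=["A", "B", "C"],
    columns=["A", "B", "C"],
)

# Numeric part of df1 (the 'Start' row makes the columns object dtype).
values = df1.drop(index="Start").astype(int)

# search_rows[j] is the df2 row that column j of df1 must be searched in.
search_rows = df2.loc[df1.loc["Start"].tolist()].to_numpy()

# matches[i, j, k] is True when values[i, j] equals search_rows[j, k].
matches = values.to_numpy()[:, :, None] == search_rows[None, :, :]

# argmax would silently return position 0 for a cell with no match, so check first.
if not matches.any(axis=2).all():
    raise ValueError("some df1 value has no match in its df2 row")

positions = matches.argmax(axis=2)  # position of the first match per cell
result = pd.DataFrame(
    df2.columns.to_numpy()[positions],
    index=values.index,
    columns=values.columns,
)
print(result)
```

The intermediate boolean array has one entry per (df1 cell, df2 column) pair, so this trades memory for speed and is only worthwhile when df2 has few columns.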