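Finding related values across two columns in pandas

Time: 2019-06-06 12:43:14

Tags: python pandas

I am trying to extract values from one column based on another column in pandas. For example, suppose my data frame has two columns, as shown below: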

>>> check
  child parent
0     b      a
1     c      a
2     d      b
3     e      d
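To make the example reproducible without `Book2.csv`, the frame shown above can be built by hand. This is only a sketch that assumes the CSV holds exactly these four rows and two columns:

```python
import pandas as pd

# Hand-built copy of the frame printed above (assumed to match Book2.csv).
check = pd.DataFrame(
    {
        "child": ["b", "c", "d", "e"],
        "parent": ["a", "a", "b", "d"],
    }
)

print(check)
```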

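Now I want to extract every value in the "child" column that descends, directly or indirectly, from a given value in the "parent" column. The starting value may vary; for this example, assume it is "a" in the "parent" column.

The length of the data frame may vary as well.

I tried the code below, but it stops working once there are more levels of matching values and the data frame gets longer: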

import pandas as pd

check = pd.read_csv("Book2.csv",encoding='cp1252')


new = (check.loc[check['parent'] == 'a', 'child']).tolist()
len(new)

a=[]
a.append(new)

for i in range(len(new)):
    new[i]
    new1 = (check.loc[check['parent'] == new[i], 'child']).tolist()
    len(new1)
    if(len(new1)>0):
        a.append(new1)
        for i in range(len(new1)):
            new2 = (check.loc[check['parent'] == new1[i], 'child']).tolist()
            if(len(new1)>0):
                a.append(new2)

flat_list = [item for sublist in a for item in sublist]

>>> flat_list
['b', 'c', 'd', 'e']

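Is there an efficient way to get the desired result? Any help would be much appreciated.

3 answers:

Answer 0: (score: 2)

Recursion is one way to do it. Assuming check is your data frame, define a recursive function: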

final = [] #empty list which is used to store all results

def getchilds(df, res, value):
    where = df['parent'].isin([value]) #check rows where parent is equal to value
    newvals = list(df['child'].loc[where]) #get the corresponding child values
    if len(newvals) > 0:
        res.extend(newvals)
        for i in newvals: #recursive calls using child values
            getchilds(df, res, i)

getchilds(check, final, 'a')
print(final)

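With your example data as check, print(final) prints ['b', 'c', 'd', 'e'].

This works as long as the data contains no cycles, for example 'b' being a child of 'a' while 'a' is also a child of 'b'. In that case you need to add extra checks to prevent infinite recursion.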
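One way to add such a check is to remember which values have already been visited. The sketch below is not the answerer's code: `get_descendants` is a hypothetical helper that swaps the recursion for an iterative breadth-first walk with a `seen` set, and the `cyclic` frame is made-up data used only to show the guard.

```python
from collections import deque

import pandas as pd


def get_descendants(df, start):
    """Return every descendant of `start`, visiting each value at most once."""
    # Build the parent -> direct children lookup once, up front.
    children = df.groupby("parent")["child"].apply(list).to_dict()
    seen = {start}  # values already queued; this is the cycle guard
    queue = deque([start])
    result = []
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in seen:
                seen.add(child)
                result.append(child)
                queue.append(child)
    return result


# Frame from the question, plus a hypothetical frame containing a cycle.
check = pd.DataFrame({"child": ["b", "c", "d", "e"], "parent": ["a", "a", "b", "d"]})
cyclic = pd.DataFrame({"child": ["b", "a"], "parent": ["a", "b"]})

print(get_descendants(check, "a"))   # ['b', 'c', 'd', 'e']
print(get_descendants(cyclic, "a"))  # ['b']
```

Because the lookup is built once, each row is examined only a constant number of times, and deep hierarchies do not run into Python's recursion limit.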

Answer 1: (score: 0)

The loop below builds a dictionary that maps each parent value to the list of its unique direct children:

out_dict = {}
for v in pd.unique(check['parent']):
    out_dict[v] = list(pd.unique(check['child'][check['parent']==v]))
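The dictionary only records direct children. A minimal sketch of how it could be walked to collect all descendants follows; `collect_descendants` is a hypothetical helper, and `out_dict` is written out by hand here, with the values the loop above should produce for the question's example frame, so that the snippet runs without the CSV.

```python
def collect_descendants(mapping, start):
    """Walk a parent -> children mapping and return all descendants of `start`."""
    result = []
    stack = [start]
    seen = {start}  # avoids revisiting values if the mapping contains a cycle
    while stack:
        current = stack.pop()
        for child in mapping.get(current, []):
            if child not in seen:
                seen.add(child)
                result.append(child)
                stack.append(child)
    return result


# Assumed contents of out_dict for the example frame in the question.
out_dict = {"a": ["b", "c"], "b": ["d"], "d": ["e"]}

descendants = collect_descendants(out_dict, "a")
print(descendants)  # ['b', 'c', 'd', 'e']
```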

Answer 2: (score: 0)

Let me take a guess and assume you want all values of the child column whose parent value is x:

import pandas as pd

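def get_x_values_of_y(comparison_val, df, val_type="get_parent"):
    # column to read the results from: "parent" for get_parent, else "child"
    val_to_be_found = ["child", "parent"][val_type == "get_parent"]
    # column to compare against comparison_val (the other one)
    val_existing = ["child", "parent"][val_type != "get_parent"]
    # match on the value passed in, not on a hard-coded "a"
    mask_value = df[val_existing] == comparison_val
    to_be_found_column = df[mask_value][val_to_be_found]
    unique_results = to_be_found_column.unique().tolist()
    return unique_results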

check = pd.read_csv("Book2.csv", encoding='cp1252')
# to get results of all parents of child "a"
print(get_x_values_of_y("a", check))

# to get results of all children of parent "b"
print(get_x_values_of_y("b", check, val_type="get_child"))

# to get results of all parents of every child
list_of_all_children = check["child"].unique().tolist()
for each_child in list_of_all_children:
    print(get_x_values_of_y(each_child, check))

# to get results of all children of every parent
list_of_all_parents = check["parent"].unique().tolist()
for each_parent in list_of_all_parents:
    print(get_x_values_of_y(each_parent, check, val_type="get_child"))

Hope this solves your problem.