Question

I have a function:

def mini_distance(pace_data, activity_id):

    condition_count = 0
    false_count = 0
    true_count = 0
    list_of_classified_sessions = []

    print(activity_id)

    #Condition 1
    condition_count += 1
    #between 4500M and 15000M

    if(len(pace_data) in range(45,180)):
        print("1. Array length fits between X and Y: ",len(pace_data)*100,"M.")
        true_count+=1
    else:
        print("1. Array too short or too long: ",len(pace_data)*100,"M.")
        false_count+=1

    if(true_count == 1):
        list_of_classified_sessions.append(activity_id)

    print(list_of_classified_sessions)

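My task:

Check whether an array contains a certain number of elements. If it does, append that array's index to a list; if not, move on to the next array. Do this for many arrays stored in a DataFrame column.

Parameters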

array([  0.        ,   4.91101813,   5.58028665,   5.55841138,
     5.22151485,   5.30403077,   5.68089541,   4.6237671 ,
     5.52696382,   5.26733118])

index of 80

Is the array length between X and Y?

If so, add 1 to the true_count variable. Otherwise, add 1 to the false_count variable.

list_of_classified_sessions stores these index values.

Example

mini_distance(example_array, example_index)

Returns a list holding the index of that array's position in the DataFrame:

1. Array length fits between X and Y
[80]

If the number of elements in the array does not fall within the specified range, an empty list is returned:

1. Array length too short or too long.
[]

So far, so good.

My attempt

Code

for i in range(0,5):
    mini_distance(df.iloc[i].column_with_arrays, df.iloc[i].index_of_same_row)

Output

0 #this is the index
Array too short or too long.
[] #not added, correct
1
Array too short or too long.
[] #not added again, correct
2
Array fits between X and Y.
['2'] #add this index to the list
3
Array too short or too long.
[] #index 3 is not added but now my list is empty
4
Array fits between X and Y
['4'] #index 4 is added but where has index 2 gone?

Desired output

A list containing the indices of all DataFrame rows whose array length falls within the specified range:

output_list = [2,4,5,99,121,389,...,2112,3116]

I hope I have made this clear. Please ask if anything needs further explanation.

Answer 1

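Your code creates 5 separate list_of_classified_sessions lists, one per function call and each holding at most one entry, because the variable is assigned inside the function rather than outside it.

To get a single list, define list_of_classified_sessions outside the function, as the output of the loop.

You can do this by changing the function so that it is structured as follows: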

def mini_distance(pace_data, activity_id):

    condition_count = 0
    false_count = 0
    true_count = 0
    #list_of_classified_sessions = [] 
    #You don't need to create the list within the function

    print(activity_id)

    #Condition 1
    condition_count += 1
    #between 4500M and 15000M

    if(len(pace_data) in range(45,180)):
        print("1. Array length fits between X and Y: ",len(pace_data)*100,"M.")
        true_count+=1
    else:
        print("1. Array too short or too long: ",len(pace_data)*100,"M.")
        false_count+=1

    if(true_count == 1):
        return activity_id

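Then set up the loop to build one output list from the 5 function calls, dropping the None that the function returns when the condition fails:

results = [mini_distance(df.iloc[i].column_with_arrays, df.iloc[i].index_of_same_row) for i in range(5)]
list_of_outcomes = [r for r in results if r is not None]

This results in a single output list containing only the ids flagged "true". The filter compares against None explicitly so that a valid id of 0 is not dropped. Also, list comprehensions are always nice ;)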
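As a minimal, self-contained sketch of this return-and-collect pattern, the snippet below replaces the DataFrame with a plain list of (activity_id, pace_data) pairs. The `sessions` data and the stripped-down function body are assumptions made for illustration, not the asker's real data.

```python
def mini_distance(pace_data, activity_id):
    """Return activity_id if the array length is in range, else None."""
    # Same check as in the answer: 45 <= len(pace_data) <= 179
    if len(pace_data) in range(45, 180):
        return activity_id
    return None


# Hypothetical stand-in for the DataFrame rows: (activity_id, pace_data)
sessions = [
    (0, [5.0] * 10),    # too short
    (1, [5.0] * 200),   # too long
    (2, [5.0] * 50),    # fits
    (3, [5.0] * 44),    # too short by one element
    (4, [5.0] * 179),   # fits (the upper bound of range() is exclusive)
]

results = [mini_distance(pace, act_id) for act_id, pace in sessions]
# Compare against None explicitly so that a valid id of 0 is not dropped
list_of_outcomes = [r for r in results if r is not None]
print(list_of_outcomes)  # [2, 4]
```

Returning a value and filtering afterwards keeps the function free of shared state, so the loop can be rerun without the list growing.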

Answer 2

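Index 4 is added but where has index 2 gone?

Hello Murray,

This is actually a scoping problem. You declared list_of_classified_sessions inside the mini_distance function, which means it is not accessible from outside the function's scope and cannot keep its value between two function calls. That explains why the list is reset to an empty list every time the function is called.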
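The difference can be seen in a small sketch. The function and variable names below (`local_version`, `shared_version`, `shared`) are hypothetical and only illustrate the scoping behavior described above.

```python
def local_version(item):
    collected = []          # a brand-new list is created on every call
    collected.append(item)
    return collected


shared = []                 # created once, at module level


def shared_version(item):
    # append() mutates the existing module-level list in place
    shared.append(item)
    return shared


first = local_version(2)
second = local_version(4)
print(first, second)        # [2] [4]

shared_version(2)
shared_version(4)
print(shared)               # [2, 4]
```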

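To get the behavior you want, simply declare the list outside the function and mark it as global inside the function. (Strictly speaking, the global statement is only required when the function rebinds the name; append modifies the list in place and would work without it, but the declaration makes the intent explicit.)

An example: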

list_of_classified_sessions = [] # Global array declared 

def mini_distance(pace_data, activity_id):

    global list_of_classified_sessions # Now you can modify the global array inside this function
    condition_count = 0
    false_count = 0
    true_count = 0

    print(activity_id)

    #Condition 1
    condition_count += 1
    #between 4500M and 15000M

    if(len(pace_data) in range(45,180)):
        print("1. Array length fits between X and Y: ",len(pace_data)*100,"M.")
        true_count+=1
    else:
        print("1. Array too short or too long: ",len(pace_data)*100,"M.")
        false_count+=1

    if(true_count == 1):
        list_of_classified_sessions.append(activity_id)

    print(list_of_classified_sessions)
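One possible way to drive such a function in a loop is sketched below, with the prints and counters omitted for brevity. The `example_lengths` list is a hypothetical stand-in for the DataFrame column, with row positions used as activity ids.

```python
list_of_classified_sessions = []  # module-level list shared by all calls


def mini_distance(pace_data, activity_id):
    global list_of_classified_sessions
    if len(pace_data) in range(45, 180):
        list_of_classified_sessions.append(activity_id)


# Hypothetical array lengths for rows 0..4, standing in for the DataFrame column
example_lengths = [10, 200, 50, 44, 179]

for activity_id, length in enumerate(example_lengths):
    mini_distance([5.0] * length, activity_id)

print(list_of_classified_sessions)  # [2, 4]
```

Because the list lives at module level, it keeps growing if the loop is run again in the same session; calling list_of_classified_sessions.clear() beforehand resets it.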

This should give you what you want. Feel free to ask if you have any other questions.

J. Smith

Python: Appending to a list the elements that evaluate to true in a function
