How do I write a nested for loop to access every other row relative to one row in a pandas.DataFrame?
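I'm trying to perform some operations between rows in a pandas.DataFrame.
The operation in my example code computes the Euclidean distance between each row and every other row.
The results are then saved to a list of the form [(row_reference, name, dist)].
I understand how to access each row of a pandas.DataFrame with df.iterrows(), but I'm not sure how to access every other row relative to the current row in order to perform the between-row operation.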
I would like to apply the operation to every other row relative to the current row/index:
import pandas as pd
import numpy
import math

df = pd.DataFrame([{'name': "Bill", 'c1': 3, 'c2': 8}, {'name': "James", 'c1': 4, 'c2': 12},
                   {'name': "John", 'c1': 12, 'c2': 26}])

# Euclidean distance function where x1=c1_row1, x2=c1_row2, y1=c2_row1, y2=c2_row2
def edist(x1, x2, y1, y2):
    dist = math.sqrt(math.pow((x1 - x2), 2) + math.pow((y1 - y2), 2))
    return dist

# Calculate Euclidean distance for one row (e.g. Bill) against each other row
# (e.g. "James" and "John"). Save results to a list (N_name, dist).
all_results = []
for index, row in df.iterrows():
    results = []
    # secondary loop to look for OTHER rows with respect to the current row
    # results.append([row2['name'], edist()])
    all_results.append([index, results])
I want the loop to do the equivalent of the following edist() calls, producing this expected result:
In[1]:
result = []
result.append(['James',edist(3,4,8,12)])
result.append(['John',edist(3,12,8,26)])
results_all=[]
results_all.append([0,result])
result2 = []
result2.append(['John',edist(4,12,12,26)])
result2.append(['Bill',edist(4,3,12,8)])
results_all.append([1,result2])
result3 = []
result3.append(['Bill',edist(12,3,26,8)])
result3.append(['James', edist(12,4,26,12)])
results_all.append([2,result3])
results_all
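Answer 0 (score: 1)
If the data isn't too large (the result is an n × n matrix), you can check out scipy's distance_matrix:
from scipy.spatial import distance_matrix

all_results = pd.DataFrame(distance_matrix(df[['c1','c2']], df[['c1','c2']]),
                           index=df['name'],
                           columns=df['name'])
Output: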
name Bill James John
name
Bill 0.000000 4.123106 20.124612
James 4.123106 0.000000 16.124515
John 20.124612 16.124515 0.000000
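The question asks for a list per row rather than a matrix. As a rough sketch (not part of the original answer), a pairwise distance matrix can be reshaped into that form. To keep the example free of the scipy dependency, it builds the matrix with NumPy broadcasting instead of distance_matrix; the names points, diffs, and dist are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([
    {'name': "Bill", 'c1': 3, 'c2': 8},
    {'name': "James", 'c1': 4, 'c2': 12},
    {'name': "John", 'c1': 12, 'c2': 26},
])

# Pairwise coordinate differences via broadcasting: shape (n, n, 2).
points = df[['c1', 'c2']].to_numpy(dtype=float)
diffs = points[:, None, :] - points[None, :, :]

# Euclidean distance between every pair of rows: shape (n, n).
dist = np.sqrt((diffs ** 2).sum(axis=-1))

names = df['name'].tolist()

# One entry per row: [row position, [[other name, distance], ...]],
# skipping the distance from a row to itself.
results_all = [
    [i, [[names[j], float(dist[i, j])] for j in range(len(names)) if j != i]]
    for i in range(len(names))
]
print(results_all)
```

The inner lists follow the DataFrame's row order, so the ordering of names can differ from the hand-written expected output in the question while the distances stay the same.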
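Answer 1 (score: 0)
Consider shift to compare each row with the one before it while avoiding any row loops. And because you are running simple arithmetic, you can vectorize the calculation with numpy by running the expression directly on the columns.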
import numpy as np
df = (df.assign(c1_shift = lambda x: x['c1'].shift(1),
c2_shift = lambda x: x['c2'].shift(1))
)
df['dist'] = np.sqrt(np.power(df['c1'] - df['c1_shift'], 2) +
np.power(df['c2'] - df['c2_shift'], 2))
print(df)
# name c1 c2 c1_shift c2_shift dist
# 0 Bill 3 8 NaN NaN NaN
# 1 James 4 12 3.0 8.0 4.123106
# 2 John 12 26 4.0 12.0 16.124515
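If you want every row paired with every other row, consider a cross join of the original frame with itself (before the shift columns are added) and query out the reverse duplicates: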
df = (pd.merge(df.assign(key=1), df.assign(key=1), on="key")
.query("name_x < name_y")
.drop(columns=['key'])
)
df['dist'] = np.sqrt(np.power(df['c1_x'] - df['c1_y'], 2) +
np.power(df['c2_x'] - df['c2_y'], 2))
print(df)
# name_x c1_x c2_x name_y c1_y c2_y dist
# 1 Bill 3 8 James 4 12 4.123106
# 2 Bill 3 8 John 12 26 20.124612
# 5 James 4 12 John 12 26 16.124515
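For comparison with the vectorized approaches, the nested loop the question literally asks about could look roughly like this. It is a minimal sketch, assuming the default integer index and the question's own edist() helper; it will be slower than the vectorized versions on large frames because iterrows() runs at Python speed.

```python
import math
import pandas as pd

df = pd.DataFrame([
    {'name': "Bill", 'c1': 3, 'c2': 8},
    {'name': "James", 'c1': 4, 'c2': 12},
    {'name': "John", 'c1': 12, 'c2': 26},
])

def edist(x1, x2, y1, y2):
    """Euclidean distance between the points (x1, y1) and (x2, y2)."""
    return math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2)

results_all = []
for index, row in df.iterrows():
    results = []
    # Secondary loop over the same frame, skipping the current row.
    for index2, row2 in df.iterrows():
        if index2 == index:
            continue
        results.append([row2['name'],
                        edist(row['c1'], row2['c1'], row['c2'], row2['c2'])])
    results_all.append([index, results])

print(results_all)
```

Comparing index labels to skip the current row relies on the index being unique; with duplicate labels, a positional counter from enumerate() would be a safer choice.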