Suppose I have a DataFrame like this:
{{1}}
I want to use np.linalg.norm to calculate the distances between these vectors. What I want to get is
Type Vector distance1 distance2 distance3
A [0.2340, 0.5463, 0.5652, 0.3243, 0.3243] A-B A-C A-D
as new columns. Edit: I also did this:
df['vector'] = df['vector'].apply(lambda x: np.array(x))
print(type(df['vector'].iloc[0]))
The result is:
<class 'numpy.ndarray'>
When I simply run:
print(np.linalg.norm(df['vector'].iloc[0] - df['vector'].iloc[1]))
I get a float value. But when I iterate over the rows, I get:
ValueError: Wrong number of items passed 544, placement implies 1
How can I fix this? Note: the vectors really do have length 544.
Answer 0 (score: 1)
If you are working with pickle, use the pandas pickle import:
import pandas as pd
df = pd.read_pickle('your_file_name')
Since pandas is built on numpy, you can now pass the column you want to np.linalg.norm as a numpy array:
import numpy as np
np.linalg.norm(x=df['your column'])
Be careful with your vectors: they are not all the same size! For example, C and D have length 6. I assume the comma in the first value was meant to be a decimal point.
Edit:
A full example would be:
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'A': [0.2340, 0.5463, 0.5652, 0.3243, 0.3243],
    'B': [0.3244, 0.5566, 0.2344, 0.1213, 0.9821],
    'C': [0.5652, 0.3453, 0.3454, 0.5656, 0.6766],
    'D': [0.5125, 0.3345, 0.1112, 0.4545, 0.6324]
})
df_distances = df.transpose()  # Transpose so that each vector becomes a row
# Add one column per ordered pair of vectors; each scalar distance
# is broadcast to every row of that column.
for col in df:
    for col2 in df:
        df_distances["{}_{}".format(col, col2)] = np.linalg.norm(df[col] - df[col2])
Edit 2 (related to my comment):
I recommend building a list or dictionary with just the values you need, because appending everything to the table can make it very large.
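The list-or-dictionary suggestion might be sketched roughly as follows. This is only an illustration under a few assumptions: it reuses the four example vectors from the answer, the names `vectors` and `distances` are made up for this sketch, and it stores each unordered pair once (A-B but not B-A), which may or may not match what the asker needs.

```python
from itertools import combinations

import numpy as np

# Example vectors copied from the answer's DataFrame (all the same length).
vectors = {
    "A": np.array([0.2340, 0.5463, 0.5652, 0.3243, 0.3243]),
    "B": np.array([0.3244, 0.5566, 0.2344, 0.1213, 0.9821]),
    "C": np.array([0.5652, 0.3453, 0.3454, 0.5656, 0.6766]),
    "D": np.array([0.5125, 0.3345, 0.1112, 0.4545, 0.6324]),
}

# One Euclidean distance per unordered pair, keyed like "A-B".
distances = {
    "{}-{}".format(a, b): float(np.linalg.norm(vectors[a] - vectors[b]))
    for a, b in combinations(vectors, 2)
}

for pair, dist in distances.items():
    print(pair, round(dist, 4))
```

With n vectors this keeps n*(n-1)/2 scalars in a plain dictionary instead of adding a column per pair to the DataFrame. If the distances are needed next to the data later, the dictionary can still be turned into a Series or DataFrame at that point.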