Converting a column to a numpy array

Time: 2017-11-21 11:26:05

Tags: python numpy

Suppose I have a DataFrame like this:

{{1}}

I want to use np.linalg.norm to calculate the distances between these vectors. What I want to get is

Type Vector distance1 distance2 distance3
A [0.2340, 0.5463, 0.5652, 0.3243, 0.3243] A-B A-C A-D

as new columns. Edit: I also did this:

df['vector'] = df['vector'].apply(lambda x: np.array(x))
print(type(df['vector'].iloc[0]))

The result is:

<class 'numpy.ndarray'>

When I simply run:

print(np.linalg.norm(df['vector'].iloc[0] - df['vector'].iloc[1]))

I get a float value.

But when I iterate over the rows I get:

ValueError: Wrong number of items passed 544, placement implies 1

How can I fix this? Note: the vectors really are 544 elements long.

1 answer:

Answer 0: (score: 1)

If you are working with pickle, use the pandas pickle import:

import pandas as pd

df = pd.read_pickle('your_file_name')

Since pandas is built on numpy, you can now pass the column you want as a numpy array:

import numpy as np

np.linalg.norm(x = df['your column'])

Watch your vectors: they are not all the same size! For example, C and D have length 6. I assume the comma in the first value was meant to be a decimal point.

Edit:

A complete example:

import pandas as pd
import numpy as np

# One column per vector; all vectors must have the same length
df = pd.DataFrame({
    'A':[0.2340, 0.5463, 0.5652, 0.3243, 0.3243],
    'B':[0.3244, 0.5566, 0.2344, 0.1213, 0.9821],
    'C':[0.5652,  0.3453, 0.3454, 0.5656, 0.6766],
    'D':[0.5125,  0.3345, 0.1112, 0.4545, 0.6324]
})

df_distances = df.transpose()           #Transpose columns to rows

# Add one column per ordered pair, e.g. "A_B"; the scalar distance
# is broadcast to every row of df_distances
for col in df:
    for col2 in df:
        df_distances["{}_{}".format(col, col2)] = np.linalg.norm(df[col] - df[col2])

Edit 2 (related to my comment):

I suggest generating a list or dictionary with the values you need, because appending everything to the table can make the table very large.
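The list-or-dictionary suggestion from Edit 2 could look roughly like the sketch below. It reuses the four sample vectors from the complete example; the names `vectors` and `distances` are illustrative assumptions and do not come from the original answer. Each unordered pair is computed once and kept in a plain dictionary instead of being added to the table as a new column.

```python
from itertools import combinations

import numpy as np

# Sample vectors, same values as the example DataFrame (assumed equal length).
vectors = {
    'A': np.array([0.2340, 0.5463, 0.5652, 0.3243, 0.3243]),
    'B': np.array([0.3244, 0.5566, 0.2344, 0.1213, 0.9821]),
    'C': np.array([0.5652, 0.3453, 0.3454, 0.5656, 0.6766]),
    'D': np.array([0.5125, 0.3345, 0.1112, 0.4545, 0.6324]),
}

# Map each unordered pair, e.g. "A_B", to its Euclidean distance.
# combinations() skips self-pairs and mirrored duplicates such as "B_A".
distances = {
    "{}_{}".format(a, b): float(np.linalg.norm(vectors[a] - vectors[b]))
    for a, b in combinations(vectors, 2)
}

for pair, dist in distances.items():
    print(pair, round(dist, 4))
```

With four vectors this stores six values rather than sixteen broadcast columns, and the dictionary can still be turned into a pandas Series or DataFrame later if a table is needed.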