Convert a pd.Series of strings to an ndarray

Posted: 2019-02-02 11:32:31

Tags: python pandas numpy reshape2

I extracted an array of words from a pandas column:


X = np.array(tab1['word'])

Example of X: array(['dog', 'cat'], dtype=object)

X contains the 665 strings from that pandas Series. I then convert each word to an ndarray of shape (1, 270):

for i in range(len(X)):
    tmp = X[i]
    z = func(tmp)  # function that returns an ndarray of shape (1, 270)
    X[i] = z

My final goal is an ndarray of shape (665, 270), but the shape I get is (665,). I also cannot reshape it. When I try

X.reshape(665, 270)

I get this error:

ValueError: cannot reshape array of size 665 into shape (665,270)

The function can be any function, for example:

func(word)

Any ideas why this is happening?
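A minimal reproduction of the behaviour described above, assuming a hypothetical func that returns a (1, 270) array and a 10-word stand-in for the 665 words. Assigning an array into one slot of an object-dtype array stores the whole array as a single element, so X stays 1-D with one element per word and reshape has nothing to spread out. Stacking the stored rows is what produces the 2-D result.

```python
import numpy as np

def func(word, n=270):
    # Hypothetical stand-in for the asker's function: returns a (1, n) array
    return np.full((1, n), float(len(word)))

# Small stand-in for np.array(tab1['word']): an object array of strings
X = np.array(['dog', 'cat'] * 5, dtype=object)

for i in range(len(X)):
    X[i] = func(X[i])  # each slot now holds one whole (1, 270) array object

print(X.shape, X.dtype)  # still 1-D: one element per word

try:
    X.reshape(len(X), 270)  # fails: X has len(X) elements, not len(X) * 270
except ValueError as err:
    print(err)

stacked = np.vstack(list(X))  # join the (1, 270) rows along axis 0
print(stacked.shape)
```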

2 Answers:

Answer 0 (score: 1)

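The question is about converting a pandas Series of strings into a NumPy array, using a transformative function that takes a string as input and returns a (1, n) array.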

Here is a solution:

import pandas as pd
import numpy as np

# You have a series of strings
X = pd.Series(['aaa'] * 665)

# You have a transformative func that returns a (1, n) np.array
def func(word, n=270):
    return np.zeros((1, n))

# You apply the function to the series and vertically stack the results
Xs = np.vstack(X.apply(func))

# You check for the desired shape: (665, 270)
print(Xs.shape)
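np.vstack is one of several ways to join the (1, n) rows. A sketch comparing it with np.concatenate along axis 0 and with np.stack, which adds a new axis and therefore needs a squeeze. It reuses the answer's placeholder func and converts the applied Series to a plain list with tolist() first.

```python
import numpy as np
import pandas as pd

X = pd.Series(['aaa'] * 665)

def func(word, n=270):
    # Same placeholder transform as in the answer: a (1, n) array of zeros
    return np.zeros((1, n))

rows = X.apply(func).tolist()  # list of 665 arrays, each of shape (1, 270)

a = np.vstack(rows)               # joins along axis 0 -> (665, 270)
b = np.concatenate(rows, axis=0)  # same result for 2-D inputs
c = np.stack(rows)                # adds a new leading axis -> (665, 1, 270)
c2 = c.squeeze(axis=1)            # drops the length-1 axis -> (665, 270)

print(a.shape, b.shape, c.shape, c2.shape)
```

np.vstack or np.concatenate is the natural fit here because each row already carries its own leading axis of length 1.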

Answer 1 (score: -1)

The key lines are:

z = list(func(tmp)) # converting returned value from func to a list

result = np.array([x for x in X.values])

Here is my complete test code:

import numpy as np
import pandas as pd


def func(tmp):
    return np.array([t for t in tmp])


X = pd.Series({'a': 'abc', 'x': 'xyz', 'j': 'jkl', 'z': 'zzz'})
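# Iterate over index labels: X[i] with an integer on this string index relies on
# positional fallback, which newer pandas versions deprecate
for label in X.index:
    tmp = X[label]
    z = list(func(tmp)) # converting returned value from func to a list
    X[label] = z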

result = np.array([x for x in X.values])

Then type result in the console and you will see that it is a (4, 3) ndarray:

In [3]: result
Out[3]: 
array([['a', 'b', 'c'],
       ['x', 'y', 'z'],
       ['j', 'k', 'l'],
       ['z', 'z', 'z']], dtype='<U1')
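Writing lists back into the Series is not strictly needed: what matters is that np.array receives equal-length lists, which it turns into a 2-D array. A sketch building the same (4, 3) result directly from the Series values, reusing the answer's test func:

```python
import numpy as np
import pandas as pd

def func(tmp):
    # Same test function as in the answer: split a string into its characters
    return np.array([t for t in tmp])

X = pd.Series({'a': 'abc', 'x': 'xyz', 'j': 'jkl', 'z': 'zzz'})

# One list per string; np.array turns equal-length lists into a 2-D array
result = np.array([list(func(s)) for s in X])

print(result.shape)  # (4, 3)
```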