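I want to extract every occurrence of an expression from the elements of a Pandas DataFrame column, but I get an error whenever the expression is more than one character long. Why am I getting this error, and how can I make the extraction work as expected?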
import pandas as pd
wiki = ["In theoretical computer the like operations.",
"The a filter.",
"In the.",
"the dog is the one",
"See below for details."
]
wiki
x = pd.DataFrame(wiki, columns = ['wiki'])
x
x.wiki.str.extractall('(the)')
## x.wiki.str.extractall('(the)')
## Traceback (most recent call last):
##
## File "<ipython-input-7-ca5d102219f3>", line 1, in <module>
## x.wiki.str.extractall('(the)')
##
## File "C:\WinPython-64bit-3.5.2.1Qt5\python-3.5.2.amd64\lib\site-packages\pandas\core\strings.py", line 1621, in extractall
## return str_extractall(self._orig, pat, flags=flags)
##
## File "C:\WinPython-64bit-3.5.2.1Qt5\python-3.5.2.amd64\lib\site-packages\pandas\core\strings.py", line 716, in str_extractall
## result = DataFrame(match_list, index, columns)
##
## File "C:\WinPython-64bit-3.5.2.1Qt5\python-3.5.2.amd64\lib\site-packages\pandas\core\frame.py", line 263, in __init__
## arrays, columns = _to_arrays(data, columns, dtype=dtype)
##
## File "C:\WinPython-64bit-3.5.2.1Qt5\python-3.5.2.amd64\lib\site-packages\pandas\core\frame.py", line 5352, in _to_arrays
## dtype=dtype)
##
## File "C:\WinPython-64bit-3.5.2.1Qt5\python-3.5.2.amd64\lib\site-packages\pandas\core\frame.py", line 5431, in _list_to_arrays
## coerce_float=coerce_float)
##
## File "C:\WinPython-64bit-3.5.2.1Qt5\python-3.5.2.amd64\lib\site-packages\pandas\core\frame.py", line 5489, in _convert_object_array
## 'columns' % (len(columns), len(content)))
##
## AssertionError: 1 columns passed, passed data had 3 columns
x.wiki.str.extractall('(t)')
## x.wiki.str.extractall('(t)')
## Out[8]:
## 0
## match
## 0 0 t
## 1 t
## 2 t
## 3 t
## 4 t
## 1 0 t
## 2 0 t
## 3 0 t
## 1 t
## 4 0 t
Expected output:
match
0 0 the
1 the
2 0 the
3 0 the
1 the
Answer 0 (score: 1)
The extractall() method has a bug that should be fixed in pandas 0.18.2, which should be released soon, so let's either be patient or take a small risk and use the 0.18.2 beta ... ;)
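
Until a fixed version is installed, one possible workaround is to bypass extractall() and collect the matches with Python's re module. This is only a sketch and not how pandas implements it: the names pattern, records and result are made up for illustration, and it assumes the pattern has exactly one capture group, as in the question.

```python
import re

import pandas as pd

wiki = ["In theoretical computer the like operations.",
        "The a filter.",
        "In the.",
        "the dog is the one",
        "See below for details."]
x = pd.DataFrame(wiki, columns=["wiki"])

# Hypothetical names; the pattern is the one from the question.
pattern = re.compile(r"(the)")

# One record per non-overlapping match: (row label, match number, captured text).
records = [
    (row, match_no, m.group(1))
    for row, text in x["wiki"].items()
    for match_no, m in enumerate(pattern.finditer(text))
]

# Build the same shape extractall() returns: a (row, match) MultiIndex
# and one column (named 0) for the single capture group.
index = pd.MultiIndex.from_tuples(
    [(row, match_no) for row, match_no, _ in records],
    names=[None, "match"],
)
result = pd.DataFrame({0: [text for _, _, text in records]}, index=index)
print(result)
```

Rows without a match (1 and 4 here) simply do not appear, and the search is case-sensitive, so "The" in row 1 is not counted. On a pandas release that contains the fix, the original call x.wiki.str.extractall('(the)') should give the same rows.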