How do I use a loop to store data row by row in a pandas DataFrame with predefined column names? This is for extracting text from images.
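import os
import pandas as pd
import pytesseract
from PIL import Image

df = pd.DataFrame(columns=['Col1', 'Col2', 'Col3', 'Col4', 'Col5', 'Col6', 'Col7', 'Col8', 'Col9', 'Col10', 'Col11', 'Col12', 'Col13', 'Col14'])
for itam in dirs:
    if os.path.isfile(path + itam):
        for a in x:
            img = Image.open(path + itam)
            crop = img.crop(a)
            text = pytesseract.image_to_string(crop, config='--psm 4')
            # Every cell gets the whole OCR string, and rows 0..len(text)-1
            # are overwritten again for every crop box and every image.
            for i in range(len(text)):
                df.loc[i] = [text for n in range(14)]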
TypeError: cannot concatenate object of type "<class 'str'>"; only pd.Series, pd.DataFrame, and pd.Panel (deprecated) objs are valid
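Expected output
I want to store the data produced by the loop in a pandas DataFrame with predefined columns, but the code above stores multiple copies of the same record.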
print(df)
col1 col2 col3 col4 col5 col6 col7 col8 col9 col10 col11 col12 col13 col14
0 0 0 1 2 3 4 5 6 8 9 10 11 12 13
2 0 1 2 3 4 5 6 8 9 10 11 12 13 14
Answer
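From the pandas docs:
Iteratively appending rows to a DataFrame can be more computationally intensive than a single concatenate. A better solution is to append those rows to a list and then concatenate the list with the original DataFrame all at once.
You can simply append your data to a list and then build the DataFrame from that list in one call.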
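A minimal sketch of the pattern the docs describe, assuming you already have a DataFrame and want to add new rows to it: the rows are gathered in a plain list inside the loop and concatenated once afterwards. The column names mirror the question; the strings are stand-ins for OCR output, not real data.

```python
import pandas as pd

# Column names Col1 ... Col14, as in the question.
columns = [f"Col{i}" for i in range(1, 15)]

# An existing DataFrame that the new rows should be added to.
original = pd.DataFrame([list("abcdefghijklmn")], columns=columns)

# Inside the loop, only append to a cheap Python list...
new_rows = []
for text in ["01234567890123", "98765432109876"]:  # stand-ins for OCR output
    new_rows.append(list(text))

# ...and concatenate once, after the loop has finished.
combined = pd.concat(
    [original, pd.DataFrame(new_rows, columns=columns)],
    ignore_index=True,  # renumber rows 0..n-1 instead of keeping duplicate labels
)
print(combined)
```

`ignore_index=True` gives the result a fresh 0..n-1 index; without it, the new rows would restart their labels at 0.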
import pandas as pd

container = []
for a in x:
    # get the text you want, e.g. the OCR output for crop box `a`;
    # it must be exactly 14 characters long to fill Col1..Col14
    text = '01234567890123'
    char_list = list(text)  # one character per column
    container.append(char_list)

# now create your DataFrame in a single call
frame = pd.DataFrame(container, columns=['Col1', 'Col2', 'Col3', 'Col4', 'Col5', 'Col6', 'Col7', 'Col8', 'Col9', 'Col10', 'Col11', 'Col12', 'Col13', 'Col14'])
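Real OCR output rarely has exactly 14 characters, and pandas typically raises a `ValueError` when the row width and the number of column names disagree. A sketch of one way to guard against that, assuming it is acceptable to strip surrounding whitespace (OCR output often ends with newlines), truncate long strings and pad short ones with empty strings. `text_to_row` is a hypothetical helper, not part of pandas or pytesseract.

```python
import pandas as pd

COLUMNS = [f"Col{i}" for i in range(1, 15)]


def text_to_row(text, width=len(COLUMNS), fill=""):
    """Hypothetical helper: turn one OCR string into exactly `width` cells.

    Surrounding whitespace is stripped, extra characters are dropped and
    missing ones are filled with `fill`, so every row has the same width.
    """
    chars = list(text.strip())[:width]
    return chars + [fill] * (width - len(chars))


# Stand-ins for OCR output: exact length, too short, too long.
ocr_results = ["01234567890123\n", "0123", "0123456789012345"]

container = [text_to_row(text) for text in ocr_results]
frame = pd.DataFrame(container, columns=COLUMNS)
print(frame)
```

Whether to pad, truncate or reject a malformed row is a judgement call; raising an error instead of padding may be safer if a short string means the OCR step failed.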
In [1]: datlist = [[i for i in '01234567890123'] for ii in range(10)]
In [2]: datlist
Out[2]:
[['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '0', '1', '2', '3'],
['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '0', '1', '2', '3'],
['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '0', '1', '2', '3'],
['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '0', '1', '2', '3'],
['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '0', '1', '2', '3'],
['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '0', '1', '2', '3'],
['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '0', '1', '2', '3'],
['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '0', '1', '2', '3'],
['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '0', '1', '2', '3'],
['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '0', '1', '2', '3']]
In [3]: pd.DataFrame(datlist, columns=range(len(datlist[0])))
Out[3]:
0 1 2 3 4 5 6 7 8 9 10 11 12 13
0 0 1 2 3 4 5 6 7 8 9 0 1 2 3
1 0 1 2 3 4 5 6 7 8 9 0 1 2 3
2 0 1 2 3 4 5 6 7 8 9 0 1 2 3
3 0 1 2 3 4 5 6 7 8 9 0 1 2 3
4 0 1 2 3 4 5 6 7 8 9 0 1 2 3
5 0 1 2 3 4 5 6 7 8 9 0 1 2 3
6 0 1 2 3 4 5 6 7 8 9 0 1 2 3
7 0 1 2 3 4 5 6 7 8 9 0 1 2 3
8 0 1 2 3 4 5 6 7 8 9 0 1 2 3
9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
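The answer splits a single OCR string into characters. If, as the `for a in x` loop in the question suggests, each box in `x` instead marks one field of the image (an assumption; the question does not say), the same list-first idea gives one row per image with one column per crop box. `build_frame`, `FakeImage` and `fake_ocr` are hypothetical names; the fakes only let the sketch run without image files or Tesseract installed.

```python
import pandas as pd


def build_frame(images, boxes, ocr, columns):
    """Hypothetical layout: one row per image, one column per crop box.

    `images` yields objects with a PIL-style .crop(box) method; `ocr` turns a
    cropped region into a string (e.g. pytesseract.image_to_string).
    """
    if len(boxes) != len(columns):
        raise ValueError("need exactly one crop box per column")
    rows = []
    for img in images:
        # Collect every field of this image first, then store the row once.
        rows.append([ocr(img.crop(box)).strip() for box in boxes])
    return pd.DataFrame(rows, columns=columns)


class FakeImage:
    """Stand-in for a PIL image: crop() just returns (name, box)."""

    def __init__(self, name):
        self.name = name

    def crop(self, box):
        return (self.name, box)


def fake_ocr(region):
    name, box = region
    return f"{name}-{box[0]}\n"  # trailing newline, like real OCR output


boxes = [(i, 0, i + 1, 1) for i in range(3)]
demo = build_frame([FakeImage("a"), FakeImage("b")], boxes, fake_ocr,
                   ["Col1", "Col2", "Col3"])
print(demo)

# With real files this might look like (untested; `path`, `dirs`, `x` come
# from the question, COLUMNS would be the 14 column names):
#   files = [os.path.join(path, f) for f in dirs
#            if os.path.isfile(os.path.join(path, f))]
#   df = build_frame((Image.open(f) for f in files), x,
#                    lambda c: pytesseract.image_to_string(c, config='--psm 4'),
#                    COLUMNS)
```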