Python Camelot borderless table extraction issue

Time: 2018-11-08 14:03:34

Tags: python-3.x python-camelot

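I am struggling to extract some borderless tables from a PDF file, like the one shown in the image below. I installed python-camelot following https://github.com/socialcopsdev/camelot, and it only works for tables with borders. Please find the details below.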

platform - Linux-4.5.5-300.fc24.x86_64-x86_64-with-fedora-24-Twenty_Four

sys - Python 3.6.1 (default, May 15 2017, 11:42:04) [GCC 6.3.1 20161221 (Red Hat 6.3.1-1)]

numpy - NumPy 1.15.4

cv2 - OpenCV 3.4.3

camelot - Camelot 0.3.2

(Image: bzL0L.png)

3 Answers:

Answer 0 (score: 6)

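To improve the detected area, you can increase the edge_tol value (default: 50) to counteract the effect of text that is placed relatively far apart vertically. A larger edge_tol leads to longer text edges being detected, which improves the guess of the table area. Let's use a value of 500.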

>>> import camelot
>>> import matplotlib.pyplot as plt
>>> tables = camelot.read_pdf('edge_tol.pdf', flavor='stream', edge_tol=500)
>>> camelot.plot(tables[0], kind='contour')
>>> plt.show()
>>> tables[0].df
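
Since the right edge_tol depends on the document, one option is to try a few values and compare the results. The following is only a minimal sketch: the helper names `best_edge_tol` and `scan_edge_tols` are hypothetical, it assumes a Camelot version whose stream flavor accepts edge_tol, and it uses the 'accuracy' entry of the table's parsing_report as a rough heuristic rather than a guarantee of a correct table.

```python
def best_edge_tol(reports):
    """Return the edge_tol whose first table reports the highest accuracy.

    `reports` maps an edge_tol value to a dict shaped like Camelot's
    Table.parsing_report; only the 'accuracy' key is read here.
    """
    return max(reports, key=lambda tol: reports[tol]["accuracy"])


def scan_edge_tols(pdf_path, values=(50, 200, 500)):
    """Parse the PDF once per edge_tol value and collect parsing reports."""
    # Imported lazily so best_edge_tol can be used without Camelot installed.
    import camelot

    reports = {}
    for tol in values:
        tables = camelot.read_pdf(pdf_path, flavor="stream", edge_tol=tol)
        if len(tables) > 0:
            # Assumption: the first detected table is the one of interest.
            reports[tol] = tables[0].parsing_report
    return reports
```

For example, `best_edge_tol(scan_edge_tols('edge_tol.pdf'))` would return the candidate value to inspect further with the contour plot; the plot is still the more reliable check.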

Answer 1 (score: 1)

By default, Camelot uses the lattice flavor, which relies on clearly drawn lines separating the cells.

For tables without ruling lines, you want to use the stream flavor:

import camelot
tables = camelot.read_pdf('your_file_name.pdf', flavor='stream')
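
When it is not known in advance whether a PDF has ruled tables, a fallback from lattice to stream may be convenient. This is a sketch under assumptions: `read_tables` is a hypothetical helper, the `reader` parameter exists only to make the function testable without a PDF, and it assumes lattice returns no tables for a borderless page, which will not hold for every document.

```python
def read_tables(pdf_path, reader=None):
    """Try the lattice flavor first, then fall back to stream.

    Returns a (flavor_used, tables) tuple. `reader` defaults to
    camelot.read_pdf; any callable with the same keyword works.
    """
    if reader is None:
        import camelot  # lazy import so a stub reader can be used instead
        reader = camelot.read_pdf

    tables = reader(pdf_path, flavor="lattice")
    if len(tables) > 0:
        return "lattice", tables

    # No ruled tables found: parse by text alignment instead.
    return "stream", reader(pdf_path, flavor="stream")
```

A call such as `flavor, tables = read_tables('your_file_name.pdf')` then reports which flavor produced the result.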

Answer 2 (score: 1)

Another solution that may help is to set table_areas explicitly, for example to the full page size.

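You can find the size of that area with Camelot's visual debugging feature, or by opening the PDF in a text editor and checking the MediaBox or CropBox dimensions (note that they do not use the same coordinate convention).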
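
The coordinate difference can be sketched as follows. A PDF MediaBox lists the lower-left corner followed by the upper-right corner, while Camelot's table_areas strings have the form "x1,y1,x2,y2" with (x1, y1) as the top-left and (x2, y2) as the bottom-right corner in PDF coordinate space. The helper names below are hypothetical, the 612 x 792 US Letter MediaBox is only an example value, and the table_areas keyword assumes a Camelot release that uses that name.

```python
def mediabox_to_table_area(llx, lly, urx, ury):
    """Convert a MediaBox [llx lly urx ury] into a Camelot table area string.

    MediaBox gives lower-left then upper-right; Camelot expects
    left,top,right,bottom, so the two y values swap places.
    """
    return ",".join(str(v) for v in (llx, ury, urx, lly))


def read_full_page(pdf_path, mediabox=(0, 0, 612, 792)):
    """Run the stream parser with the table area set to the whole page."""
    import camelot  # lazy import keeps the conversion helper dependency-free

    area = mediabox_to_table_area(*mediabox)
    return camelot.read_pdf(pdf_path, flavor="stream", table_areas=[area])
```

With the example MediaBox, the resulting area string is "0,792,612,0". Read the actual values from your own PDF rather than relying on this default.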