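I'm trying to extract some borderless tables, like the one shown in the image below, from PDF files. I installed python-camelot following https://github.com/socialcopsdev/camelot, but it only works for tables with borders. Please find the details below: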
Platform - Linux-4.5.5-300.fc24.x86_64-x86_64-with-fedora-24-Twenty_Four
sys - Python 3.6.1 (default, May 15 2017, 11:42:04) [GCC 6.3.1 20161221 (Red Hat 6.3.1-1)]
numpy - NumPy 1.15.4
cv2 - OpenCV 3.4.3
Answer 0 (score: 6)
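To improve the detected table area, you can increase the edge_tol value (default: 50) to counter the effect of text that is placed relatively far apart vertically. A larger edge_tol causes longer text edges to be detected, which improves Camelot's guess of the table area. Let's use a value of 500: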
>>> import camelot
>>> import matplotlib.pyplot as plt
>>> tables = camelot.read_pdf('edge_tol.pdf', flavor='stream', edge_tol=500)
>>> camelot.plot(tables[0], kind='contour')
>>> plt.show()
>>> tables[0].df
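To see why a larger edge_tol yields longer text edges, here is a deliberately simplified sketch (not Camelot's actual TextEdge code): text boxes that share one x-alignment are scanned from top to bottom, and a box joins the current edge only if its vertical gap to that edge is at most edge_tol. The function name group_into_edges is hypothetical and exists only for this illustration.

```python
from typing import List, Tuple


def group_into_edges(ys: List[float], edge_tol: float) -> List[Tuple[float, float]]:
    """Group y-coordinates of x-aligned text boxes into vertical edges.

    Simplified illustration only; Camelot's real TextEdge logic differs in detail.
    Each edge is returned as (top_y, bottom_y).
    """
    edges: List[Tuple[float, float]] = []
    # PDF y grows upward, so sorting in descending order scans top to bottom.
    for y in sorted(ys, reverse=True):
        if edges and edges[-1][1] - y <= edge_tol:
            top, _ = edges[-1]
            edges[-1] = (top, y)  # gap within tolerance: extend the edge downward
        else:
            edges.append((y, y))  # gap too large: start a new edge
    return edges
```

With rows at y = 700, 680, 660 and a second block at 400, 380, edge_tol=50 yields two short edges, [(700, 660), (400, 380)], while edge_tol=500 bridges the 260-point gap and yields one long edge, [(700, 380)], which is the kind of longer edge that helps Camelot guess a larger table area.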
Answer 1 (score: 1)
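By default, Camelot uses Lattice, which relies on clearly drawn lines that divide the cells.
For tables without ruling lines, you want to use Stream:
import camelot
tables = camelot.read_pdf('your_file_name.pdf', flavor='stream')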
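If the same script has to handle both ruled and borderless tables, one option is to try Lattice first and fall back to Stream when nothing is found. This is a minimal sketch under that assumption; read_tables is a hypothetical helper, and the reader is passed in as a parameter (you would pass camelot.read_pdf) so the logic can be exercised without a PDF. It relies only on read_pdf accepting flavor and pages keywords and on the returned TableList supporting len().

```python
from typing import Any, Callable, Sequence

# Any callable with the keyword interface of camelot.read_pdf(filepath, flavor=..., pages=...).
ReadPdf = Callable[..., Sequence[Any]]


def read_tables(path: str, read_pdf: ReadPdf, pages: str = "1") -> Sequence[Any]:
    """Hypothetical helper: try Lattice (ruled tables) first, then fall back to Stream."""
    tables = read_pdf(path, flavor="lattice", pages=pages)
    if len(tables) == 0:
        # No ruling lines were detected, so treat it as a borderless table
        # and let Stream infer columns from text alignment instead.
        tables = read_pdf(path, flavor="stream", pages=pages)
    return tables
```

Usage would look like tables = read_tables('your_file_name.pdf', camelot.read_pdf). Passing the reader explicitly is a design choice for testability, not something Camelot requires.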
Answer 2 (score: 1)
Another solution that may help is to set table_areas explicitly, for example to cover the whole page.
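You can find the size of that area with Camelot's visual debugging feature, or by opening the PDF in a text editor and checking the MediaBox or CropBox dimensions. Note that the two do not use the same coordinate convention: MediaBox and CropBox list the lower-left and upper-right corners, whereas table_areas expects the top-left and bottom-right corners.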
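Converting between the two conventions is just a matter of reordering the corners. The sketch below assumes a page box written as [llx lly urx ury] (lower-left x/y, upper-right x/y, PDF origin at the bottom-left) and produces the "x1,y1,x2,y2" string that table_areas takes, with (x1, y1) the top-left and (x2, y2) the bottom-right corner. The function name page_box_to_table_area is hypothetical.

```python
from typing import Sequence


def page_box_to_table_area(box: Sequence[float]) -> str:
    """Convert a PDF /MediaBox or /CropBox [llx, lly, urx, ury] into a
    Camelot table_areas string "x1,y1,x2,y2" (top-left, then bottom-right)."""
    llx, lly, urx, ury = box
    x1, y1 = llx, ury  # top-left corner: left edge, upper y
    x2, y2 = urx, lly  # bottom-right corner: right edge, lower y
    # :g drops trailing zeros so integers print as "612", not "612.0".
    return ",".join(f"{v:g}" for v in (x1, y1, x2, y2))
```

For a US Letter page with MediaBox [0 0 612 792] this gives "0,792,612,0", which could then be passed as camelot.read_pdf('your_file_name.pdf', flavor='stream', table_areas=['0,792,612,0']).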