Berat Kurar Barakat

I am a PhD student in computer science at the Visual Media Lab (VML), Ben-Gurion University of the Negev.

My latest research is on historical document image analysis using digital image processing and machine learning methods.

Office: 37/-102



Datasets
VML-HTR dataset: The Handwritten Text Recognition dataset consists of five Arabic manuscripts annotated at the subword and text line levels. Each manuscript contains 27000-35000 subwords and 1400-2000 text lines. Further statistics are available here. The line-level annotation is generated automatically from the subword-level annotation of the VML-HD dataset. The ground truth is available in PAGE XML format.
VML-HP dataset: The Hebrew Paleography dataset consists of 537 document page images labeled with 15 script sub-types. The ground truth was created manually by a Hebrew paleographer at the page level. In addition, we propose a patch generation tool that extracts patches containing an approximately equal number of text lines regardless of font size. The VML-HP dataset contains a train set and two test sets. The first is a typical test set, and the second is a blind test set for evaluating algorithms in a more challenging setting.
HHD dataset: The Hebrew Handwritten dataset consists of natural handwritten Hebrew character images extracted from pangram paragraphs. Hence the dataset contains 26 classes that are balanced in number of samples. The train set contains 3965 samples and the test set contains 1134 samples.
VML-AHTE dataset: The Arabic Handwritten Text Line Extraction dataset is a natural handwritten benchmark dataset for text lines with crowded diacritics and touching and overlapping characters. It is fully labeled at the line level by native Arabic speakers. The dataset contains 20 training pages and 10 test pages. Every document image has a corresponding ground truth in the form of pixel labels and PAGE XML.
The Pinkas dataset is a public historical document image dataset. It is the first dataset of medieval handwritten Hebrew and is fully labeled at the word, line, and page levels by an expert in historical Hebrew manuscripts.
VML-MOC: The multiply oriented and curved handwritten text line dataset is a natural handwritten benchmark dataset for heavily skewed and curved text lines. These text lines are side notes that scholars added in the page margins over the years, each time with a different orientation and sometimes in an extremely curved form due to space constraints. The dataset contains 20 training pages and 10 test pages. Every document image has a corresponding ground truth in the form of pixel labels and PAGE XML.
The Challenging text line dataset contains 30 pages from two different manuscripts. It is written in Arabic and contains 2732 text lines, a considerable number of which are multi-directed, multi-skewed, or curved. Ground truth, in which text lines were manually labeled with line masks, is also included in the dataset.
The Complex layout dataset contains 32 document images from two manuscripts scanned at a private library in the Old City of Jerusalem, along with other samples collected from the Islamic manuscripts digitization project at Leipzig University Library.
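
Several of the datasets above provide line-level ground truth in PAGE XML. As a rough illustration of how such a file might be read, here is a minimal Python sketch. It assumes that each TextLine element carries a Coords child with a `points` attribute of the form "x1,y1 x2,y2 ...", as in recent PAGE schema versions; the actual files in these datasets may use a different schema version or layout. The helper names and the embedded sample are hypothetical and are not taken from any of the datasets.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def read_text_lines(xml_text):
    """Return a list of (line_id, polygon) pairs, one per TextLine element.

    Assumption: each TextLine has a direct Coords child whose 'points'
    attribute lists integer 'x,y' pairs separated by spaces.
    """
    root = ET.fromstring(xml_text)
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) != "TextLine":
            continue
        for child in elem:
            points = child.get("points")
            if local_name(child.tag) == "Coords" and points:
                polygon = [tuple(int(v) for v in pair.split(","))
                           for pair in points.split()]
                lines.append((elem.get("id"), polygon))
                break
    return lines


# Hypothetical, hand-written sample; not an excerpt from any dataset.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="page.png" imageWidth="100" imageHeight="50">
    <TextRegion id="r1">
      <Coords points="0,0 100,0 100,50 0,50"/>
      <TextLine id="l1">
        <Coords points="5,5 95,5 95,20 5,20"/>
      </TextLine>
      <TextLine id="l2">
        <Coords points="5,25 95,25 95,45 5,45"/>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""

text_lines = read_text_lines(SAMPLE)
for line_id, polygon in text_lines:
    print(line_id, polygon)
```

Matching on the local tag name rather than a fixed namespace keeps the sketch independent of the PAGE schema version, at the cost of not validating the file against the schema.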

Publications
Is a deep learning algorithm effective for the classification of medieval Hebrew scripts?
Daria Vasyutinsky Shapira, Irina Rabaev, Ahmad Droby, Berat Kurar Barakat and Jihad El-Sana
2022 #DHJewish - Jewish Studies in the Digital Age
[paper] [code]
Unsupervised learning of text line segmentation by differentiating coarse patterns
Berat Kurar Barakat, Ahmad Droby, Raid Saabni, and Jihad El-Sana
2021 International Conference on Document Analysis and Recognition (ICDAR)
[paper] [code]
Learning-Free Text Line Segmentation for Historical Handwritten Documents
Berat Kurar Barakat, Rafi Cohen, Ahmad Droby, Irina Rabaev and Jihad El-Sana
Applied Sciences Journal, MDPI
[paper] [code]
Deep learning for paleographic analysis of medieval Hebrew manuscripts: a DH team collaboration experience
Daria Vasyutinsky, Irina Rabaev, Berat Kurar Barakat, Ahmad Droby, and Jihad El-Sana
2020 Twin Talks: Understanding and Facilitating Collaboration in Digital Humanities
[paper] [slides] [code]
Unsupervised deep learning for text line segmentation
Berat Kurar Barakat, Ahmad Droby, Rym Alasam, Boraq Madi, Irina Rabaev, Raed Shammes and Jihad El-Sana
2020 25th International Conference on Pattern Recognition (ICPR)
[paper] [code] [poster]
Text line extraction using fully convolutional network and energy minimization
Berat Kurar Barakat, Ahmad Droby, Rym Alasam, Boraq Madi, Irina Rabaev, and Jihad El-Sana
2020 2nd International Workshop on Pattern Recognition for Cultural Heritage (PatReCH)
[paper] [slides]
Unsupervised deep learning for handwritten page segmentation
Ahmad Droby, Berat Kurar Barakat, Boraq Madi, Rym Alasam, and Jihad El-Sana
2020 17th International Conference on Frontiers in Handwriting Recognition (ICFHR)
[paper] [code]
The HHD dataset
Irina Rabaev, Berat Kurar Barakat, Alexander Churkin, and Jihad El-Sana
2020 17th International Conference on Frontiers in Handwriting Recognition (ICFHR)
[paper] [code]
The pinkas dataset
Berat Kurar Barakat, Irina Rabaev, and Jihad El-Sana
2019 International Conference on Document Analysis and Recognition (ICDAR)
[paper] [poster] [code]
VML-MOC: Segmenting a multiply oriented and curved handwritten text lines dataset
Berat Kurar Barakat, Rafi Cohen, Irina Rabaev, and Jihad El-Sana
2019 3rd International Workshop on Arabic and derived Script Analysis and Recognition (ASAR)
[paper] [slides] [code]
Layout analysis on challenging historical Arabic manuscripts using siamese network
Reem Alaasam, Berat Kurar Barakat, and Jihad El-Sana
2019 International Conference on Document Analysis and Recognition (ICDAR)
[paper] [poster] [code]
Text Line Segmentation for Challenging Handwritten Document Images using Fully Convolutional Network
Berat Kurar Barakat, Ahmad Droby, Majeed Kassis, and Jihad El-Sana
2018 16th International Conference on Frontiers in Handwriting Recognition (ICFHR)
[paper] [poster] [code]
Word Spotting Using Convolutional Siamese Network
Berat Kurar Barakat, Reem Alaasam, and Jihad El-Sana
2018 13th IAPR International Workshop on Document Analysis Systems (DAS)
[paper] [poster] [pitch] [code]
Binarization Free Layout Analysis for Arabic Historical Documents Using Fully Convolutional Networks
Berat Kurar Barakat and Jihad El-Sana
2018 2nd International Workshop on Arabic Script Analysis and Recognition (ASAR)
[paper] [slides] [code]
Case Study: Fine Writing Style Classification Using Siamese Neural Network
Alaa Abdalhaleem, Berat Kurar Barakat, and Jihad El-Sana
2018 2nd International Workshop on Arabic Script Analysis and Recognition (ASAR)
[paper] [slides]
Synthesizing versus Augmentation for Arabic Word Recognition with Convolutional Neural Networks
Reem Alaasam, Berat Kurar Barakat, and Jihad El-Sana
2018 2nd International Workshop on Arabic Script Analysis and Recognition (ASAR)
[paper]
Experiment study on utilizing convolutional neural networks to recognize historical Arabic handwritten text
Reem Alaasam, Berat Kurar Barakat, and Jihad El-Sana
2017 1st International Workshop on Arabic Script Analysis and Recognition (ASAR)
[paper]

Honors
ICFHR 2018 Competition on Recognition of Historical Arabic Scientific Manuscripts
Winner Team of Page Segmentation Track
Berat Kurar Barakat, Ahmad Droby, and Jihad El-Sana
[paper] [results] [code]
ASAR 2018 Layout Analysis Competition
Winner Team of Classification Track
Ahmad Droby, Berat Kurar Barakat, and Jihad El-Sana
[paper] [results] [code]