I completed my B.Sc. at the Hebrew University of Jerusalem and my M.Sc. at Ben-Gurion University of the Negev, where I am currently pursuing my Ph.D. degree.

I am currently interested in deep neural networks, especially convolutional and Siamese neural networks.


Feel free to contact me any time at



Systems Programming - SPL
Programming Laboratory - SPLab
Advanced Topics in Distributed and Reactive Programming - ATD




Previous Teaching

  • 2015
    • Systems Programming
    • System Programming Laboratory
  • 2014
    • Systems Programming
    • System Programming Laboratory
  • 2013
    • Systems Programming
    • Computer Architecture and System Programming Laboratory
  • 2012
    • Systems Programming
    • Computer Architecture and System Programming Laboratory
    • Computer Programming A

VML-HD: The Historical Arabic Documents Dataset for Recognition Systems

VML-HD is a new database of handwritten Arabic script. It is based on five books written by different writers between the years 1088 and 1451. We took 668 pages from these five books and fully annotated them at the sub-word level. For each page, we manually drew bounding boxes around the individual sub-words and annotated their character sequences. The database consists of 159,149 sub-word appearances comprising 326,289 characters, drawn from a vocabulary of 5,509 sub-word forms. The database is described in detail in our paper and is designed for training and testing recognition systems for handwritten Arabic sub-words. It is available for research purposes, and we encourage researchers to develop and test new methods using it.

Copyright and Citation

This dataset is intended for research purposes only. If you wish to use the dataset for anything besides research, you must get our explicit consent.

If you download and use the dataset in your research, you must cite our paper:
title={VML-HD: The Historical Arabic Documents Dataset for Recognition Systems},
author={Kassis, Majeed and Abdalhaleem, Alaa and Droby, Ahmad and Alaasam, Reem and El-Sana, Jihad},
booktitle={1st International Workshop on Arabic Script Analysis and Recognition},

Download Database
XML Files

Creative Commons License

Books for Alignment


ASAR 2018 - Arabic Historical Manuscripts Recognition Competition

Eligibility: Each participant may take part in both tracks and may submit more than one system.
Participants must present new, unpublished systems and submit a paper describing the system to the ASAR 2018 workshop.

System Submission: January 31st, 2018, 23:59 GMT
The paper describing the system must be submitted under the C2 topic on the EasyChair website.
Accepted papers will be included in a special section of the ASAR 2018 proceedings.
Notification: February 9th, 2018.
Results will be published at the ASAR 2018 conference in London, United Kingdom.

Data for Segmentation Based Track
Data for Segmentation Free Track

Dataset Information

The dataset contains 5 manuscripts totaling 668 pages, annotated at the sub-word level. These manuscripts are split into five folds named 'a', 'b', 'c', 'd', and 'e'. Each fold contains a randomly chosen 20% of the pages from each manuscript. The recognition score will be calculated using the mean average precision (MAP) score, a widespread measure of the performance of information retrieval systems. For each query, average precision is the average of the precision values obtained after each relevant word is retrieved; MAP is the mean of these values over all queries.
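As an illustration of the metric described above, here is a minimal Python sketch of MAP. It is not the organizers' scoring code. The function names and the input representation (one ranked list of relevance flags per query) are assumptions made for this example, and the sketch assumes each ranked list contains every relevant item for its query, so that the number of hits equals the number of relevant items.

```python
from typing import Sequence


def average_precision(ranked_relevance: Sequence[bool]) -> float:
    """Average precision for one query.

    ranked_relevance[i] is True if the item at rank i + 1 is relevant.
    Assumption: the list contains every relevant item for the query.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            # Precision at this rank: relevant items seen so far / items seen so far.
            precision_sum += hits / rank
    # A query with no relevant items retrieved scores 0.
    return precision_sum / hits if hits else 0.0


def mean_average_precision(queries: Sequence[Sequence[bool]]) -> float:
    """Mean of the per-query average precision values."""
    if not queries:
        return 0.0
    return sum(average_precision(q) for q in queries) / len(queries)


if __name__ == "__main__":
    # Two hypothetical queries with made-up relevance judgments.
    example = [[True, False, True], [False, True]]
    print(mean_average_precision(example))
```

In the hypothetical example, the first query has relevant items at ranks 1 and 3 (precisions 1/1 and 2/3), and the second has one relevant item at rank 2 (precision 1/2). If a system's ranked list can miss relevant items, the denominator should instead be the total number of relevant items for the query.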

Registration and System Specifications

To register, please send an email to the organizers at titled "ASAR2018 - Arabic Historical Manuscripts Recognition Competition Registration". The system must be submitted to the same email address.
The system must be an executable that accepts the test fold as input and outputs the data. The system must also be able to calculate the MAP score of its output data. The source code must be submitted as well.
Please make sure that your program runs out of the box without any prior configuration. If needed, the system can be submitted as a virtual machine image.

For any inquiry please send an email to the organizers at

For faster download of the competition files please click here



  • Negev Scholarship, The Negev Scholarship for outstanding Ph.D. students. Ben-Gurion University of the Negev, 2014.


We've worked very hard to annotate 668 pages taken from 5 different books. Please take a moment to read the README file and the copyright notice before downloading the dataset.

For your convenience, we've split the dataset into 5 links; each link contains all the annotated pages of a specific book.

The database consists of books written by various writers between the years 1088 and 1451. The books were photographed from a distance of 1 m using a very high quality camera, the Hasselblad H5D-60 Medium Format Digital SLR Camera. The books originate from the National Library of Israel, in Jerusalem. The images are stored in uncompressed TIFF format, and each image is roughly 6000x6000 pixels. Each image is roughly 100 MB in size, so due to size limitations the released dataset contains images of reduced size. The dataset files are still quite large: each file is roughly 1 GB.

  • Please send an e-mail to majeek at cs bgu ac il for a dataset link.