General

I completed my B.Sc. at the Hebrew University of Jerusalem and my M.Sc. at Ben-Gurion University of the Negev, and I am currently pursuing my Ph.D. degree.

I am currently interested in deep neural networks, especially convolutional and Siamese neural networks.

Contact

Feel free to contact me any time at

Github

You can follow my work on my GitHub.

LinkedIn

More about me on LinkedIn.

Teaching

Systems Programming - SPL
Programming Laboratory - SPLab
Advanced Topics in Distributed and Reactive Programming - ATD

ATD - Exams

Presentations

ATD

SPL

SPL

Previous Teaching

  • 2018
    • Advanced Topics in Distributed and Reactive Programming
    • Systems Programming
    • System Programming Laboratory
  • 2017
    • Advanced Topics in Distributed and Reactive Programming
    • Systems Programming
    • System Programming Laboratory
  • 2016
    • Systems Programming
    • System Programming Laboratory
  • 2015
    • Systems Programming
    • System Programming Laboratory
  • 2014
    • Systems Programming
    • System Programming Laboratory
  • 2013
    • Systems Programming
    • Computer Architecture and System Programming Laboratory
  • 2012
    • Systems Programming
    • Computer Architecture and System Programming Laboratory
    • Computer Programming A

VML-HD: The Historical Arabic Documents Dataset for Recognition Systems

A new database of handwritten Arabic script. It is based on five books written by different writers from the years 1088-1451. We took 668 pages from these five books and fully annotated them on the sub-word level. For each page, we manually drew bounding boxes around the individual sub-words and annotated their character sequences. The database consists of 159,149 sub-word appearances comprising 326,289 characters, drawn from a vocabulary of 5,509 sub-word forms. The database is described in detail and is designed for training and testing recognition systems for handwritten Arabic sub-words. It is available for research purposes, and we encourage researchers to develop and test new methods using it.

Copyright and Citation

This dataset is intended for research purposes only. If you wish to use the dataset for anything besides research, you must get our explicit consent.

If you download and use the dataset in your research, you must cite our paper:
@inproceedings{kassis2017vmlhd,
title={VML-HD: The Historical Arabic Documents Dataset for Recognition Systems},
author={Kassis, Majeed and Abdalhaleem, Alaa and Droby, Ahmad and Alaasam, Reem and El-Sana, Jihad},
booktitle={1st International Workshop on Arabic Script Analysis and Recognition},
year={2017},
organization={IEEE}
}

Download Database
XML Files

Creative Commons License

Books for Alignment

Complete Manuscripts: 0207 0206
Annotated Manuscripts: 0206 0207

VML-HD: The Arabic Historical Manuscripts Dataset for Recognition Systems


Dataset Information: 5 Folds Version

The dataset contains 5 manuscripts totaling 668 pages, annotated on the subword level. These manuscripts are split into five folds named 'a', 'b', 'c', 'd', and 'e'. Each fold contains a randomly chosen 20% of the pages from each manuscript. The recognition score will be calculated using the mean average precision (MAP) score, a widespread measure of the performance of information retrieval systems. For a single query, average precision is the average of the precision values obtained after each relevant word is retrieved; MAP is the mean of these values over all queries.
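
The MAP score described above can be sketched in a few lines of Python. This is a minimal illustration, not the organizers' evaluation script: the function names and the input format (one ranked list of relevance flags per query) are assumptions made for the example, and it assumes every relevant word appears somewhere in the ranked list.

```python
from typing import Sequence


def average_precision(relevance: Sequence[bool]) -> float:
    """Average precision for one query.

    relevance[i] is True when the result at rank i + 1 (best first)
    is a relevant word for the query. Assumed input format.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            # Precision measured right after this relevant word is retrieved.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(queries: Sequence[Sequence[bool]]) -> float:
    """Mean of the per-query average precision values."""
    if not queries:
        return 0.0
    return sum(average_precision(q) for q in queries) / len(queries)


# Query 1: relevant words at ranks 1 and 3 -> AP = (1/1 + 2/3) / 2 = 5/6
# Query 2: relevant word at rank 2         -> AP = 1/2
score = mean_average_precision([[True, False, True], [False, True]])
```

For the two toy queries in the example, the score is (5/6 + 1/2) / 2 = 2/3.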


Each of the two tracks contains the fold images and annotation information in its own format.

Segmentation Based Track

In this track, you will receive the documents segmented: each manuscript page image is segmented according to its annotation information, and the resulting subwords are stored in a folder named after the page image. The annotation data are provided along with each folder of subwords. For this track, segmentation-based algorithms are expected to be used to recognize the subwords found in the dataset.
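
A folder-per-page layout like the one described above could be traversed roughly as follows. This is only a sketch: the function name, the set of image extensions, and the assumption that all page folders sit directly under one fold directory are hypothetical and should be checked against the downloaded data.

```python
from pathlib import Path
from typing import Dict, List

# Hypothetical image extensions; the released files may use others.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def collect_subword_images(fold_dir: Path) -> Dict[str, List[Path]]:
    """Map each page folder name to the sorted subword images inside it."""
    pages: Dict[str, List[Path]] = {}
    for page_dir in sorted(p for p in fold_dir.iterdir() if p.is_dir()):
        pages[page_dir.name] = sorted(
            f for f in page_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return pages
```

Calling `collect_subword_images(Path("fold_a"))` (a placeholder path) would return one entry per page folder, which can then be paired with the annotation data for that page.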

Data for Segmentation Based Track

Segmentation Free Track

In this track, you will receive the documents unsegmented: each image contains a complete manuscript page. The annotation data are provided along with each page. For this track, segmentation-free algorithms are expected to be used to recognize the subwords found in the dataset.

Data for Segmentation Free Track

For any inquiry please send an email to the organizers at

To download the files for each of the two tracks, please click here.

Publications

Awards

  • Negev Scholarship, The Negev Scholarship for outstanding Ph.D. students. Ben-Gurion University of the Negev, 2014.

Dataset

We've worked very hard to annotate 668 pages taken from 5 different books. Please take a moment to read the README file and the copyright notice before downloading the dataset.

For your convenience, we've split the dataset into 5 links; each link contains all the annotated pages of a specific book.

The database consists of books written by various writers in the years 1088-1451. The books were photographed from a distance of 1 m using a very high-quality camera, the Hasselblad H5D-60 Medium Format Digital SLR Camera. The books originate from the National Library of Israel in Jerusalem. The images are stored in an uncompressed TIFF format, each roughly 6000x6000 pixels and roughly 100 MB in size, so due to size limitations the released dataset contains images of reduced size. The datasets are still quite large: each file is roughly 1 GB.
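
As a rough sanity check of the per-image size quoted above, the raw pixel data of an uncompressed image can be estimated from its dimensions. The sketch below assumes 8-bit RGB (3 channels, 1 byte each), which is an assumption and not something stated about the original scans; the function name is hypothetical.

```python
def uncompressed_size_mb(width: int, height: int,
                         channels: int = 3, bytes_per_channel: int = 1) -> float:
    """Approximate raw pixel data size in megabytes (10**6 bytes)."""
    return width * height * channels * bytes_per_channel / 1_000_000


# 6000 x 6000 pixels, assumed 8-bit RGB
size_mb = uncompressed_size_mb(6000, 6000)
print(size_mb)  # 108.0
```

Under that assumption the estimate is about 108 MB per image, consistent with the figure of roughly 100 MB.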

  • Please send an e-mail to majeek at cs bgu ac il for a dataset link.