The provided corpus is a human-tagged collection of sentence pairs, each annotated for whether the two sentences are in a paraphrase relationship. It is in JSON format and adheres to the following schema:

    [ { "first" : "string", "second" : "string", "tag" : "int", "date" : "string" } ]

where (first, second) is the sentence pair, tag is a binary value indicating whether the pair is in a paraphrase relationship, and date is the date on which the tag was created, in yyyy_mm_dd format.
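As an illustration of how the schema might be consumed, here is a minimal Python sketch that parses the corpus into typed records. The names `SentencePair` and `parse_corpus` are hypothetical, not part of the corpus distribution. The sketch also assumes that a tag of 1 means "paraphrase" and 0 means "not a paraphrase"; the description above only says the tag is binary, so check this against the data before relying on it.

```python
import json
from dataclasses import dataclass
from datetime import date, datetime
from typing import List


@dataclass(frozen=True)
class SentencePair:
    """One corpus record (hypothetical container, not defined by the corpus)."""
    first: str
    second: str
    is_paraphrase: bool
    tagged_on: date


def parse_corpus(raw: str) -> List[SentencePair]:
    """Parse the JSON text of the corpus into a list of SentencePair records."""
    pairs = []
    for rec in json.loads(raw):
        tag = int(rec["tag"])
        if tag not in (0, 1):
            raise ValueError(f"expected a binary tag, got {tag!r}")
        pairs.append(
            SentencePair(
                first=rec["first"],
                second=rec["second"],
                # Assumption: 1 marks a paraphrase pair, 0 a non-paraphrase pair.
                is_paraphrase=(tag == 1),
                # Dates use underscores as separators: yyyy_mm_dd.
                tagged_on=datetime.strptime(rec["date"], "%Y_%m_%d").date(),
            )
        )
    return pairs
```

To load the corpus from disk, read the file as UTF-8 text and pass the contents to `parse_corpus`; the file name depends on how the corpus was delivered and is not specified here.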