In statistical analysis of binary classification, the F1 score (also F-score or F-measure) is a measure of a test's accuracy. It considers both the "precision" and the "recall" of the test to compute the score. The F1 score is the harmonic mean of precision and recall, reaching its best value at 1 and its worst at 0.

The higher the score, the better your threshold distinguishes pairs that are related from pairs that are not, with respect to the test set (a sketch of picking such a threshold appears at the end of this section).

$$F_1 = 2 \cdot \frac{precision \cdot recall}{precision + recall}$$ where precision is defined as:

$$Precision = \frac{tp}{tp+fp}$$ and recall is defined as:

$$Recall = \frac{tp}{tp+fn}$$ using the following notation:

  • tp - true positive - the test outcome is positive and this is the correct answer.
  • tn - true negative - the test outcome is negative and this is the correct answer.
  • fp - false positive - the test outcome is positive and this is an incorrect answer.
  • fn - false negative - the test outcome is negative and this is an incorrect answer.
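These definitions translate directly into code. Here is a minimal sketch in plain Python, assuming the counts tp, fp, and fn have already been tallied by comparing the test's outcomes against ground truth:

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of "positive" answers that were correct: tp / (tp + fp)."""
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0

def recall(tp: int, fn: int) -> float:
    """Fraction of actual positives that were found: tp / (tp + fn)."""
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0

def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0

# Example: 8 true positives, 2 false positives, 4 false negatives
# precision = 8/10 = 0.8, recall = 8/12 ≈ 0.667, F1 ≈ 0.727
print(f1_score(tp=8, fp=2, fn=4))
```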

Basically, precision is the ratio of the number of correct "positive" answers to the total number of times "positive" was given as an answer. Recall is the ratio of the number of correct "positive" answers to the number of positives that actually exist.

Thus precision measures exactness (how many of the returned positives are correct), while recall measures coverage (how many of the actual positives are found).
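To make the threshold remark above concrete, here is a small sketch that reuses f1_score from the example above and sweeps candidate thresholds, keeping the one with the best F1 on the test set. The scores, labels, and the helper name best_threshold are invented for illustration:

```python
def best_threshold(scores, labels):
    """Try each distinct score as a threshold; return (threshold, f1) with the highest F1.

    scores: similarity scores for pairs; labels: 1 if the pair is truly related, else 0.
    A pair is predicted "related" when its score is >= the threshold.
    """
    best = (None, 0.0)
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        f1 = f1_score(tp, fp, fn)  # as defined in the sketch above
        if f1 > best[1]:
            best = (t, f1)
    return best

# Hypothetical test set: higher score should mean "related"
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2]
labels = [0,   0,   1,    1,   1,   0]
print(best_threshold(scores, labels))  # picks 0.35 here, with F1 ≈ 0.857
```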