This paper describes a simple but effective hack to the (averaged) perceptron algorithm, which allows learning much sparser models with a very small (if any) drop in accuracy. It was rejected from ACL-2011 (short papers). I believe the technique is useful, and I believe the paper would have been accepted eventually, but I do not care: I am not going to submit it again. I decided to publish it on my webpage instead (along with the reviews). If you also find it useful, it would be nice if you dropped me a line and/or cited it as a tech report.