I created a custom classifier model using AWS Comprehend and trained it with the SMS spam dataset here. I was quite surprised at how difficult it is to find spam filter training sets.
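In case it's useful, here's roughly what kicking off the training run looks like via boto3; this is just a sketch, assuming the dataset has already been uploaded to S3 in Comprehend's two-column CSV format, and the bucket, IAM role, and classifier name are all placeholders:

```python
import boto3

comprehend = boto3.client("comprehend", region_name="us-east-1")

# Kick off training; Comprehend handles the train/validation split itself.
comprehend.create_document_classifier(
    DocumentClassifierName="sms-spam-filter",  # hypothetical name
    DataAccessRoleArn="arn:aws:iam::123456789012:role/ComprehendS3Access",  # placeholder
    InputDataConfig={"S3Uri": "s3://my-bucket/sms_spam_dataset.csv"},  # placeholder
    LanguageCode="en",
)
```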
I then wrote a script to use this classifier model to test the various examples we've received on a contact form. The AWS model matched our manual (i.e., human-entered) spam/ham assessments 66% of the time. I wasn't sure whether to be disappointed at these results or surprised at how good they were, given that the dataset I used is for SMS text messages, which are quite short.
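The script itself was straightforward. Here's a minimal sketch, assuming a real-time endpoint for the classifier; the endpoint ARN and the sample entries are placeholders:

```python
import boto3

comprehend = boto3.client("comprehend", region_name="us-east-1")
ENDPOINT_ARN = "arn:aws:comprehend:us-east-1:123456789012:document-classifier-endpoint/spam-filter"  # placeholder

def classify(text):
    """Return the highest-scoring label ('spam' or 'ham') for one message."""
    response = comprehend.classify_document(Text=text, EndpointArn=ENDPOINT_ARN)
    return max(response["Classes"], key=lambda c: c["Score"])["Name"]

# entries: (message_text, human_label) pairs from our contact form records
entries = [("digital marketing assistance", "ham")]  # placeholder data
matches = sum(1 for text, label in entries if classify(text) == label)
print(f"Agreement with human assessments: {matches / len(entries):.0%}")
```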
I then appended our contact form ham/spam records to the end of that SMS dataset and trained another classifier model. When I tested our contact form entries against the new model, it matched our human ham/spam assessments 95% of the time. I'd be delighted to get such results on incoming novel ham/spam contact form entries.
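Building the combined training file is just a concatenation, assuming both sources are already in Comprehend's label,document CSV format; the file names here are hypothetical:

```python
import csv

# Concatenate the SMS dataset and our contact form records into one training file.
with open("sms_spam_dataset.csv", newline="") as sms, \
     open("contact_form_entries.csv", newline="") as forms, \
     open("combined_training_set.csv", "w", newline="") as out:
    writer = csv.writer(out)
    for source in (sms, forms):
        for row in csv.reader(source):  # each row: [label, document]
            writer.writerow(row)
```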
I would point out that we had only 41 contact form entries to add and test. I found it quite interesting that even after adding our own 41 entries to the model's training set, the model still failed to classify two of them correctly. One entry was a false negative: it was classified as ham when it should have been spam, though only by an extremely narrow margin:
spam: 0.49563866853714
ham: 0.50436133146286
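Those two numbers come straight out of the Classes list in the ClassifyDocument response. A small sketch of surfacing such near-ties, where the 0.05 threshold is an arbitrary illustration:

```python
import boto3

comprehend = boto3.client("comprehend", region_name="us-east-1")
ENDPOINT_ARN = "arn:aws:comprehend:us-east-1:123456789012:document-classifier-endpoint/spam-filter"  # placeholder

def spam_margin(text):
    """Return per-label scores and the spam/ham score gap for one message."""
    response = comprehend.classify_document(Text=text, EndpointArn=ENDPOINT_ARN)
    scores = {c["Name"]: c["Score"] for c in response["Classes"]}
    return scores, abs(scores["spam"] - scores["ham"])

scores, margin = spam_margin("a borderline message")  # placeholder text
if margin < 0.05:  # flag near-ties like the 0.4956/0.5044 split above
    print("borderline:", scores)
```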
The other failure was a false positive: the exceedingly terse message "digital marketing assistance" was incorrectly flagged as spam.
Very interesting that the model would fail to properly classify records included as part of its own training set. On reflection, though, it makes sense: the classifier learns a generalized decision boundary rather than memorizing individual examples, so training records that sit close to that boundary can still land on the wrong side of it.