sneakyimp So you've worked with AWS Comprehend?
Not specifically. Two or three years ago I helped out on an ML project trying to identify problematic comments in review submissions. We used Amazon ML tools, though I don't recall which specific ones. My role was mainly helping to provide data for training/evaluation. I learned some stuff about ML by osmosis and a bit of tinkering around, but haven't really touched it since then.
We got some pretty promising results, but then the business moved in a different direction and it all got put on hold. We're now using ML for a very different purpose (more for predicting things, versus evaluating things), but I'm not closely involved, other than helping out with a few ancillary support things. I think we're using a mix of AWS tools along with some open-source libraries and such. 

The spam/ham flag should not be part of the data that is being parsed, tokenized, evaluated, whatever the heck the terms are. It should only be used to score the result of an iteration, to determine how well or poorly it did, so that the algorithm can then hopefully determine what it should try for the next iteration.
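To make that separation concrete, here is a rough sketch in plain Python. It is not what Comprehend or any particular library does internally; the toy messages, the bag-of-words `featurize` helper, and the simple perceptron-style update are all my own assumptions for illustration. The point is only that the text goes in as features, while the spam/ham flag is used solely to check each guess and nudge the weights.

```python
from collections import defaultdict

# Hypothetical toy data: (text, label) pairs, where 1 = spam and 0 = ham.
DATA = [
    ("win free money now", 1),
    ("free prize claim now", 1),
    ("meeting notes attached", 0),
    ("lunch tomorrow at noon", 0),
]

def featurize(text):
    """Turn text into bag-of-words counts. The label is never seen here."""
    counts = defaultdict(int)
    for token in text.lower().split():
        counts[token] += 1
    return counts

def predict(weights, features):
    """Guess spam (1) if the weighted token score is positive, else ham (0)."""
    score = sum(weights[tok] * n for tok, n in features.items())
    return 1 if score > 0 else 0

def train(data, epochs=10):
    """Perceptron-style loop: the label only scores the guess and drives the update."""
    weights = defaultdict(float)
    for _ in range(epochs):
        mistakes = 0
        for text, label in data:
            features = featurize(text)          # input: the text only
            guess = predict(weights, features)
            error = label - guess               # label used only to evaluate the guess
            if error:
                mistakes += 1
                for tok, n in features.items():
                    weights[tok] += error * n   # adjust for the next iteration
        if mistakes == 0:                       # every example guessed correctly
            break
    return weights

weights = train(DATA)
print(predict(weights, featurize("free money")))
```

If the flag were mixed into the text being tokenized, the model could simply learn "the word spam means spam" and would be useless on real, unlabeled messages.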
