Ideally, when you talk to a service layer, such as something that sends emails, you would encapsulate it in something that is easily testable. I haven't done that yet, but that's the idea.
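To make that idea a bit more concrete, here is a minimal sketch in Python. It is not something I have running anywhere; `RecordingMailer`, `send_welcome`, and the addresses are all made-up names for illustration. The point is only that the application code talks to an object with a `send()` method, so a test can hand it a stand-in that records messages instead of delivering them.

```python
from email.message import EmailMessage


class RecordingMailer:
    """Hypothetical test double: keeps messages in a list instead of sending them."""

    def __init__(self):
        self.sent = []

    def send(self, message):
        self.sent.append(message)


def send_welcome(mailer, address):
    """Hypothetical application code; it only relies on mailer.send()."""
    msg = EmailMessage()
    msg["From"] = "noreply@example.com"  # placeholder address
    msg["To"] = address
    msg["Subject"] = "Welcome"
    msg.set_content("Thanks for signing up.")
    mailer.send(msg)


# In a test, pass the recording mailer and inspect what was "sent".
mailer = RecordingMailer()
send_welcome(mailer, "user@example.com")
print(len(mailer.sent), mailer.sent[0]["Subject"])
```

In production you would swap in an object with the same `send()` method that actually talks to your mail server.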
Even if you did that, at some point you'd have to verify that it actually does what it's supposed to: set the headers, boundary markers, email body, etc. correctly. Now, if the emails are being sent OK, you can simply look at the raw message to verify all these things, which would be difficult to automate.
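If you can get hold of the raw message text, part of that check can be scripted. The following is only a sketch using Python's standard `email` package; the `RAW` message, the `BOUNDARY42` marker, and the `summarize` helper are invented for the example, and a real message from your application will look different.

```python
from email import message_from_string, policy

# Made-up raw message standing in for whatever your code actually produced.
RAW = """\
From: noreply@example.com
To: user@example.com
Subject: Welcome
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="BOUNDARY42"

--BOUNDARY42
Content-Type: text/plain; charset="utf-8"

Thanks for signing up.
--BOUNDARY42
Content-Type: text/html; charset="utf-8"

<p>Thanks for signing up.</p>
--BOUNDARY42--
"""


def summarize(raw_text):
    """Hypothetical helper: parse a raw message and pull out what we want to check."""
    msg = message_from_string(raw_text, policy=policy.default)
    plain = msg.get_body(preferencelist=("plain",))
    return {
        "subject": str(msg["Subject"]),
        "to": str(msg["To"]),
        "is_multipart": msg.is_multipart(),
        "boundary": msg.get_boundary(),
        "part_types": [part.get_content_type() for part in msg.iter_parts()],
        "plain_body": plain.get_content().strip() if plain is not None else None,
    }


summary = summarize(RAW)
print(summary)
```

A test could then assert on the individual fields of `summary` rather than eyeballing the raw text.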
Your idea of looking in the mbox is actually pretty decent. If you had your own Linux development box, you could install nullmailer. Then, after your mail is "sent", you can look in the queue, find the message, and check it. Again, difficult to automate. If you somehow made sure that the only message in the queue (or the mbox, if you go that route) was the message you sent, automation would be a little easier.
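For the mbox route, a rough sketch of the "only one message" check could look like this, using Python's standard `mailbox` module. The `only_message` helper and the temporary mbox file are assumptions made for the demo; in practice you would point it at wherever your test mail really gets delivered, and the nullmailer queue would need different handling.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def only_message(mbox_path):
    """Hypothetical helper: return the single message in the mbox, or fail loudly."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        messages = list(box)
        if len(messages) != 1:
            raise AssertionError(f"expected 1 message, found {len(messages)}")
        return messages[0]
    finally:
        box.close()


# Demo setup: build a throwaway mbox holding exactly one message.
path = os.path.join(tempfile.mkdtemp(), "test.mbox")
box = mailbox.mbox(path)
msg = EmailMessage()
msg["From"] = "noreply@example.com"  # placeholder address
msg["To"] = "user@example.com"
msg["Subject"] = "Welcome"
msg.set_content("Thanks for signing up.")
box.add(msg)
box.flush()
box.close()

found = only_message(path)
print(found["Subject"])
```

Starting each test run from an empty mbox is what makes the "exactly one message" assumption hold.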
Testing mail is even worse than testing a database layer. It's even worse in production environments, where sys. engineers might be messing with filters that cause perfectly good code to stop working. I won't mention any names here 🙂