One quick improvement would be to shift the [font=monospace]is_image[/font] test to after the others, since that's the expensive one: with the cheap tests first, a URL can be rejected before the expensive one ever runs.
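Roughly like this (a sketch only: the exact condition in your code will differ, and [font=monospace]is_interesting_url[/font] is just a hypothetical stand-in for your other, cheaper checks):

// With &&, evaluation stops at the first test that fails,
// so put the cheap tests before the expensive is_image() call.
if (!in_array($url, $crawled_array)   // cheap-ish membership test
    && is_interesting_url($url)       // hypothetical cheap filter
    && !is_image($url))               // expensive test runs only if the rest pass
{
    $tocrawl_array[] = $url;
}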
Starting the loop with a check to see if the URL you're about to crawl has already been crawled is redundant, because you checked that before adding it in the first place.
Another would be to store the URLs as sets: make them the keys of [font=monospace]$crawled_array[/font] and [font=monospace]$tocrawl_array[/font], with an irrelevant (though non-null) value. [font=monospace]isset($array[$url])[/font] is a hash lookup and is much faster than [font=monospace]in_array($url, $array)[/font], which scans the whole array.
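To make that concrete, a small sketch (the example URLs are placeholders):

// Plain list: in_array() has to scan every element on each lookup
$crawled_list = ['http://example.com/', 'http://example.com/about'];
$seen = in_array($url, $crawled_list);            // slows down as the list grows

// Set: the URL is the key, the value is an irrelevant (but non-null) marker
$crawled_array = ['http://example.com/' => true, 'http://example.com/about' => true];
$seen = isset($crawled_array[$url]);              // hash lookup, effectively constant time
$crawled_array[$new_url] = true;                  // "add to the set"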
Yet another observation: you never remove anything from [font=monospace]$tocrawl_array[/font], so it too takes ever longer to search, mostly over URLs that are already in [font=monospace]$crawled_array[/font]. Running it as a stack or queue (depending on whether you want to go depth-first or breadth-first) keeps its size down:
$tocrawl_array = [$starting_url];   // seed with the starting URL
$crawled_array = [];
while (!empty($tocrawl_array))
{
    // Take the next URL off the end of the list (a stack, so depth-first)
    $crawl = array_pop($tocrawl_array);

    // ... fetch the document at $crawl, build its DOM, and collect the
    // links it contains into $urls_found, just as you do now ...

    // Filter out those already crawled ($crawled_array is keyed by URL)
    $urls_found = array_keys(array_diff_key(array_flip($urls_found), $crawled_array));
    // Keep only those that are interesting enough to follow further
    // and that aren't already queued
    $urls_found = array_diff(array_filter($urls_found, '...interesting urls only...'), $tocrawl_array);

    $crawled_array[$crawl] = true;
    $tocrawl_array = array_merge($tocrawl_array, $urls_found);
}
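As written, [font=monospace]array_pop[/font] takes URLs from the end of the list, so the crawl is depth-first; to go breadth-first instead, take from the front:

$crawl = array_shift($tocrawl_array);   // queue behaviour: oldest URL first (breadth-first)

([font=monospace]array_shift[/font] reindexes the array on every call, so for a very long queue something like [font=monospace]SplQueue[/font] would be cheaper, but for a sketch like this it doesn't matter.)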
Finally, since you are only interested in HTML documents, a check for whether what you fetched is an image isn't very useful anyway: there are many more kinds of file than just "HTML document" and "JPEG/GIF/TIFF/PNG/BMP image". It would make more sense to check whether what you've fetched is an HTML document, and discard it if not.
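For example, if you happen to fetch pages with cURL (an assumption; adapt this to however you actually fetch them), the Content-Type of the response tells you whether it's worth handing to the DOM parser:

// Sketch only, assuming cURL; runs inside the crawl loop from above.
$ch = curl_init($crawl);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$body = curl_exec($ch);
$type = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);   // e.g. "text/html; charset=UTF-8"
curl_close($ch);

if ($body === false || stripos((string)$type, 'text/html') === false) {
    // Not an HTML document (image, PDF, feed, whatever): skip it
    continue;
}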