regex newlines when followed by string

not_skeletor

I have a massive amount of text I have to parse through. Basically, a plain-text email, about 400 lines long, with some fields I need to pull out.
I found the best way to do this is to explode everything along newlines, then use preg_grep to get me to the right place (so, if someone puts "Username: dave" in the email, I grep for "username" and match "dave" out to an array). The problem is, some input wraps around onto a new line, so grepping for the start of the proper line does me no good, as it will miss everything on the next line (the next key in the array).
I cannot simply replace ALL the newlines, as there are parts where I require the newlines to be there.
I want to strip out the newlines ONLY if they are followed by any string of text. I was thinking maybe a positive lookaround assertion, but I cannot get it to work. So, any guru regexers out there?

In short: go through a file and convert any newline that is followed by a character (or number) to a space, leaving all other newlines (and carriage returns) intact.
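A minimal sketch of the rule as stated above, written in Python for illustration (the same pattern should carry over to PHP's preg_replace). The function name join_wrapped and the sample string are made up for this example, and "character (or number)" is assumed to mean an ASCII letter or digit.

```python
import re

def join_wrapped(text: str) -> str:
    # Replace a line break (LF or CRLF) with a space, but only when the
    # next character is a letter or digit. The lookahead checks that
    # character without consuming it.
    return re.sub(r"\r?\n(?=[A-Za-z0-9])", " ", text)

# Hypothetical sample: the first break is a wrap, the others are not.
sample = "Address: 123 Fake st\nSpringfield, IL\n#Phone: 555-1212\n\n"
print(repr(join_wrapped(sample)))
```

Note that this literal rule also joins a blank line to the text after it, since that newline is followed by a letter too, so it only works when the lines worth keeping start with something other than a letter or digit.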

xblue

Hi,

I think it might help to give an example of the text (not the full 400 lines, just what is essential).

If some of the newlines are followed by alphanumeric characters, what are the other newlines, the ones you want treated differently, followed by?

not_skeletor

Like this:

#First name: Dave
#Last name: Thesquid
#Address: 123 Fake st
Springfield, IL 90210
#Phone: 555-1212
#
#Arrive
#Date: 12-12-1904
#Location: SUX
#
#Depart
#Date: 12-13-1904
#Location: EGG
#Gate: something
#

So, I would explode it along newlines, grep for "#address" and get only "123 Fake st". Removing the newlines would seem ideal, but there are a few parts where I have to count between newlines (the depart and arrive part, as I cannot simply grep for Date: or I'll pull both, or the wrong one) and the pertinent information needs to be grouped by that date; otherwise stripping all the newlines would be fine.
So, I need to get rid of the newline after "123 Fake st" but NOT after "Location: SUX" or "Location: EGG". I figured I could do this if I could tell the newlines apart by whether they are followed by "#" and \D.
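One way to read this against the sample: a newline is a wrap exactly when the next line does not start with "#". The sketch below, in Python for illustration, joins those lines with a negative lookahead and then groups fields under the most recent heading so the two Date lines stay apart. The names parse and "General" are hypothetical, and it assumes heading lines such as "#Arrive" never contain a colon.

```python
import re

SAMPLE = (
    "#First name: Dave\n#Last name: Thesquid\n"
    "#Address: 123 Fake st\nSpringfield, IL 90210\n"
    "#Phone: 555-1212\n#\n"
    "#Arrive\n#Date: 12-12-1904\n#Location: SUX\n#\n"
    "#Depart\n#Date: 12-13-1904\n#Location: EGG\n#Gate: something\n#\n"
)

def parse(text):
    # Join continuation lines: a break NOT followed by "#", by another
    # break, or by the end of the text becomes a space.
    joined = re.sub(r"\r?\n(?!#|\r?\n|\Z)", " ", text)
    record = {}
    section = "General"  # hypothetical name for fields before any heading
    for line in joined.splitlines():
        line = line.lstrip("#").strip()
        if not line:
            continue  # a bare "#" only separates blocks
        if ":" in line:
            key, value = line.split(":", 1)
            record.setdefault(section, {})[key.strip()] = value.strip()
        else:
            section = line  # e.g. "Arrive" or "Depart"
    return record

result = parse(SAMPLE)
print(result["General"]["Address"])
print(result["Arrive"]["Date"], result["Depart"]["Date"])
```

Keying on the heading avoids counting lines between newlines, at the cost of assuming every heading is a "#" line without a colon.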

xblue

Can't you just explode on # (or newline and #) instead? Seems the simplest solution to me.
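A rough sketch of that idea in Python, for illustration only: split on "line break plus #" so that a wrapped line stays inside its own field, then collapse the leftover break. The name split_fields is made up, and the input is assumed to look like the sample earlier in the thread.

```python
import re

def split_fields(text):
    # Drop the "#" that opens the first field, if present.
    body = text[1:] if text.startswith("#") else text
    # Split only where a line break is followed by "#".
    parts = re.split(r"\r?\n#", body)
    # Collapse any line breaks and extra spaces left inside a field.
    return [" ".join(part.split()) for part in parts]

sample = "#Address: 123 Fake st\nSpringfield, IL 90210\n#Phone: 555-1212\n#\n#Arrive\n"
for field in split_fields(sample):
    print(repr(field))
```

The bare "#" lines come back as empty strings, so they can still serve as separators between the Arrive and Depart blocks.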

not_skeletor

Figured it out. The problem is, it's an email. And the email has newlines AND carriage returns. What a pain.
So, it strips out \n\r if there is no string or a single space immediately preceding the newline/carriage return.
Then explode it on newlines. Worked great. Thanks for all the responses.
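The final pattern is not shown in the post, so the following only illustrates the carriage-return part of the problem, in Python: splitting on "\n" alone leaves a stray "\r" on every field, and normalizing line endings first avoids that. The helper name normalize_newlines is hypothetical.

```python
def normalize_newlines(text: str) -> str:
    # Turn CRLF pairs, then any lone CR, into plain LF.
    return text.replace("\r\n", "\n").replace("\r", "\n")

raw = "#Date: 12-12-1904\r\n#Location: SUX\r\n"
print(raw.split("\n"))                      # each field still ends in "\r"
print(normalize_newlines(raw).split("\n"))  # clean fields
```

After this step, a pattern written for "\n" alone behaves the same on the email as on a plain text file.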