[RESOLVED] array walk

Roger_Ramjet

Help I need some hints here.

I want to walk through an array and apply a function to each element; basically identify each element and extract it to a var for storage in db tables. The number of elements in each array varies: this is bibliographic data about journal articles so there are eg 1 or more Authors - each prefixed with 'AU - '. Keywords and Citations are even more problematic as they have no prefix: once I start to process them I have to test for the next field which does have a prefix.

What I can't figure out is how to define multiple entry points into my function/algorithm: once all authors are processed I don't want to be testing for them again, I want to move directly on to the next field. Again, because keywords and citations have no prefix I need to know where I am so that I know what to do with the data.

Now I could have done this easily with a bitmask and branch conditional back in the days when I did Assembler. Even easier with Prolog. But how on earth does one do it in PHP??? Apart from some really cumbersome and ugly chain of 25 IF ELSEIF s.

Any thoughts would really be great cos I've hit that brick wall on this.

[edit] and just asking the question has given me some new ideas - but don't let that hold you back, get stuck in folks[/edit]

rowanparker

Couldn't you use foreach to loop through the array and just do things with each item?

Weedpacket

Just to clarify; we're talking about an array of bibliographic records, each consisting of an array of - at this stage - "lines", of which some are Authors, some are Citations, etc. And you want to be able to pin down which lines are which for each record in turn. We're not talking BibTeX, unfortunately....

I don't see how you're going to avoid the necessary parsing to see if a line starts with "AU" or not; you have to get that information somehow. This '25' you mention - is that the number of lines per record or the number of different types of line? (I'd be very surprised to hear it's the latter!)

I'm just playing around here:

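// Pull out the lines that start with "AU", then drop them from the record
$authors = preg_grep('/^AU/', $record);
$record = array_diff($record, $authors);
// Same again for the title line(s)
$title = preg_grep('/^TI/', $record);
$record = array_diff($record, $title);
//...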

So that each type of line gets its own array. Once everything with a prefix has been identified and removed, what's left is the unprefixed elements, and you still have to distinguish those from each other.
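The same idea can be wrapped in a loop over the tags rather than written out once per field. This is only a rough sketch: the function name partitionByPrefix, the sample lines, and the tags other than AU and TI are made up for illustration, and it removes matched lines by key (array_diff_key) rather than by value so that two lines with identical text are not confused.

```php
<?php
// Hypothetical helper: split one record (an array of lines) into
// one array per tag, plus whatever is left over without a known tag.
function partitionByPrefix(array $record, array $tags): array
{
    $fields = [];
    foreach ($tags as $tag) {
        // preg_grep keeps the original keys of the matching lines
        $matches = preg_grep('/^' . preg_quote($tag, '/') . '\b/', $record);
        $fields[$tag] = array_values($matches);
        // Drop the matched lines so later tags never see them again
        $record = array_diff_key($record, $matches);
    }
    // Anything still here had none of the listed prefixes
    $fields['unprefixed'] = array_values($record);
    return $fields;
}

// Made-up sample data, not taken from a real export
$record = [
    'TI  - An example title',
    'AU  - Smith, J.',
    'AU  - Jones, K.',
    'JO  - Journal of Examples',
    'first keyword',
];

$fields = partitionByPrefix($record, ['AU', 'TI', 'JO']);
```

Each tag is tested once per record rather than once per line, and the unprefixed leftovers end up together in $fields['unprefixed'], still in their original order.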

Hm. Yeah. Functional programming.

But you mention that to find keywords and citations you have to find the next record. What mechanism do you have to determine where one record leaves off and the next begins? There must be something otherwise how do you know where, say, the keywords begin?

Roger_Ramjet

Weedpacket wrote:
We're not talking BibTeX, unfortunately....

Well, BibTeX was/is an option. RefMan was chosen because it contains additional fields. Why, do you know of an available BibTeX parser ... looking at an example it does seem much more regular. Trouble is the other fields include the ISI No and the authors' research groups.

Weedpacket wrote:
I don't see how you're going to avoid the necessary parsing to see if a line starts with "AU" or not; you have to get that information somehow. This '25' you mention - is that the number of lines per record or the number of different types of line? (I'd be very surprised to hear it's the latter!)

Well 25 is a bit of an exaggeration, it is really only 24. That includes some information being repeated in 2 fields and the sequence of the article in the list. I am after 14 of them.

Now what I did not want to do was to keep on testing against AU once the authors had been processed. So I went with a switch and a step counter to control it. Still not ideal, but it got me started.
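For what it's worth, one way to get the "know where I am" behaviour without a long IF/ELSEIF chain is to remember the last tag seen and file any unprefixed line under it. The sketch below is not the switch-and-counter code described above. It assumes (and this is only an assumption about the data) that a tagged line looks like two characters, some spaces, a hyphen and then the value, and that an unprefixed line continues the most recent tagged field. The function name parseRecord and the sample lines are made up.

```php
<?php
// Hypothetical parser: one pass over the lines, with $current acting
// as the "where am I" state instead of a step counter.
function parseRecord(array $lines): array
{
    $fields = [];
    $current = null; // tag of the field currently being read

    foreach ($lines as $line) {
        // Assumed line shape: TAG, whitespace, hyphen, value
        if (preg_match('/^([A-Z][A-Z0-9])\s+-\s?(.*)$/', $line, $m)) {
            $current = $m[1]; // a new field starts here
            $value = $m[2];
        } else {
            $value = $line;   // no prefix: belongs to the current field
        }

        $value = trim($value);
        if ($current === null || $value === '') {
            continue;         // nothing to file this line under, or empty
        }
        $fields[$current][] = $value;
    }
    return $fields;
}

// Made-up sample data
$fields = parseRecord([
    'AU  - Smith, J.',
    'AU  - Jones, K.',
    'KW  - parsing',
    'state machines',
    'TI  - An example title',
]);
```

Each line is tested against one pattern only, so there is no repeated check for AU once the authors are done. A continuation line that happens to look like a tag would be misread, so this only holds up if the export is well formed.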

Had a search on sourceforge and lots of bibtex parsers there so we may just go with that.

Roger_Ramjet

On further investigation I find that at least part of my problem is that EndnoteWeb does not export RefMan RIS format properly, so I was trying to parse mangled data.

All this was new to me but thanks to Weedpacket's comment I've done some reading around and found all sorts of solutions instead of reinventing the wheel as I was doing. So I'll mark this as resolved, but the basic programming construct is a conundrum that still interests me, so I'll file that away for another day.