sneakyimp;11043671 wrote:May I inquire how you ended up figuring out what it does? Are you using some fancy IDE or perhaps a compiler command that will render code with macros fully replaced? I'm hoping to learn to fish, not just ask for fish handouts if you know what I mean.
I just used the web-based source browser you linked to.
The sad truth is: I followed the macro chain because I'm not only used to seeing it at work... I'm used to contributing to it. 🙂 (Seriously, if I didn't value my job and/or the NDA contract, I'd post some of the C macros we've got floating around; you know it's bad when your lint program barfs and throws up a warning that simply states "too many lines in macro" or something to that effect. To its point, some of those macros do span >= 10 lines and take >= 4 arguments...)
sneakyimp;11043671 wrote:If I encountered an array or object while serializing, I would check to see if its memory address (a unique id, no?) existed in an array that I was keeping. If this memory address did NOT exist in the array, I would add it. I was brusquely informed that my approach was inadequate.
Again, I'm not sure I fully understand the code you linked to, so I'm not sure how to judge if your approach is inadequate or not. It does seem that objects have to be handled specially, though.
sneakyimp;11043671 wrote:By 'hash table', I'm guessing you mean the HashTable object pointed to by the parameter var_hash, which is a pointer to a HashTable. (I'm talking myself through this in an attempt to revive long-dormant C-programming skills, please let me know if I get something wrong).
Yes. (It's a similar concept to arrays or strings: short of some extremely inefficient memcpy() calls, you're not really passing those things around. You pass a single pointer to the start of them and let the called function access them in memory directly - hopefully sanely, too.)
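A minimal sketch of that point, using a hypothetical `struct table` (not PHP's real HashTable, which has many more fields): the function that takes a pointer updates the caller's table in place, while a by-value parameter only gets a throwaway copy.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a hash table; PHP's real HashTable has many more fields. */
struct table {
    size_t num_elements;
};

/* The callee gets only the address of the caller's table (one pointer-sized
 * argument), so nothing is copied and its update is visible to the caller. */
static void table_bump(struct table *ht)
{
    assert(ht != NULL);
    ht->num_elements++;
}

/* By contrast, passing the struct by value hands the callee a private copy;
 * this increment is lost when the function returns. */
static void table_bump_copy(struct table ht)
{
    ht.num_elements++;
}
```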
sneakyimp;11043671 wrote:As for 'using this address as a unique key', the situation is actually a bit more complicated, isn't it? The comment suggests that one also needs the 'object handle' (does he mean the zval pointer var?) and the "class entry" (?? no idea what this really means ??).
Well, right: that's the special handling for vars that 1) are objects, and 2) have the "get_class_entry" object handler pointer set to something non-zero (i.e. it's not a NULL function pointer). In that case, there is some really nifty and mysterious crap going on to determine the actual "memory address" to use for the given object.
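For the curious, here is a hedged sketch of what that "nifty" computation looks like as I read PHP 5.x's php_add_var_hash(): the class-entry address is rotated left by 5 bits, the object handle is added to it, and the resulting number (printed with an 'O' prefix) becomes the key. The function name object_key() is hypothetical, and I've used uintptr_t where the PHP source casts to size_t/long; check the ext/standard/var.c of your PHP version before relying on the details.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: fold a class-entry address and an object handle into
 * one integer key. The rotate-then-add scheme follows my reading of PHP 5.x's
 * php_add_var_hash(); treat it as an assumption, not the definitive code. */
static uintptr_t object_key(uintptr_t class_entry, unsigned long handle)
{
    const unsigned bits = (unsigned)(sizeof(uintptr_t) * CHAR_BIT);

    /* Rotate left by 5: bits shifted off the top wrap around to the bottom. */
    uintptr_t rotated = (class_entry << 5) | (class_entry >> (bits - 5));

    /* Add the handle, which identifies the object within the object store. */
    return rotated + (uintptr_t)handle;
}
```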
If that's not the case, though, then yes, it is that simple - you just use "(long) var", which is basically just the memory address of the variable (i.e. the value of the pointer passed in as the second parameter).
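In other words, in the non-object case the pointer value itself is the identity, which is essentially the approach sneakyimp described. A minimal sketch, with hypothetical names and a linear scan standing in for the real hash table: two different variables holding the same value still get different keys, while the same address is recognized the second time around.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SEEN_MAX 64  /* hypothetical fixed capacity, enough for a sketch */

/* Hypothetical "seen" set keyed on raw addresses, the same idea as using
 * (long) var as the hash key. A real implementation would hash, not scan. */
struct seen_set {
    uintptr_t addrs[SEEN_MAX];
    size_t count;
};

/* Returns 1 if ptr was newly recorded, 0 if it was already present,
 * and -1 if the set is full. */
static int seen_add(struct seen_set *s, const void *ptr)
{
    uintptr_t key = (uintptr_t)ptr;  /* the address itself is the key */

    for (size_t i = 0; i < s->count; i++) {
        if (s->addrs[i] == key)
            return 0;
    }
    if (s->count == SEEN_MAX)
        return -1;
    s->addrs[s->count++] = key;
    return 1;
}
```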
sneakyimp;11043671 wrote:At any rate, it seems clear enough to me that if the zval pointer var is pointing to a PHP object (as opposed to a string or array or scalar value) then it would construct a different p than it would construct otherwise. The particulars look pretty confusing, especially this line:
*(--p) = 'O';
Unless I'm mistaken, it is simply decrementing some memory location, p, to contain the letter O, but the pointer manipulation is really confusing -- I'm wondering if p points to the beginning of a char array or the end?
It will eventually point to the beginning of the data written into the char array... however, the smart_str_print_*() functions fill the given buffer starting from the end (see smart_str_print_unsigned4() on line 122 of php_smart_str.h). So if you do the math, you'll (hopefully!) find that it starts at the right-hand end of id[32], consumes all but id[0], and is left pointing at the "beginning", which is id[1]. Thus, *(--p) safely prepends a value at id[0].
EDIT: It's late and I'm tired and sick (and any other excuse you can think of, though the previous three are actually facts), so I didn't originally check whether it always consumes everything up to id[1]. It doesn't: even a 64-bit long prints as at most 20 digits plus a sign, so a good chunk of the 32-char buffer stays unused. Even so, the code is written in a way that handles this. Basically, the smart_str_print_*() functions fill a buffer with a right-justified string of text. So if a buffer has length n and the function uses the last m characters of it, your string (including its terminating NUL) occupies id[n-m] through id[n-1] (the -1 gets thrown in because C uses 0-indexed arrays, while the rest of the world has for whatever reason defined "counting numbers" as whole numbers sans zero; what a silly world we live in).
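Here is a hypothetical re-creation of the right-justified printing trick (print_unsigned_right() is my name for it, not PHP's; the real macro is smart_str_print_unsigned4() in php_smart_str.h). It is handed a pointer to the last usable byte, writes a NUL there, emits digits leftward, and returns a pointer to the leftmost digit, which is exactly why *(--p) can prepend a character.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical re-creation of the right-to-left idea behind
 * smart_str_print_unsigned4(): buf points at the LAST byte we may use.
 * A NUL goes there, digits are written leftward, and the returned pointer
 * is the first (leftmost) digit. Not PHP's actual code. */
static char *print_unsigned_right(char *buf, unsigned long num)
{
    char *p = buf;

    assert(buf != NULL);
    *p = '\0';
    do {
        *--p = (char)('0' + num % 10);  /* lowest digit first, moving left */
        num /= 10;
    } while (num > 0);
    return p;
}

/* Usage, mirroring the snippet under discussion:
 *   char id[32], *p;
 *   p = print_unsigned_right(id + sizeof(id) - 1, 12345UL);  // p -> "12345"
 *   *(--p) = 'O';                                            // p -> "O12345"
 */
```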
sneakyimp;11043671 wrote:Also, the relation between id and p len is a mystery to me.
id[32] is an array of 32 chars allocated on the stack. (In most expressions, id decays to a pointer to the beginning of this allocation.)
p is a pointer into that char array.
len is the length of the string that p points to.
If the last one is unclear, grab a piece of paper and draw a box with 32 compartments. This is the memory allocated by id[32]. id points to the first bucket, id + sizeof(id) - 1 points to the last bucket, and p points to the leftmost bucket written by smart_str_print_long() (which, if you recall from above, started from the last bucket and worked its way left). len is then simply the number of buckets from p up to, but not including, that last bucket, which is where the terminating NUL lives.
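The paper-and-boxes picture translates directly into pointer arithmetic. In this sketch, make_key() and print_unsigned_right() are hypothetical helpers, and the len formula is my assumption of how such code computes it: the last bucket holds the terminating NUL, so len is just the distance from p to that bucket. Note that the buffer size is passed explicitly: inside a function, an array parameter has already decayed to a pointer, so sizeof would give the pointer size, not 32.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical right-to-left digit printer: NUL at *buf, digits leftward,
 * returns a pointer to the leftmost digit. */
static char *print_unsigned_right(char *buf, unsigned long num)
{
    char *p = buf;
    *p = '\0';
    do {
        *--p = (char)('0' + num % 10);
        num /= 10;
    } while (num > 0);
    return p;
}

/* Hypothetical key builder over a buffer of `size` buckets.
 * last = final bucket (gets the NUL); p = leftmost bucket written;
 * returns len = number of key characters, i.e. last - p. */
static size_t make_key(char *id, size_t size, unsigned long num, char **out_p)
{
    char *last = id + size - 1;
    char *p;

    assert(size >= 2);
    p = print_unsigned_right(last, num);
    *out_p = p;
    return (size_t)(last - p);
}
```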
sneakyimp;11043671 wrote:The variable stored is the address of var_no which just contains the (number of elements in var_hash)+1.
Not quite; note that the 4th parameter of the zend_hash_add() macro has a 'p' in front (pData). Since Hungarian notation appears to be in use, we can presume that the called function expects a pointer to the data, not the data itself. Thus, what gets stored is the data found at the address of the var_no variable (which, more plainly said, is the value of var_no).
I walked the chain just to verify, and sure enough... you eventually land in - you guessed it, another macro! - INIT_DATA(), which does a memcpy() with pData as the second parameter (source memory address).
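A small sketch of the pData convention, with a hypothetical bucket_store() standing in for what INIT_DATA() does: the function receives the address of var_no plus a size, copies that many bytes, and from then on the table holds its own copy of the value, so later changes to var_no don't affect it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bucket holding a private copy of its data, loosely modelled
 * on what INIT_DATA() does; not PHP's real Bucket layout. */
struct bucket {
    unsigned long data;
};

/* pData is the ADDRESS of the caller's value; nDataSize bytes are copied
 * from there, so the bucket ends up holding the value, not the pointer. */
static void bucket_store(struct bucket *b, const void *pData, size_t nDataSize)
{
    assert(nDataSize <= sizeof(b->data));
    memcpy(&b->data, pData, nDataSize);
}
```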
sneakyimp;11043671 wrote:Seems to me that a non-empty var_old variable MUST be supplied for this function to even bother searching var_hash for the supplied var value.
Yes, if you want to handle the case where the variable is already in the hash table, you'd supply that parameter and check for failure. Otherwise, the code will hit zend_hash_add(), which is a macro (groaning yet again, are we?) for a function called _zend_hash_add_or_update() (note the "or_update" part). Based on the name alone, you might assume that a variable whose address is already in the hash table gets its previously stored, meaningless(?) value (the number of entries in the table) overwritten with a new, equally meaningless(?) one. Not quite: zend_hash_add() passes the HASH_ADD flag, and with that flag an existing key makes the function return FAILURE and leave the old value alone (I'll leave it to the reader as an exercise to verify - take that, you worthless math textbooks...). Either way, the duplicate is handled gracefully.
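To make the add-versus-update distinction concrete, here is a toy table with hypothetical names (HT_ADD and HT_UPDATE are modelled on the HASH_ADD/HASH_UPDATE flags as I understand the Zend source, not copied from it). Under HT_ADD an existing key makes the call fail and the stored value is left untouched; under HT_UPDATE it is overwritten.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HT_ADD    1   /* hypothetical flag: fail if the key already exists */
#define HT_UPDATE 2   /* hypothetical flag: overwrite an existing key */
#define HT_CAP    16  /* hypothetical fixed capacity */

struct entry { char key[32]; unsigned long value; };
struct kv_table { struct entry e[HT_CAP]; size_t n; };

/* Returns 0 on success, -1 on failure (key present under HT_ADD,
 * key too long, or table full). */
static int table_add_or_update(struct kv_table *t, const char *key,
                               unsigned long value, int flag)
{
    for (size_t i = 0; i < t->n; i++) {
        if (strcmp(t->e[i].key, key) == 0) {
            if (flag == HT_ADD)
                return -1;          /* existing value left untouched */
            t->e[i].value = value;  /* HT_UPDATE: overwrite in place */
            return 0;
        }
    }
    if (t->n == HT_CAP || strlen(key) >= sizeof(t->e[0].key))
        return -1;
    strcpy(t->e[t->n].key, key);
    t->e[t->n].value = value;
    t->n++;
    return 0;
}
```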