C function to reverse a double?

sneakyimp

I'm trying to write an extension to PHP which means coding in C. I'm really really rusty at C coding and was never very good at it.

Can anyone propose an efficient, safe, and [hopefully] future-proof way of reversing a double? Keep in mind that it should work on as many systems as possible and on 32- and 64-bit systems (and on ???-bit systems in the future?). Will the size of a 'double' ever change or will it always be 8 bytes?

I've tried this and it doesn't work. The compiler complains about "invalid operands to binary" because I'm trying bitwise shifts on a non-integer.

    x = (x>>56) | 
        ((x<<40) & 0x00FF000000000000) |
        ((x<<24) & 0x0000FF0000000000) |
        ((x<<8)  & 0x000000FF00000000) |
        ((x>>8)  & 0x00000000FF000000) |
        ((x>>24) & 0x0000000000FF0000) |
        ((x>>40) & 0x000000000000FF00) |
        (x<<56);

Any advice would be much appreciated.

laserlight

What do you mean by "reverse a double"?

johanafm

Considering http://phpbuilder.com/board/showthread.php?t=10375067 I'd say LE -> BE / BE -> LE.

And shouldn't you be doing bitwise & before bit shift to get it right?

sneakyimp

Johanna has it right.

My function receives a double-precision value as input. Is it a long long? a floating point? a fixed point?

If the machine is Little-Endian then I want to reverse the byte order of the double because the AMF3 spec says all doubles are to be serialized in Network (big endian) byte order.
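One way around the "invalid operands" error is to copy the double's bits into a 64-bit unsigned integer and shift that instead. A minimal sketch, assuming double is the 8-byte IEEE 754 type and that the host stores doubles and integers in the same byte order; the name double_to_be_bytes is made up for illustration and is not part of PHP or AMFEXT:

```c
#include <stdint.h>
#include <string.h>

/* Compile-time check (assumption made explicit): refuse to build if a
 * double is not 8 bytes, because the array size would be negative. */
typedef char double_must_be_8_bytes[sizeof(double) == 8 ? 1 : -1];

/* Hypothetical helper: store the big-endian encoding of d in out[0..7]. */
static void double_to_be_bytes(double d, unsigned char out[8])
{
    uint64_t bits;
    int i;

    /* Copy the object representation; shifting a double directly is not allowed. */
    memcpy(&bits, &d, sizeof bits);

    /* Most significant byte first, whatever the host byte order is. */
    for (i = 0; i < 8; ++i) {
        out[i] = (unsigned char)(bits >> (56 - 8 * i));
    }
}
```

Because the shifts operate on the integer value rather than on its memory layout, the same code should produce big-endian output on little-endian and big-endian hosts alike, so no runtime endianness check is needed in this variant.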

I see that the author of AMFEXT has done this:

static void amf0_write_number(amf_serialize_output buf, double num, amf_serialize_data_t * var_hash AMFTSRMLS_DC)
{
    union aligned {
        double dval;
        char cval[8];
    } d;
    const char * number = d.cval;
    d.dval = num;

    /* AMF number: b(0) double (8 bytes, big endian) */
    if((var_hash->flags & AMF_BIGENDIAN) != 0)
    {
        char numberr[8] = {number[7],number[6],number[5],number[4],number[3],number[2],number[1],number[0]};
        amf_write_string(buf, numberr,8 AMFTSRMLS_CC);
    }
    else
    {
        amf_write_string(buf, number,8 AMFTSRMLS_CC);
    }
}

This code bothers me for a variety of reasons that I can share if you are curious. I see that he's casting the double as a char pointer to index it which I like, but I'm not really familiar with unions. I find myself wondering if this is a good way to do it. Is it fast? Is it going to over-write any adjacent bytes if a double doesn't happen to be 8 bytes long?

Sorry if this question seems noobish or redundant. The PHP source code is kind of impenetrable because of all the nested macros and weird stuff to ensure compatibility.

For example, I get a double when my main encoding function uses the PHP macro Z_TYPE_P (see line 435), which returns IS_DOUBLE (see line 564) when applied to the current input, which is a zval struct. All function arguments passed from PHP into my PECL extension arrive as zval structs. I can extract the double from the zval object using Z_DVAL_P.

laserlight

sneakyimp wrote:
I see that he's casting the double as a char pointer to index it which I like, but I'm not really familiar with unions.

Not really: by accessing the array of char, one can access the bytes of the double, through the union. It is not a cast of a double to a const char*, although there is the same idea of accessing the bytes of the double.

sneakyimp wrote:
Is it going to over-write any adjacent bytes if a double doesn't happen to be 8 bytes long?

If you are worried about that, rewrite it to:

union aligned {
    double dval;
    char cval[sizeof(double)];
} d;

and:

char numberr[sizeof(double)];
size_t i;
for (i = 0; i < sizeof(double); ++i)
{
    numberr[i] = number[sizeof(double) - 1 - i];
}

sneakyimp

thanks so much laserlight. I haven't coded C in a very long time.

I'm still not really sure what the union bit is doing. I was originally imagining something like this:

static void fmog_write_double(smart_str *buf, double num TSRMLS_DC)
{
    char *a; // new char pointer var
    a = (char *)&num; // point char pointer to address of num

// loop...either reversing or not to write to buffer.

}

Using your suggestion I get something like this:

static void fmog_write_double(smart_str *buf, double num TSRMLS_DC)
{
    union aligned {
            double dval;
            char cval[sizeof(double)];
    } d;
    const char * number = d.cval;
    d.dval = num;

    if (FMOG_G(endianness) == PHP_FMOG_ENDIAN_LITTLE)
    {
        // reverse the byte order from little endian to big endian
        char byte_arr[sizeof(double)];
        size_t i;
        for (i = 0; i < sizeof(double); ++i)
        {
            byte_arr[i] = number[sizeof(double) - 1 - i];
        }
        // append byte_arr to the buffer!
    }
    else
    {
        // append number to the buffer!
    }
}

Thoughts?

johanafm

sneakyimp;10960721 wrote:
I'm still not really sure what the union bit is doing.

A union just shares the same memory between its members. So the double and the char[8] occupy the same memory, and as such you can step through the value byte by byte through the char array.
The other way would be to use a void * pointing to the double, and then cast it to char * (as I did in the code forum thread).
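To make the shared-memory point concrete, here is a small sketch that reads byte i of a double in two ways: through a union member and through a character pointer. The names double_bytes, byte_via_union and byte_via_pointer are invented for this illustration. C permits inspecting any object through a pointer to a character type, so the direct cast from double * needs no detour through void *.

```c
#include <stddef.h>

/* Both members start at the same address and overlap completely. */
union double_bytes {
    double dval;
    unsigned char cval[sizeof(double)];
};

/* Read byte i of d through the union's array member. */
static unsigned char byte_via_union(double d, size_t i)
{
    union double_bytes u;
    u.dval = d;
    return u.cval[i];
}

/* Read byte i of d through a character pointer to the double itself. */
static unsigned char byte_via_pointer(double d, size_t i)
{
    const unsigned char *p = (const unsigned char *)&d;
    return p[i];
}
```

Both functions should return the same value for every i below sizeof(double), which suggests the choice between the two is mostly a matter of taste.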

sneakyimp;10960721 wrote:
I was originally imagining something like this:
    a = (char *)&num; // point char pointer to address of num

It may just be that I don't remember how C works, but iirc you are not allowed to cast (double *) to (char *), which is why I used a void pointer.

sneakyimp;10960721 wrote:
Thoughts?

Perhaps it would be worthwhile to implement generic functionality when it comes to type handling. I.e. change the double val to void *data, add a third parameter to let calling code send along sizeof(type), and then you can use the same function for int, long, double, utf-8...

laserlight

johanafm wrote:
It may just be that I don't remember how C works, but iirc you are not allowed to cast (double *) to (char *), which is why I used a void pointer.

You don't remember how C works 🙂

sneakyimp

johanafm;10960778 wrote:
It may just be that I don't remember how C works, but iirc you are not allowed to cast (double *) to (char *), which is why I used a void pointer.

At the moment, I'm just sending a plain old double to the function. No pointer or anything. Thanks to your suggestions, this seems to be working:

static void fmog_write_double(smart_str *buf, double dbl TSRMLS_DC) /* {{{ */
{
    if (zend_isinf(dbl) || zend_isnan(dbl)) {
        php_error_docref(NULL TSRMLS_CC, E_ERROR, "double %.9g is not a valid double, serialization failed.", dbl);
        return;
    }
    if (FMOG_G(endianness) == PHP_FMOG_ENDIAN_LITTLE) {
        // the system is little endian so we must 
        // reverse the byte order from little endian
        //to big endian
        union aligned {
           double dval;
           char cval[sizeof(double)];
        } d;
        const char * number = d.cval;
        d.dval = dbl;

        char byte_arr[sizeof(double)];
        size_t i;
        for (i = 0; i < sizeof(double); ++i)
        {
            byte_arr[i] = number[sizeof(double) - 1 - i];
        }
        // append byte_arr to the buffer!
        smart_str_appendl(buf, byte_arr, (sizeof(double)));

    } else {
        smart_str_appendl(buf, (char *)&dbl, (sizeof(double)));
    }
} //fmog_write_double()

johanafm;10960778 wrote:
Perhaps it would be worthwhile to implement generic functionality when it comes to type handling. I.e. change the double val to void *data, add a third parameter to let calling code send along sizeof(type), and then you can use the same function for int, long, double, utf-8...

I totally completely agree that this needs to be standardized. There is some discussion on the PECL list to that effect here. The request as expressed asks for macros which I am not crazy about -- the PHP source is rotten with them and, without comments, they make it impossible to understand what is going on. Try and figure out which so-called smart string macro you use when you want to append the contents of a char pointer to your smart string buffer. The stuff practically looks like LISP and there are only two comments in there which explain very little.

However, macros generally don't require pushing any variables on the stack so are probably very speedy. If we are to do it as a macro, I'm guessing we'll need to declare all the variables involved first (e.g., src, dest). I don't really see a way to do a loop without declaring some var for looping.

I'm thinking it might have to be a function? Either that or a whole collection of macros -- one for each size?

johanafm

I've had a little look at the issue, and as Laserlight pointed out, I've no idea what I'm doing, so it's been the head against the wall each step of the way. And as far as any direct c-related stuff goes, assume I may be wrong...
Anyway, whether you go with macros or not is entirely up to you. You can of course be certain that macros will be replaced by the preprocessor, whereas you can't say for certain that a function will be inlined. But even if it isn't, will the overhead for this have a big enough impact to care?

But, since the only thing that matters for endian reversal is sizeof(type), you don't need to worry about types, just sizes. And it can easily be done. One example:

const int endian_test = 1;
#define IS_BIGENDIAN() ((*(const char *) &endian_test) == 0)
/* static, so the compiler can always fall back to a local definition */
static inline void reverse(char *data, const size_t s) {
	if (IS_BIGENDIAN()) {
		return;
	}

	char t;
	for (int i = 0, j = s-1; i < j; ++i, --j) {
		t = data[i];
		data[i] = data[j];
		data[j] = t;
	}
}

long l; int i; double d;
reverse((char *) &l, sizeof(l));
reverse((char *) &i, sizeof(l));
reverse((char *) &d, sizeof(l));

The code does not know what needs to be reversed. It will, for example, happily reverse a utf-8 string, a struct or whatever you pass it, as long as you are on an LE system and the size is > 1.
You can of course replace the byte swapping with any fancy method of your choice; since you are operating on pairs of bytes, there is no risk of overflow.

While this seems to be working, I do not understand why I get the output I do from the following code

	unsigned short s = 0xabcd;
	double d = 1.234;
	unsigned j = 0x12345678;

	printf("Initially: \n");
	for (int i = 0; i < sizeof(s); ++i) {
		printf("%x ", *(((char *) &s) + i));
	}
	printf("\n");
	for (int i = 0; i < sizeof(d); ++i) {
		printf("%x ", *(((char *) &d) + i));
	}
	printf("\n");
	for (int i = 0; i < sizeof(j); ++i) {
		printf("%x ", *(((char *) &j) + i));
	}
	// reverse & print => reversion looks ok
	// reverse again & print => same result as initially, so ok

Output

Initially: 
ffffffcd ffffffab 
58 39 ffffffb4 ffffffc8 76 ffffffbe fffffff3 3f 
78 56 34 12 

Reversed
ffffffab ffffffcd 
3f fffffff3 ffffffbe 76 ffffffc8 ffffffb4 39 58 
12 34 56 78

Where do those 3-byte FF-sequences come from?

sneakyimp

I'm delighted you've worked on this johana. I think the 3-byte sequences probably have something to do with how the system is using printf to interpret %x on *(((char *) &s) + i).

To me, "*(((char *) &s) + i)" looks a bit like LISP. Sometimes known as "Lots [of] Irritating Single Parentheses".

According to this page, %x interprets the argument as Unsigned hexadecimal integer. Since an integer is probably 4 bytes (32 bits) then it's interpreting each char as a full-blown int. I find it so puzzling that printf doesn't seem to have a format option for printing a hexadecimal char. Or binary output for that matter. When I test this sort of thing in PHP, I usually write it to a file and then use the 'xxd' command.

I see that your function takes a pointer to the item to be reversed. I'm wondering if we could perform the cast as char pointer in the function to reduce the verbosity of usage for this routine. Also, your sizeof bits when you call this function all appear to refer to sizeof(l) rather than the sizeof for the item to be reversed. Does this look ok?

const int endian_test = 1;
#define IS_BIGENDIAN() ((*(const char *) &endian_test) == 0)
/* static, so the compiler can always fall back to a local definition */
static inline void reverse(void *arg, const size_t s) {
	if (IS_BIGENDIAN()) {
		return;
	}

	char *data = arg;
	char t;
	for (int i = 0, j = s-1; i < j; ++i, --j) {
		t = data[i];
		data[i] = data[j];
		data[j] = t;
	}
}

long l; int i; double d;
reverse((char *) &l, sizeof(l));
reverse((char *) &i, sizeof(i));
reverse((char *) &d, sizeof(d));

I wonder what happens when one sends the wrong size param to the function -- like suppose we want to reverse a short int and send sizeof(double)? Seems like we would either reverse extra data (yikes!) or get a seg fault. Is there any way to check sizeof within the function?

johanafm

sneakyimp;10961197 wrote:
I'm delighted you've worked on this johana. I think the 3-byte sequences probably have something to do with how the system is using printf to interpret %x on *(((char *) &s) + i).

My pleasure. It's been fun 🙂
Perhaps. Still, consider this

	double d = 1.234;
	printf("%f\n", d);		// 1.234
	char *c = (char *) &d;		// now we have a pointer to the first byte of d
	double e = (double) *c;		// dereference c, giving the first byte of d, convert this to double
	printf("%f\n", e);		// 88.0

	// this on the other hand creates a double pointer that starts at the first byte of d
	// but a double pointer points to an 8 byte memory area, which means that
	// dereferencing this gives us the same double as we initially had
	double f = *((double *) c);
	printf("%f\n", f);		// 1.234
	return 0;

sneakyimp;10961197 wrote:
To me, "*(((char *) &s) + i)" looks a bit like LISP. Sometimes known as "Lots [of] Irritating Single Parentheses".

shiver LISP... never was that fond of it myself. If I knew the operator precedence in C, I could probably get rid of one set of parentheses though...
But, breaking it apart:
Get a char (byte sized) pointer to the address of s, then advance pointer by i bytes, and finally get the value of that byte.

sneakyimp;10961197 wrote:
Since an integer is probably 4 bytes (32 bits) then it's interpreting each char as a full-blown int.

Well, to me that would make the leading characters all 0, not F. And look at this

	double d = 1.234;
	char *c = (char *) &d;

	char ch;
	for (int i = 0; i < sizeof(d); ++i) {
		ch = *(c + i);			// we're really using a byte, not several.
		printf("%hx ", ch);
	}
	printf("\n");

	printf("%hx %hx %hx %hx %hx %hx %hx %hx\n", 0x58, 0x39, 0xb4, 0xc8, 0x76, 0xbe, 0xf3, 0x3f);

I certainly must be missing something.
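A likely explanation for the FF runs: plain char is a signed type on many platforms, and any char passed to printf is first promoted to int. A byte such as 0xb4 is then the negative value -76, which is sign-extended to 0xffffffb4 before %x prints it as an unsigned int. Bytes below 0x80 are positive and print as expected, which matches the output above. Reading the bytes through unsigned char avoids the sign extension. A minimal sketch; bytes_to_hex is a made-up helper name:

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: write the bytes of any object as two-digit hex,
 * separated by spaces. out must have room for 3 * len + 1 characters. */
static void bytes_to_hex(const void *obj, size_t len, char *out)
{
    /* unsigned char holds 0..255, so promotion to int never goes negative */
    const unsigned char *p = obj;
    size_t i;

    out[0] = '\0';
    for (i = 0; i < len; ++i) {
        /* each call writes two hex digits, a space and a terminating NUL */
        sprintf(out + 3 * i, "%02x ", (unsigned int)p[i]);
    }
    if (len > 0) {
        out[3 * len - 1] = '\0';   /* drop the trailing space */
    }
}
```

Called as bytes_to_hex(&d, sizeof(d), buf) with a buffer of at least 3 * sizeof(d) + 1 characters, this should yield one clean two-digit group per byte.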

sneakyimp;10961197 wrote:
I see that your function takes a pointer to the item to be reversed. I'm wondering if we could perform the cast as char pointer in the function to reduce the verbosity of usage for this routine.

Seems like we would either reverse extra data (yikes!) or get a seg fault. Is there any way to check sizeof within the function?

In C your only universal option (as in one function for all types) for this, as far as I know (once again, that isn't very far when it comes to C), would be via a macro, since a macro can take any type, thus letting you call REVERSE(var) which calls reverse((char *) &var, sizeof(var)). The other way would be one function per type, which has the upside of keeping type safety. If C++ is an option, you could of course use template functions.

#define REVERSE(x) (reverse((char *) &(x), sizeof(x)))

If you cast to the correct pointer before the function is called, then you no longer know what type you were previously dealing with (inside the function). Sizeof within the function would report either 4 or 8 bytes for the char * pointer depending on 32/64 bits architecture, and 1 byte for the value it points to.
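The difference between sizeof at the call site and sizeof inside the function can be sketched as follows. This is an illustration only: reverse_bytes, REVERSE_BYTES and size_seen_inside are hypothetical names, and unlike the reverse() above, this variant reverses unconditionally with no endianness check.

```c
#include <stddef.h>

/* Hypothetical helper: reverse len bytes of obj in place. */
static void reverse_bytes(void *obj, size_t len)
{
    unsigned char *p = obj;
    size_t i;

    for (i = 0; i < len / 2; ++i) {
        unsigned char t = p[i];
        p[i] = p[len - 1 - i];
        p[len - 1 - i] = t;
    }
}

/* The macro still sees the variable itself, so sizeof(x) is the size of
 * the object and the caller cannot pass a mismatched length. */
#define REVERSE_BYTES(x) reverse_bytes(&(x), sizeof(x))

/* Inside a function only the pointer is visible: this returns
 * sizeof(char *), regardless of what p points to. */
static size_t size_seen_inside(char *p)
{
    return sizeof(p);
}
```

With the macro, REVERSE_BYTES(d) on a double d passes sizeof(double) automatically, which removes the copy-paste risk of a wrong size argument. It cannot help once only a pointer is available.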

sneakyimp;10961197 wrote:
Also, your sizeof bits when you call this function all appear to refer to sizeof(l) rather than the sizeof for the item to be reversed. Does this look ok?

Copy-paste error. It should of course be as you say, except it really should be char *, not void *. You could use void * as well, but then you'd have to cast to char * to dereference it first => more parentheses. The reason for this is that void pointers can point to anything, and to dereference a pointer, the size of the data has to be known.
Edit: I just realized you did cast from void * to char * inside the function. Still, why not go with char * to begin with?