Improve handling of postgres array result?

NogDog · May 16, 2018

While fixing a bug in a legacy app, I ended up leveraging a PostgreSQL array in a select query, which ends up with each result row of the fetched array being something like:

[0] => array(
  'foo' => '{12345,"some text","other text"}',
  'bar' => 'different field',
  'fubar' => 'yet another field'
)

To get each of the 3 parts of the foo element into their own fields to make it easy for the view to display it, I did something like:

foreach($result as $ix => $data) {
  if(!empty($data['foo'])) {
    $fields = json_decode('['.trim($data['foo'], '{}').']', true);
    $result[$ix]['fld1'] = $fields[0];
    $result[$ix]['fld2'] = $fields[1];
    $result[$ix]['fld3'] = $fields[2];
  }
  else {
    $result[$ix]['fld1'] = '';
    $result[$ix]['fld2'] = '';
    $result[$ix]['fld3'] = '';
  }
}

I'm pretty sure there must be a cleaner way to do that using array_map() or such, plus the json_decode(...) bit seems pretty kludgy, so just wondered if anyone has any suggestions for better ways, in terms of efficiency and maintainability? (It works fine, and it's not a high-demand app requiring lightning response time...but I'm not happy with it as it is. )

NogDog · May 16, 2018

Refactored it a bit leveraging str_getcsv() and array_walk()...

		$records = self::breakOutProvFields($records);
		return $records;
	}

	private static function breakOutProvFields(array $records)
	{
		foreach($records as $type => $data) {
			array_walk($records[$type], array('self', 'bopfCallback'));
		}
		return $records;
	}

	private static function bopfCallback(&$fields, $ix)
	{
		if(!empty($fields['provider_array'])) {
			$parts = str_getcsv(trim($fields['provider_array'], '{}'));
			$fields['npi'] = $parts[0];
			$fields['provider_name'] = $parts[1];
			$fields['provider_address'] = $parts[2];
		}
		else {
			$fields['npi'] = $fields['provider_name'] = $fields['provider_address'] = '';
		}
	}

Why static methods, you ask? Because that's what the rest of the class uses.

dalecosp · May 16, 2018

NogDog Because that's what the rest of the class uses.

Yeah, got bit by something similar not too long ago. In a payment module, no less.

Weedpacket · May 16, 2018

When I see something like

$parts = str_getcsv(trim($fields['provider_array'], '{}'));
$fields['npi'] = $parts[0];
$fields['provider_name'] = $parts[1];
$fields['provider_address'] = $parts[2];

I think "list".

But since this is a PostgreSQL array type, could you move that destructuring step into the query itself?

SELECT ... COALESCE(provider_array[1], '') AS npi ...

If an array element or the entire array is absent (provider_array IS NULL) then trying to access that element evaluates as NULL rather than being an error.

Weedpacket · May 16, 2018

I'm seeing an extra level of indexing in that array...? breakOutProvFields has a nested loop (one explicit, one implicit in the array_walk) and the elements being passed to the callback are themselves arrays?

I would have thought applying the callback to the elements of $records would have been enough without having to put it in a loop as well.

	return array_map(['self', 'breakOutProvFields'], $records);
}

private static function breakOutProvFields(array $record)
{
	if(!empty($record['provider_array'])) {
		[$record['npi'],
		 $record['provider_name'],
		 $record['provider_address']] = str_getcsv(substr($record['provider_array'], 1, -1));
		// unset($record['provider_array']);
	}
	else {
		$record['npi'] = $record['provider_name'] = $record['provider_address'] = '';
	}
	return $record;
}

NogDog · May 16, 2018

Weedpacket

Yeah, sorry for the confusion: the "model" actually does up to 3 separate queries (each with the same set of columns, just different grouping/ordering criteria), and puts their result set arrays into that containing (probably poorly named) $records array.

I did start thinking/wondering about tackling the fields separation in the SQL, but the danged query is so tangled now that I was afraid to touch it. (What, you think that's a code smell? Nah... )

Weedpacket · May 18, 2018

I understand; once you've got a Big Ball of Mud of an application it becomes practically impossible to isolate any one part that continues to make sense without context – context that can only be supplied by dragging the whole rest of the application back into frame.