Code Share

Weedpacket · Mar 4, 2011

Just a quick note...

PHP's custom sort functions take a comparator function as callback (of two given elements, decide which is larger). Sometimes though it's easier to give each array element a score and then order by that (it's often somewhat faster too, since sorting with a comparator function means that each element gets examined again every time it takes part in a comparison, whereas this way it only needs to be scored once).

Consider it a refinement of my earlier "Mutant Schwartzian transforms".

Key association is retained.

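function sort_by(&$array, $scorefn)
{
	// Keep the original values (and keys) so they can be restored once the scores are sorted.
	$vals = $array;
	// array_map() with a single array preserves keys, so each score stays attached to its key.
	$array = array_map($scorefn, $vals);
	asort($array);

	foreach($array as $key => $val)
	{
		$array[$key] = $vals[$key];
	}
}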
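A rough way to see the difference in callback work, as a minimal sketch: it counts how often a comparator is called by usort() versus how often a score function is called when the scores are computed once and sorted with asort(). The word list and counter names are made up for illustration, and the exact comparator count depends on the PHP version.

```php
<?php
// Illustrative data only.
$words = ['pear', 'fig', 'banana', 'kiwi', 'apple', 'plum', 'cherry', 'date'];

// Comparator approach: the callback runs once per comparison.
$comparisons = 0;
$byComparator = $words;
usort($byComparator, function ($a, $b) use (&$comparisons) {
    ++$comparisons;
    return strlen($a) <=> strlen($b);
});

// Score approach: the callback runs once per element.
$scorings = 0;
$scores = array_map(function ($word) use (&$scorings) {
    ++$scorings;
    return strlen($word);
}, $words);
asort($scores);

// Rebuild the values in score order, keeping their original keys.
$byScore = [];
foreach ($scores as $key => $score) {
    $byScore[$key] = $words[$key];
}

echo "comparator calls: $comparisons\n";
echo "score calls:      $scorings\n";
```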

Weedpacket · Apr 7, 2012

I needed to have several independent (pseudo-)random number generators, where "independent" meant that the numbers yielded by one generator weren't affected by calls to any of the other generators (as would happen if they all ran off PHP's internal RNG).

I didn't need Mersenne Twister levels of randomness for my purpose, so just picked the first algorithm to hand; specifically, the RNG that appears in the GNU Scientific Library under the name of "zuf".

The generator (actually, the generator factory) is implemented as a single function. As usual, examples of its use follow.

function zuf($s = 0)
{
	$state_n = 0;
	$state_u = array();

	$kl = 9373;
	$ij = 1802;

	if($s == 0)
	{
		$s = 1802;
	}
	$ij = $s;
	$i = intval($ij / 177) % 177 + 2;
	$j = $ij % 177 + 2;
	$k = intval($kl / 169) % 178 + 1;
	$l = $kl % 169;
	for($ii = 0; $ii < 607; ++$ii)
	{
		$x = 0;
		$y = 0.5;
		for($jj = 1; $jj <= 24; ++$jj)
		{
			$m = $i * $j % 179 * $k % 179;
			$i = $j;
			$j = $k;
			$k = $m;
			$l = ($l * 53 + 1) % 169;
			if($l * $m % 64 >= 32)
			{
				$x += $y;
			}
			$y *= 0.5;
		}
		$state_u[$ii] = $x * (1 << 24);
	}

	// Each closure carries its own state, which is what keeps the generators independent.
	return function()use(&$state_n, &$state_u)
	{
		$n = $state_n;
		$m = ($n + (607 - 273)) % 607;
		// Both terms come from the state table $state_u; $state_n is only the current index.
		$t = $state_u[$n] + $state_u[$m];
		while($t > 1 << 24)
		{
			$t -= 1 << 24;
		}
		$state_u[$n] = $t;
		if($n == 606)
		{
			$state_n = 0;
		}
		else
		{
			$state_n = $n + 1;
		}
		return $t;
	};
}

// Four different generators, using two different seeds
// (to illustrate independence).

$rng1 = zuf(0);
$rng2 = zuf(42);
$rng3 = zuf(0);
$rng4 = zuf(42);


for($i = 0; $i < 10; ++$i)
{
	echo $rng1(), ' ';
}
echo "\n";
for($i = 0; $i < 10; ++$i)
{
	echo $rng3(), ' ';
}
echo "\n";
for($i = 0; $i < 10; ++$i)
{
	echo $rng2(), ' ';
}
echo "\n";
for($i = 0; $i < 10; ++$i)
{
	echo $rng1(), ' ';
}
echo "\n";
for($i = 0; $i < 10; ++$i)
{
	echo $rng3(), ' ';
}
echo "\n";
for($i = 0; $i < 10; ++$i)
{
	echo $rng2(), ' ';
}
echo "\n";
for($i = 0; $i < 20; ++$i)
{
	echo $rng4(), ' ';
}
echo "\n";

[Edit years later: PHP 8.2 has a revamped PRNG system available that allows multiple independent instances of multiple RNGs. So, basically, it does what the code here was for.]

Weedpacket · Mar 16, 2014

Like most real-world libraries, PHP uses Quicksort for its sorting algorithm. Done right, this is a nice and efficient algorithm for general-purpose use (in specific circumstances other algorithms are better choices).

If Quicksort has one big drawback, it's the fact that it's not stable. If two elements in the input array are considered "equal" for the purposes of comparison, the order in which they appear in the output is undefined. What makes this a drawback is that it interferes with constructing complex sort criteria (you can't break the task down into "sort by criterion A" followed by "sort by criterion B" because the latter sort messes up the results of the former): you have to sort once and check all the criteria during that sort, instead of being able to "chain" sort criteria together.

When I was faced with this problem I started out implementing Mergesort (which is stable and its worst-case performance is comparable to Quicksort's average-case performance), but then I realised I could stabilise Quicksort instead. I wrote this as a drop-in replacement for usort; comparable "ssort", "srsort", "sasort", etc. can be written using the same model.

function susort(&$array, $cmp)
{
	// Pair every element with its original position: array_map(null, ...) zips the two arrays.
	$array = array_map(null, range(1, count($array)), $array);
	usort($array, function($a, $b)use($cmp)
	{
		return $cmp($a[1], $b[1]) ?: ($a[0] - $b[0]);
	});
	$array = array_column($array, 1);
}

[Edit quite a while later: PHP's own sorting is stable as of v8.0, so you don't need to do it yourself any more.]

Weedpacket · Apr 29, 2015

Just a quick thing: Leap years in the Gregorian calendar without any branch instructions.

function is_leap_year($year)
{
	return !($year % (400 - 396 * ($year % 100 != 0)));
}
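For anyone who wants to convince themselves the arithmetic holds, here is a small sketch that compares the one-liner against the usual spelled-out rule over a range of years. The function names and the year range are chosen only for this check; the expression itself is repeated from the post so the snippet runs on its own.

```php
<?php
// Same expression as in the post, under a different name for this check.
function is_leap_year_branchless($year)
{
    return !($year % (400 - 396 * ($year % 100 != 0)));
}

// The usual Gregorian rule, written out with explicit conditions.
function is_leap_year_textbook($year)
{
    return ($year % 4 == 0 && $year % 100 != 0) || $year % 400 == 0;
}

$mismatches = 0;
for ($year = 1583; $year <= 2400; ++$year) {
    if (is_leap_year_branchless($year) !== is_leap_year_textbook($year)) {
        ++$mismatches;
    }
}
echo "mismatches: $mismatches\n";
```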

Bonesnap · May 4, 2015

Weedpacket;11047287 wrote:
Just a quick thing: Leap years in the Gregorian calendar without any branch instructions.
function is_leap_year($year)
{
	return !($year % (400 - 396 * ($year % 100 != 0)));
}

Couldn't you just use PHP's date() function for that?

return date('L', $year);//Returns 1 for leap year or 0 if not

NogDog · May 4, 2015

Little something I found on the interwebs using unpack to parse fixed-length records in a file:

<?php
// number after "A" is the field length, the text that follows becomes the array key:
$formatStr = 'A3old_prov_plan_cd/A4old_pcp_network_id/A13old_prov_num/A2old_pcp_number_sfx/A2old_seq_num/' .
	'A3new_prov_plan_cd/A4new_pcp_network_id/A13new_prov_num/A2new_pcp_number_sfx/A2new_seq_num';

while (($line = fgets($fh)) !== false) {
	$line = trim($line);
	if ($line === '') {
		continue;
	}
	// here's where the magic happens:
	$fields = unpack($formatStr, $line);

	// in this case, I didn't actually need the array keys....
	$oldToken = implode('.', array_slice($fields, 0, 5));
	$newToken = implode('.', array_slice($fields, 5, 5));
	$stmt->bindParam(':old_token', $oldToken);
	$stmt->bindParam(':new_token', $newToken);
	if ($stmt->execute() == false) {
		throw new Exception(print_r($stmt->errorInfo(), 1) . PHP_EOL . $sql);
	}
}
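The snippet above depends on an open file handle and a prepared statement, so here is a stripped-down sketch of just the unpack() part that runs on its own. The field names, widths, and the sample record are invented for illustration.

```php
<?php
// Hypothetical layout: 3-char code, 5-char name, 2-char quantity.
$formatStr = 'A3code/A5name/A2qty';

// One fixed-length record (10 characters); the name is padded with a space.
$line = 'ABCwidg 07';

// "A" fields are space-padded strings; trailing padding is stripped on unpacking.
$fields = unpack($formatStr, $line);

// Join the first two fields, in the same style as the tokens above.
$token = implode('.', array_slice($fields, 0, 2));

print_r($fields);
echo $token, "\n";
```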

Weedpacket · May 4, 2015

Bonesnap;11047375 wrote:
Couldn't you just use PHP's date() function for that?
return date('L', $year);//Returns 1 for leap year or 0 if not

You mean

return date('L', mktime(1, 1, 1, 1, 1, $year));

And for other calendars (which I was having to work with) the leap year calculation might be less or more complicated.

dalecosp · May 8, 2015

At Weedpacket's urging, courtesy of Oleg Englishman at bigmir dot net and the PHP Manual Comments:

Adding to an array while iterating

$array = range(1,10); 

while (list($arr,$val) = each( $array )) { 

   if ($val == 5) { 
      $array[] = 11; 
   } 
   echo $val.PHP_EOL; 
}
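Since each() was deprecated in PHP 7.2 and removed in PHP 8.0, the loop above no longer runs on current versions. One possible stand-in, assuming the array is a plain list with consecutive integer keys, is an index loop whose bound is re-checked on every pass; the $seen array is only there to record what was visited.

```php
<?php
$array = range(1, 10);
$seen = [];

// count($array) is re-evaluated each time round, so the appended element is visited too.
for ($i = 0; $i < count($array); ++$i) {
    if ($array[$i] == 5) {
        $array[] = 11;
    }
    $seen[] = $array[$i];
    echo $array[$i] . PHP_EOL;
}
```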

Some discussion and alternate implementations are [thread=10393453]on this thread[/thread].

NogDog · Apr 22, 2016

A little something I came up with at work today (based on something I found on stackoverflow) to record DB query response times when using PDO.

<?php

class PDOTimer extends PDO {
    function __construct($dsn, $username="", $password="", $driver_options=array()) {
        parent::__construct($dsn,$username,$password, $driver_options);
        $this->setAttribute(PDO::ATTR_STATEMENT_CLASS, array('PDOStatementTimer', array($this)));
    }
}

class PDOStatementTimer extends PDOStatement {
    public $dbh;

    protected function __construct($dbh) {
        $this->dbh = $dbh;
    }

    public function execute($bound_input_params = NULL)
    {
        $start = microtime(true);
        $result = parent::execute($bound_input_params);
        $time = microtime(true) - $start;
        $trace = debug_backtrace();
        $caller = $trace[1];
        $source = "{$caller['file']}[{$caller['line']}]: {$caller['class']}::{$caller['function']}";
        error_log("PDO Execute took $time seconds".PHP_EOL."Called by $source".PHP_EOL.$this->queryString);
        return $result;
    }
}

We use prepared statements for everything, but if you use PDO::query(), you could do the same override of it, too, I suppose. Usage is simply including the class definitions and then instantiating PDOTimer in place of PDO. If it's possible procedural code might hit it, then you'd probably need to add a check that $caller['class'] and/or $caller['function'] are not empty before trying to use them.

Weedpacket · Dec 21, 2016

So you've got a list of items that you want to draw from at random (with replacement). But you don't want a uniform distribution where each item is equally likely. You want weighted probabilities where, say, "Silver" is to be drawn five times for every three times that "Gold" is drawn, and "Bronze" should come up something like eleven times in the same number of draws.

What you could do is build a list with duplicates (three Golds, five Silvers, and eleven Bronzes) — 19 entries in all — and draw uniformly from that. But that does seem a bit wasteful, especially if the list is in a database. Or consider the case where there should be one Gold for every hundred Silver, and one Silver for every hundred Bronze. That list of duplicates will have 10,101 entries — for only three different values.

Instead, what we'll do is use conditional probabilities. First of all, we decide whether or not to pick the first element ("Silver" gets picked with probability 5/19). If we decide to pick the first element, then we're done. If not, we randomly decide whether to pick the second element. But to work out the probability we have to take into account the fact that we have decided not to pick the first. ("Gold" gets picked, not with probability 3/19 ≈ 15.79%, but with probability 3/(19-5) = 3/14 ≈ 21.43%). If we decide not to pick any other elements before the last one, then by the time we get to it we have no choice in the matter any more - we have to pick the last element because it's the only one left to pick. ("Bronze" gets picked with probability 11/(19-5-3) = 11/11 = 1 = 100%). The conditional_probabilities function calculates, for each element, the probability to use when deciding whether or not to pick that element given that we have not picked any of the previous ones.

Actually picking an element is done by the get_variate function, which works as described above: for each element of the array, decide at random whether or not to pick it based on the conditional probabilities calculated earlier, and keep going until one is picked.

There's not much to using the two functions; the example pretty much covers the interface (I mean, two functions with one parameter each, how much interface is there to cover?). As you can see, the input to conditional_probabilities is an array where the values to be selected are array keys, and their relative frequencies the corresponding values (["Silver" => 5, "Gold" => 3, "Bronze" => 11]); get_variate returns one of those keys.
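As a quick check of the arithmetic in the Silver/Gold/Bronze example, here is a minimal sketch that computes the conditional probabilities and then multiplies them back out to recover the overall chance of each item. The two helper names are hypothetical and are not the functions from this post.

```php
<?php
// Hypothetical helper: probability of picking each item, given none before it was picked.
function conditional_from_weights(array $weights): array
{
    $remaining = array_sum($weights);
    $conditional = [];
    foreach ($weights as $key => $weight) {
        $conditional[$key] = $remaining > 0 ? $weight / $remaining : 0.0;
        $remaining -= $weight;
    }
    return $conditional;
}

// Hypothetical helper: overall probability = P(nothing earlier picked) * P(pick now).
function overall_from_conditional(array $conditional): array
{
    $notPickedYet = 1.0;
    $overall = [];
    foreach ($conditional as $key => $p) {
        $overall[$key] = $notPickedYet * $p;
        $notPickedYet *= 1 - $p;
    }
    return $overall;
}

$weights = ['Silver' => 5, 'Gold' => 3, 'Bronze' => 11];
$conditional = conditional_from_weights($weights);
$overall = overall_from_conditional($conditional);

foreach ($weights as $key => $weight) {
    printf("%-7s conditional %.4f  overall %.4f\n", $key, $conditional[$key], $overall[$key]);
}
```

The "overall" column should come out as 5/19, 3/19 and 11/19, which is the point of the exercise: the conditional steps reproduce the intended weights.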

function conditional_probabilities($weights)
{
	$probs = $weights;
	foreach($probs as &$weight)
	{
		$weight/= array_sum($weights);
		array_shift($weights);
	}
	return $probs;
}

function get_variate($probs)
{
	foreach($probs as $k => $p)
	{
		if(mt_rand() <= mt_getrandmax() * $p)
		{
			return $k;
		}
	}
}

// Example of use

$biases = [
	'foo' => 1,
	'bar' => 7,
	'baz' => 3,
	'blug' => 5,
	'wibble' => 9,
	'womble' => 4,
	'splunge' => 7,
	'plink' => 2,
	'plunk' => 9,
	'plonk' => 4,
	'fnord' => 0,
	'zit' => 3,
	'zot' => 7];
$probs = conditional_probabilities($biases);

$sample = [];
for($j = 0; $j < 100000; ++$j)
	$sample[] = get_variate($probs);

// Let's see if these variates have the distribution they should:
$histogram = array_count_values($sample);
$sum = array_sum($biases) / 100000;
foreach($histogram as $key => $p)
{
	echo $key,"\t", $p * $sum, "\n";
}

baz     2.9646
plonk   4.07846
womble  4.03393
splunge 6.95644
zot     6.94607
blug    4.93429
plunk   8.85842
wibble  9.06277
zit     3.01645
bar     7.08149
foo     1.02968
plink   2.0374

Clearly it's not going to preserve the order of the input weights; they're actually listed in the order they were first picked. So I'll just sort them by hand here, along with the intended weight for comparison, and also the totals for the two columns:

foo     1.02968   1
bar     7.08149   7
baz     2.9646    3
blug    4.93429   5
wibble  9.06277   9
womble  4.03393   4
splunge 6.95644   7
plink   2.0374    2
plunk   8.85842   9
plonk   4.07846   4
zit     3.01645   3
zot     6.94607   7
       61.00000  61

It should go without saying that since fnord has zero chance of getting picked, it wasn't and so didn't show up to be counted.

dalecosp · Sep 18, 2017

Weedpacket;10567474 wrote:

Essentially:

$code = file_get_contents('file_to_test.php');
$code = str_replace("'", "'\"'\"'",$code);
$check = `echo '$code' | php -l`;
echo $check;

Wow. 13 years ago. And mine, years later, is more huger. :rolleyes: I think I'll start the submission process for a vacation request Real Soon Now(tm). :queasy:

This is "/usr/local/bin/lint" on all my servers that don't already have lint(1) (Some # of Tuxens? CentOS, for sure ... ), and "plint" on the ones that do (like FreeBSD).

#!/usr/bin/env php
<?php

//"lint" ... group linter for PHP, dalecosp 1/11/2016 0.1

if (empty($argv[1])) help();

$wd = pathinfo($argv[1]);

$wd = $wd['dirname'] . "/" . $wd['basename'];

$list = glob("$wd/*");

foreach ($list as $file) {

   // Quote the path so spaces or shell metacharacters in file names are safe.
   $arg = escapeshellarg($file);

   if (substr( $file, -4 ) == ".php" ) {
      system( "php -l $arg" );
   } else {
      $file_type = `/usr/bin/file $arg`;
      if (stristr( $file_type, "PHP" )) {
         system( "php -l $arg" );
      }
   }
}

function help() {

   echo "No argument given; call 'lint \$PWD' for batch PHP syntax checking.";
   exit;

} //EOF

Weedpacket · Oct 31, 2017

Permutations!
A couple of generators: pass one an array and have it successively crank out different permutations of the array's elements. ("Different", that is, if all of the input array's elements are distinct.)

One yields the permutations in lexicographic ("dictionary") order. Think of the array as a word written in some weird alphabet (its elements being the letters), and taking the order of letters in that word to be "alphabetical" (an English example of such a word would be "begins") and then listing all the rearrangements of those letters in alphabetical order ("begisn", "begnis", "begnsi", ... "singbe", ... "snigeb").

a b c
a c b
b a c
b c a
c a b
c b a

The other is what bell-ringers call "plain change" order: it's a minimum-change (meaning only two elements change positions each time) permutation that always swaps two adjacent elements. It's implemented recursively here, by weaving the nth element back and forth through the elements of the previous permutations.

a b c
b a c
b c a
c b a
c a b
a c b

So, here's the code.

function lexicographic_permutations($l)
{
	$n = count($l);
	if($n == 0 || $n == 1)
	{
		yield $l;
	}
	elseif($n == 2)
	{
		yield $l;
		yield array_reverse($l);
	}
	else
	{
		for($i = 0; $i < $n; ++$i)
		{
			$new = $l;
			$li = $new[$i];
			unset($new[$i]);
			foreach(lexicographic_permutations(array_values($new)) as $v)
			{
				array_unshift($v, $li);
				yield $v;
			}
		}
	}
}

function plain_changes_permutations($l)
{
	$n = count($l);
	if($n == 0)
	{
		yield $l;
	}
	else
	{
		$first = [array_shift($l)];
		--$n;
		$ascending = 1;
		foreach(plain_changes_permutations($l) as $subperm)
		{
			$to = $n * $ascending;
			$from = $n - $to;
			foreach(range($from, $to) as $i)
			{
				$perm = $subperm;
				array_splice($perm, $i, 0, $first);
				yield $perm;
			}
			$ascending = 1 - $ascending;
		}
	}
}

NogDog · Nov 7, 2017

Made a small extension to the PDO class to help DRY up my use of prepared statements: https://github.com/nogdog/pdoquery

The actual PHP code is simply:

<?php

/**
 * Add a one-stop method for prepared query execution
 */
class PDOQuery extends PDO
{
    /**
     * Prepare and execute a query
     *
     * @param string $sql
     * @param Array $data  array(':place_holder_1' => 'value_1'[,...])
     * @return PDOStatement
     */
    public function preparedQuery($sql, Array $data=array())
    {
        $stmt = $this->prepare($sql);
        if($stmt == false)
        {
            // print_r() needs its second argument set to true to return the text instead of printing it
            throw new Exception('Prepare failed:'.PHP_EOL.print_r($this->errorInfo(), true));
        }
        if(($result = $stmt->execute($data)) == false)
        {
            throw new Exception('Execute failed:'.PHP_EOL.print_r($stmt->errorInfo(), true));
        }
        return $stmt;
    }
}

Weedpacket · Dec 8, 2017

This is more of a sketch of an idea. That idea is that "It would be nice to be able to call a method on an array of objects and have that method call mapped over all of the objects in the array." Like I have an array $clients and I want to call $client->getMessages('INFO', $channel) on each one and collect the results into a new array.

Writing it directly gives something like

array_map(function($client)use($channel)
{
	return $client->getMessages('INFO', $channel);
}, $clients);

But that feels a bit messy for something I'd probably do a fair bit of. Notice the repetition of $client and $channel.

For now I'm trying out this:

function map_method($array)
{
	return new class($array)
	{
		private $array;
		public function __construct($array)
		{
			$this->array = $array;
		}
		public function __call($name, $arguments)
		{
			return array_map(function($element)use($name, $arguments)
			{
				return $element->{$name}(...$arguments);
			}, $this->array);
		}
	};
}

Which lets me replace that array_map above with map_method($clients)->getMessages('INFO', $channel);

[Edit later...] Oh, hey, PHP 7.4 introduced arrow functions. They allow streamlining anonymous functions with bodies that consist of a single return statement (which is quite common for callbacks):

public function __call($name, $arguments)
{
	return array_map(fn($element) => $element->{$name}(...$arguments), $this->array);
}
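To see the whole thing run end to end, here is a small self-contained sketch that repeats map_method (using the arrow-function form of __call) and tries it on a stand-in class. DemoClient, its constructor, and the channel name are invented for the demonstration; only map_method itself comes from the post.

```php
<?php
function map_method(array $array)
{
    return new class($array)
    {
        private $array;
        public function __construct(array $array)
        {
            $this->array = $array;
        }
        // Any method called on the wrapper is forwarded to every element.
        public function __call($name, $arguments)
        {
            return array_map(fn($element) => $element->{$name}(...$arguments), $this->array);
        }
    };
}

// Hypothetical class standing in for whatever objects are in $clients.
class DemoClient
{
    private $name;
    public function __construct($name)
    {
        $this->name = $name;
    }
    public function getMessages($level, $channel)
    {
        return "$level from {$this->name} on $channel";
    }
}

$clients = ['a' => new DemoClient('alice'), 'b' => new DemoClient('bob')];
$messages = map_method($clients)->getMessages('INFO', '#general');
print_r($messages);
```

Because array_map() is given a single array, the keys of $clients carry over to the result.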

Weedpacket · Dec 17, 2017

Okay, I just needed to flatten out an arbitrarily-nested array. Adding the limit condition was a last-minute decoration.


function flatten_array($array, $limit = PHP_INT_MAX)
{
	$recurse = function($array, $limit)use(&$recurse)
	{
		if($limit >= 0)
		{
			foreach($array as $a)
			{
				if(is_array($a))
				{
					yield from $recurse($a, $limit - 1);
				}
				else
				{
					yield $a;
				}
			}
		}
		else
		{
			yield $array;
		}
	};
	return iterator_to_array($recurse($array, $limit), false);
}

$array = [[1,2,3],[4,[5],6],7,8,[[[[[[[9]]],10]]]]];

$f = flatten_array($array, 2);
var_export($f);

Weedpacket · Mar 8, 2018

Binomial: choose k elements from a set of n (a generator that yields every k-element subset).

function subsets($needed, $set)
{
	if($needed == 0)
	{
		yield [];
	}
	else
	{
		--$needed;
		$n = count($set);
		for($i = 0; $i < $n; ++$i)
		{
			$element = array_pop($set);
			foreach(subsets($needed, $set) as $subset)
			{
				$subset[] = $element;
				yield $subset;
			}
		}
	}
}


$k = 3;
$n = str_split('abcdefg');
foreach(subsets($k, $n) as $subset)
{
	echo join($subset) . ' '; // 7!/(3!4!) = 35 different subsets.
}

Weedpacket · Mar 10, 2021

Well, this has been quiet for a while. So I'm going to share a routine I knocked together to emulate sliced array views. The fun of array elements that contain variable references. I've pulled this stunt before....

function slice(array &$input, int $start = 0, int $end = PHP_INT_MAX, int $step = 1): array
{
	$start = max($start, 0);
	$end = min($end, count($input));
	$step = max($step, 1);

	$return = [];
	for($i = $start; $i < $end; $i += $step)
	{
		$return[] = &$input[$i];
	}
	return $return;
}


// Examples

$array = range(0,20);

echo "\n\n";
echo 'Original array: ', join(' ', $array), "\n";


$pick = slice($array, 2, 12, 3);

echo 'Picking every third item starting from 2 up to 12: ', join(' ', $pick), "\n";

$pick[3] = 'foo!';

echo 'Changed one item: ', join(' ', $pick), "\n";

echo 'Original array now looks like: ', join(' ', $array), "\n";

echo "\n\n";

$again = slice($array, start: 4, step: 5); // Note: PHPv8 named arguments because they're nice.

echo 'Every fifth item starting from 4 and continuing to the end: ', join(' ', $again), "\n";

foreach($again as &$poke)
{
	$poke = 'wibble';
}
unset($poke);

echo 'Answer every question the same way: ', join(' ', $again), "\n";

echo 'Making the original array: ', join(' ', $array), "\n";

// And variable references are symmetric, so
$array[19] = 'Yeah';
echo join(' ', $again), " you can go the other way as well.\n\n";
// These views are live.


// And just a little more elaborate...

$stride = 9;
$rows = array_fill(0, $stride**2, 0);

$columns = [];
for($i = 0; $i < $stride; ++$i)
{
	$columns[] = slice($rows, $i, step: 9); // Variable references survive this sort of treatment.
}
$columns = array_merge(...$columns); // And this sort too; we're not actually touching the array elements themselves

for($i = 0; $i < $stride**2; ++$i)
{
	$columns[$i] = $i;
}

echo join(' ', $rows);

/*
The same result can be achieved with
$stride = 9;
$rows = array_merge(...array_map(null, ...array_chunk(range(0, $stride**2-1), $stride)));
but that would be missing the point.
*/
?>

Weedpacket · Jul 20, 2021

Okay, here's a wee thing.

$this->source->tap() returns arrays [bool, mixed] with the first part showing whether the second part is cromulent; in this particular instance the second part is irrelevant if it is not so. I want to keep tapping the source for those second part items while that first part remains true.

		while(([, $item] = $this->source->tap())[0]) {...}

See, the assignment expression evaluates to the whole array, with both 0 and 1 elements. The left hand side of the assignment binds $item to the value of the 1 element and ignores the 0 element. But since the assignment operation itself evaluates to the entire array, the 0 element is happily sitting there to be addressed to see if it is true or not.
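Here is a minimal runnable sketch of the idiom. ArraySource is a made-up stand-in for whatever $this->source is; the only thing it shares with the real one is that tap() returns a [bool, mixed] pair.

```php
<?php
// Hypothetical source: hands out its items one at a time, then reports [false, null].
final class ArraySource
{
    private array $items;
    private int $pos = 0;

    public function __construct(array $items)
    {
        $this->items = $items;
    }

    public function tap(): array
    {
        if ($this->pos < count($this->items)) {
            return [true, $this->items[$this->pos++]];
        }
        return [false, null];
    }
}

$source = new ArraySource(['a', 'b', 'c']);
$seen = [];

// The destructuring binds $item; the [0] on the whole assignment tests the flag.
while (([, $item] = $source->tap())[0]) {
    $seen[] = $item;
}

echo implode(',', $seen), "\n";
```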

Here, have a look at the opcode instructions (as generated by php -d opcache.opt_debug_level=0x10000 when you have opcache enabled):

...
0005 T2 = FETCH_OBJ_R THIS string("source")
0006 INIT_METHOD_CALL 0 T2 string("tap")
0007 V3 = DO_FCALL
0008 V4 = FETCH_LIST_R V3 int(1)
0009 ASSIGN CV0($item) V4
0010 T6 = FETCH_DIM_R V3 int(0)
0011 JMPNZ T6 0003

Instructions 5-6 retrieve the source property from $this, and the tap method from that, using the register T2 to store the property along the way.
Instruction 7 makes the call to tap and stuffs the result into register V3. That's the two-element [bool, mixed] array.
Instruction 8 fetches element 1 from what's in register V3 (for the purposes of a list() assignment) and stuffs it into register V4.
Instruction 9 promptly copies what's in V4 into local variable 0 (known to you and me as $item).
Instruction 10 now fetches element 0 from the [bool, mixed] array still sitting in register V3 and stuffs it into register T6.
Finally, instruction 11 decides whether to jump back to instruction 3 (not shown) depending on whether the content of register T6 is nonzero. For whatever reason, while loops are written in bytecode with their controlling tests at the end of the loop body (indicated in the bytecode and source snippet by ...).

NogDog · Aug 2, 2021

Don't know if I've ever used array_diff() before, but came up with this use case today where it seems to do the trick.

I have a set of tests that use JSON files for the test definitions. In them I have an optional field for each test called "identities"* that can either be a simple string (which then is used to get the array of identities under that key from another JSON file), or it can be an explicit array, e.g. "identities": ["id_1", "id_5"].

I decided I wanted a way to say, "Use this set, except for these identities which I want you to skip." I then added the ability to use this syntax:

"identities": "foo except [\"id_1\", \"id_5\"]"

To handle that in the script that interpolates and executes each test definition:

      elseif(preg_match('/^(\w+) except (\[.+\])$/', $test['identities'], $matches)) {
        $exceptions = json_decode($matches[2], true);
        $ids = array_diff($identities[$matches[1]], $exceptions);
        if(empty($ids) or !is_array($ids)) {
          die("Failed to populate 'except' identities\n");
        }
        $api->identities($ids);
      }
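For trying the "except" handling outside the test runner, here is a self-contained sketch of the same idea. The function name resolve_identities, the $identitySets array, and the sample ids are all invented for illustration; the regular expression is the one used above.

```php
<?php
// Hypothetical helper: turn an "identities" string into the list of ids to use.
function resolve_identities(string $spec, array $identitySets): array
{
    if (preg_match('/^(\w+) except (\[.+\])$/', $spec, $matches)) {
        $exceptions = json_decode($matches[2], true);
        if (!is_array($exceptions) || !isset($identitySets[$matches[1]])) {
            throw new InvalidArgumentException("Failed to populate 'except' identities");
        }
        // array_diff() keeps the original keys, so re-index for a clean list.
        return array_values(array_diff($identitySets[$matches[1]], $exceptions));
    }
    if (isset($identitySets[$spec])) {
        return $identitySets[$spec];
    }
    throw new InvalidArgumentException("Unknown identity set: $spec");
}

// Sample data standing in for the identities JSON file.
$identitySets = ['foo' => ['id_1', 'id_2', 'id_5', 'id_7']];

$ids = resolve_identities('foo except ["id_1", "id_5"]', $identitySets);
print_r($ids);
```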

__________
* "identities" are a specific thing in the app -- don't worry what it is, just know that it's a string here.

PS: I may change it so that the exceptions are an array in the identities file, and then just do:

"identities": "foo except bar"