It occurred to me that some of my confusion here may be related to notation -- PHP uses square brackets to denote arrays, but Octave uses them to denote matrices or vectors. Code that does one thing in PHP might do something entirely different in Octave. For example, this concatenates three 1x3 matrices into a single 1x9 matrix:
x1 = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
x1 =
1 2 3 4 5 6 7 8 9
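For comparison, the identical square-bracket expression in plain PHP just builds a nested array of three 3-element arrays -- nothing gets flattened into a single row. A quick sketch:
// the same syntax in PHP builds a nested array (3 arrays of 3 elements),
// not a flattened 1x9 row like the Octave result above
$x1 = [[1, 2, 3], [4, 5, 6], [7, 8, 9]];
echo count($x1) . "\n";    // 3
echo count($x1[0]) . "\n"; // 3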
In Octave you delimit rows with semicolons:
x1 = [[1, 2, 3]; [4, 5, 6]; [7, 8, 9]]
x1 =
1 2 3
4 5 6
7 8 9
And the treatment of brackets, commas, and semicolons suggests to me that there's no concept of any matrix/tensor/object with a rank greater than 2:
x1 = [[1; 2; 3]; [4; 5; 6]; [7; 8; 9]]
x1 =
1
2
3
4
5
6
7
8
9
Weedpacket: Makes sense (or, at least, is consistent). It's a vector; it only has one dimension, while the transpose needs at least two dimensions to work with (so at least a matrix), seeing as it's expected to swap them.
I have a dim inkling that it's perhaps too clever by half to say that there's no such thing as a matrix with a singleton dimension. Or perhaps there is a distinction between a vector with 4 elements and a 1x4 matrix or 4x1 matrix. Octave certainly seems to make the distinction, and complains when dimensions don't match.
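In terms of the plain PHP arrays I'd feed to these libs, that distinction would presumably show up as nesting depth and shape -- a 1x4 matrix is one row of four values, a 4x1 matrix is four rows of one value each, and a bare array has no row/column orientation at all:
$row = [[1, 2, 3, 4]];       // 1x4: one row of four values
$col = [[1], [2], [3], [4]]; // 4x1: four rows of one value each
$vec = [1, 2, 3, 4];         // plain array: no row/column orientation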
Regarding performance, I have some very disappointing news for these PHP libs. I have a data object distilled from a corpus of ham & spam messages. The training set, Xtrain, has 1375 elements, each of which represents one email message in my corpus. Each element is an array with 1966 elements, each of which is a zero or one indicating whether a particular word (from a vocabulary of 1966 words) is present in that particular email message. I munged my data corpus with some other scripts and exported this 2-D array into a JSON file, which can be loaded thusly:
$data_file = '../../../machine-learning/training_data_sets.json';
$data = json_decode(file_get_contents($data_file), TRUE);
$keys = array_keys($data); // Xtrain, ytrain, Xval, yval, Xtest, ytest
echo "Xtrain has " . sizeof($data['Xtrain']) . " elements\n"; // 1375 elements
echo "Xtrain[0] has " . sizeof($data['Xtrain'][0]) . " elements\n"; // 1966 elements
One of the most time-consuming calculations in the SVM training script is to take this Xtrain corpus and multiply it by its transpose. In Octave, this happens almost instantly with an older, slightly different corpus:
size(X)
ans =
1382 1996
% this returns instantly
K = X * X';
I've tried this using all 3 of the matrix libs I've stumbled across, and they all run MUCH slower:
NumPHP:
$X = new NumArray($data['Xtrain']);
$start = microtime(TRUE);
$XT = $X->getTranspose();
echo "getTranspose in " . (microtime(TRUE) - $start) . " seconds\n"; // 5.9611051082611 seconds
$start = microtime(TRUE);
$K = $X->dot($XT);
echo "dot product in " . (microtime(TRUE) - $start) . " seconds\n"; // 466.75154590607 seconds
MarkBaker:
$X = new Matrix\Matrix($data['Xtrain']);
$start = microtime(TRUE);
$XT = $X->transpose();
echo "transpose in " . (microtime(TRUE) - $start) . " seconds\n"; // 0.13952708244324 seconds
$start = microtime(TRUE);
$K = $X->multiply($XT);
echo "multiply in " . (microtime(TRUE) - $start) . " seconds\n"; // 1821.0363881588 seconds
PHP-ML:
$X = new Matrix($data['Xtrain']);
$start = microtime(TRUE);
$XT = $X->transpose();
echo "transpose in " . (microtime(TRUE) - $start) . " seconds\n"; // 0.14473080635071 seconds
$start = microtime(TRUE);
$K = $X->multiply($XT);
echo "multiply in " . (microtime(TRUE) - $start) . " seconds\n"; // 292.54629206657 seconds
I've yet to investigate whether these short PHP scripts are calculating the correct multiplication, but the performance is really slow, especially given the speed at which Octave performs the same calculation. Notational confusion aside, I am now questioning whether these libs are suitable at all for this application.
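One way to sanity-check both issues would be a plain nested-loop version of X * X' in straight PHP, with no library at all, and then comparing a few entries of its output against what the libs return. A rough sketch, assuming $data is the same decoded JSON from above:
// baseline: compute K = X * X' with plain nested loops
$X = $data['Xtrain'];
$m = count($X);    // 1375 rows (messages)
$n = count($X[0]); // 1966 columns (vocabulary words)

$start = microtime(TRUE);
$K = array_fill(0, $m, array_fill(0, $m, 0));
for ($i = 0; $i < $m; $i++) {
    for ($j = $i; $j < $m; $j++) {
        $sum = 0;
        for ($k = 0; $k < $n; $k++) {
            $sum += $X[$i][$k] * $X[$j][$k];
        }
        $K[$i][$j] = $sum;
        $K[$j][$i] = $sum; // K is symmetric, so only the upper triangle is computed
    }
}
echo "plain loops in " . (microtime(TRUE) - $start) . " seconds\n";

// spot-check: since the data is all zeros and ones, K[0][1] is just the number
// of vocabulary words shared by messages 0 and 1, which is easy to verify
echo "K[0][1] = " . $K[0][1] . "\n";
Since K is symmetric, the sketch only computes the upper triangle and mirrors it, which roughly halves the work; whether that's enough to compete with the libraries (or with Octave, which hands the multiply off to optimized numeric routines under the hood) is another question.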