PHP - HTTP DIGEST AUTHENTICATION - Understanding the Code

dudelondon

Hi Everyone,

This is my first time on your website, so please excuse me if I am asking a silly question.

I have a question about the PHP HTTP Digest Authentication example published in the PHP Manual on php.net.
Their explanation is a bit limited and I was unable to find anything about it on the web. People have written that
it's beyond the knowledge of the language and that you should just paste the code and use it. I have still studied the regular expression used and managed to understand some of it, but please help me understand it completely.

The code which I need explained is in blue.

Thank you in advance 🙂

/////////////////////////////////////////////////////////////////////////////////////////////////////////////
<?php
$realm = 'Restricted area';

//user => password
$users = array('admin' => 'mypass', 'guest' => 'guest');

if (empty($_SERVER['PHP_AUTH_DIGEST'])) {
    header('HTTP/1.1 401 Unauthorized');
    header('WWW-Authenticate: Digest realm="'.$realm.
           '",qop="auth",nonce="'.uniqid().'",opaque="'.md5($realm).'"');

    die('Text to send if user hits Cancel button');
}

// analyze the PHP_AUTH_DIGEST variable
if (!($data = http_digest_parse($_SERVER['PHP_AUTH_DIGEST'])) ||
    !isset($users[$data['username']]))
    die('Wrong Credentials!');

// generate the valid response
$A1 = md5($data['username'] . ':' . $realm . ':' . $users[$data['username']]);
$A2 = md5($_SERVER['REQUEST_METHOD'].':'.$data['uri']);
$valid_response = md5($A1.':'.$data['nonce'].':'.$data['nc'].':'.$data['cnonce'].':'.$data['qop'].':'.$A2);

if ($data['response'] != $valid_response)
    die('Wrong Credentials!');

// ok, valid username & password
echo 'Your are logged in as: ' . $data['username'];

// function to parse the http auth header
function http_digest_parse($txt)
{
    // protect against missing data
    $needed_parts = array('nonce'=>1, 'nc'=>1, 'cnonce'=>1, 'qop'=>1, 'username'=>1, 'uri'=>1, 'response'=>1);
    $data = array();
    $keys = implode('|', array_keys($needed_parts));

    print $txt;

$txt - Below is what print $txt showed, i.e. the value I received in $_SERVER['PHP_AUTH_DIGEST'], to check what is coming in:
username="guest",realm="Restricted area",nonce="4b20d54ab440a",uri="/http.php",cnonce="e6fd095f85a80f1e68f3c2685119b35c",nc=00000001,response="ebaa40b07e3da56e89b048a9766fd4db",qop="auth",opaque="cdce8a5c95a1427d74df7acbf41c9ce0"Your are logged in as: guest

preg_match_all('@(' . $keys . ')=(?:([\' "])([^\2]+?)\2|([^\s,]+))@', $txt, $matches, PREG_SET_ORDER);

My Understanding of the preg match all code above -

preg_match_all is used to capture the strings matched by the above pattern. @ - means don't report any error on this. '.$keys.' has been imploded with a pipe ('|', also called or), so because of the implode function the key values are 'nonce|nc|cnonce|qop|username|uri|response'.

Now the pattern looks for an = sign.

Then comes a parenthesis with ?:, a non-capturing parenthesis, which means that it can be captured but is not counted when using a back reference.

Now the 2nd parenthesis comes along, ([\' "]), with a character class and \' (not very sure about this, please confirm if correct), but I think it's escaped so that
we can capture ' or ". As you can see from the above $txt variable, username has the value "guest".

Now we get to the third parenthesis, which is ([^\2])+. I think this is using negation with a back reference, so we go back to the 2nd parenthesis because of \2 and look for
anything that does not start with what was matched in the 2nd parenthesis, with + (one or more). Then comes another parenthesis with )? (the optional sign at the end, to say it's not really needed but optional) and another back reference
\2, which takes us back to ([\' "]) and says find ". And yes, "guest" is found without the quotes and is saved in the matches.

Now I am very confused by this | (represented as "pipe" or "or"). I think it has something to do with the keys which were imploded earlier (look at the $keys variable after imploding), but I don't know what its use is or in what context it's being used. And the last parenthesis ([^\s,]+) says don't capture anything that's whitespace or a , with a + sign (one or more).

Please can someone tell me whether I have understood the above correctly and what mistakes I have made.

[color=#0000FF]foreach ($matches as $m) {
    $data[$m[1]] = $m[3] ? $m[3] : $m[4];
    unset($needed_parts[$m[1]]);
}

return $needed_parts ? false : $data;[/color]
}

An explanation of the blue code above would also be much appreciated.

?>
/////////////////////////////////////////////////////////////////////////////////////////////

johanafm

dudelondon;10936260 wrote:
The code which I need explained is in blue.
{
// protect against missing data
$needed_parts = array('nonce'=>1, 'nc'=>1, 'cnonce'=>1, 'qop'=>1, 'username'=>1, 'uri'=>1, 'response'=>1);
$data = array();
$keys = implode('|', array_keys($needed_parts));
print $txt;
$txt - Below is what print $txt showed, i.e. the value I received in $_SERVER['PHP_AUTH_DIGEST'], to check what is coming in:
username="guest",realm="Restricted area",nonce="4b20d54ab440a",uri="/http.php",cnonce="e6fd095f85a80f1e68f3c2685119b35c",nc=00000001,response="ebaa40b07e3da56e89b048a9766fd4db",qop="auth",opaque="cdce8a5c95a1427d74df7acbf41c9ce0"Your are logged in as: guest

preg_match_all('@(' . $keys . ')=(?:([\' "])([^\2]+?)\2|([^\s,]+))@', $txt, $matches, PREG_SET_ORDER);

preg_match_all is used to capture the strings matched by the above pattern.

correct

@ - means don't report any error on this.

No. For it to be the error suppression operator, it can't be inside a string.

@preg_match_all();  // now it's the error suppression operator

// just like this has no particular meaning to PHP
$str = "function foo() {}";

// while this does
function foo() {}

A regular expression starts with a pattern delimiter, then comes the pattern, then the pattern delimiter, and then pattern modifiers. You are free to choose one of several different pattern delimiters.

$pattern = '@matchthis@i';
$pattern = '#matchthis#i';
$pattern = '/matchthis/i';

The above are identical. However, the choice of delimiter can be strategic, because the delimiter has to be escaped wherever it occurs inside the pattern.

$pattern = '#http://example.com#';
$pattern = '/http:\/\/example.com/';

$pattern = '#\#23#';
$pattern = '/#23/';

Readability differs a lot.
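
A minimal sketch of the point about delimiters, assuming a made-up $subject string (it is not from your code). The three calls use the same pattern with different delimiters and should all find the same match; only the last one needs the delimiter escaped inside the pattern.

```php
<?php
// Hypothetical sample text, chosen only for this illustration.
$subject = 'See http://example.com for details';

// Same pattern, three different delimiters.
$a = preg_match('#http://example\.com#', $subject);
$b = preg_match('@http://example\.com@', $subject);
$c = preg_match('/http:\/\/example\.com/', $subject); // the / delimiter must be escaped

var_dump($a, $b, $c); // each call returns 1 (one match found)
```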

'.$keys.' has been imploded with a pipe ('|', also called or), so because of the implode function the key values are 'nonce|nc|cnonce|qop|username|uri|response'.

Correct. As a regexp pattern, this means look for either of these parts. It will match no matter which of them is present (as long as the rest of the pattern also matches).

Then comes a parenthesis with ?:, a non-capturing parenthesis, which means that it can be captured but is not counted when using a back reference.

Not quite. Whatever a non-capturing subpattern matches is still part of the whole matched string, that is, the strings found in $matches[0][0], $matches[1][0], $matches[2][0] etc. It just doesn't get a number of its own.

However, each capturing subpattern is also captured separately. With the default ordering (PREG_PATTERN_ORDER), the first capturing subpattern is put into $matches[1] (also an array), the second into $matches[2] (also an array) etc. And these are what you can also refer to with back references.
EDIT: I'm in a hurry and no longer have time to edit my post to be correct when using PREG_SET_ORDER. The paragraph above describes the default ordering. Check your own resulting array to see where things go...
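
To see the difference between the two orderings, here is a small sketch. The input string and the pattern are invented for the example and are much simpler than the one in your code. The comma matched by the non-capturing (?:...) shows up in the whole match but gets no index of its own.

```php
<?php
// Hypothetical input, made up for this example.
$txt = 'a=1,b=2';

// (?:...) groups without capturing; (\w) and (\d) are groups 1 and 2.
$pattern = '@(?:^|,)(\w)=(\d)@';

preg_match_all($pattern, $txt, $byPattern);              // default: PREG_PATTERN_ORDER
preg_match_all($pattern, $txt, $bySet, PREG_SET_ORDER);  // one array per complete match

// $byPattern: [ ['a=1', ',b=2'], ['a', 'b'], ['1', '2'] ]
print_r($byPattern);

// $bySet: [ ['a=1', 'a', '1'], [',b=2', 'b', '2'] ]
print_r($bySet);
```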

Now the 2nd parenthesis comes along, ([\' "]), with a character class and \' (not very sure about this, please confirm if correct), but I think it's escaped so that
we can capture ' or ". As you can see from the above $txt variable, username has the value "guest".

When I said PHP doesn't care about what's inside a string, that's an oversimplification. It has to check for the string delimiter among other things. If the string starts with ', it ends with '. If it starts with ", it ends with ".

$str = "string containing \" is fine when escaped";
$str = 'string containing \' is fine when escaped';
$str = 'string containing " does not need to be escaped';
$str = "string containing ' does not need to be escaped";

The upside of using single-quoted strings for regexp patterns is the way \ is interpreted: a single-quoted string leaves a backslash alone unless it is followed by ' or another \, and the \ character is often needed to escape things for the pattern (which is separate from the string).

$str = "\\s";
echo $str;
// output: \s
$str = '\s';
echo $str;
// output: \s

So \' is just to make PHP NOT end the string. The regexp engine never sees that backslash, only the '.
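
A quick way to check this yourself, as a sketch (the variable names are only for illustration): compare the two spellings of \s, and count the characters in the character class from your pattern once PHP has parsed the string.

```php
<?php
$single = '\s';    // single quotes: the backslash is kept as-is
$double = "\\s";   // double quotes: \\ collapses to one backslash

var_dump($single === $double);   // bool(true): both are the two characters \ and s

// \' only stops PHP from ending the string; the backslash is not part of the result.
$class = '[\' "]';
var_dump(strlen($class));        // int(5): [ ' SPACE " ]
```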

Now we get to the third parenthesis, which is ([^\2])+. I think this is using negation with a back reference, so we go back to the 2nd parenthesis because of \2 and look for anything that does not start with what was matched in the 2nd parenthesis, with + (one or more). Then comes another parenthesis with )? (the optional sign at the end, to say it's not really needed but optional) and another back reference

Now I am very confused by this | (represented as "pipe" or "or"). I think it has something to do with the keys which were imploded earlier (look at the $keys variable after imploding), but I don't know what its use is or in what context it's being used. And the last parenthesis ([^\s,]+) says don't capture anything that's whitespace or a , with a + sign (one or more).

Sort of. \2 refers back to the second capturing subpattern. Non-capturing subpatterns do not count. Since your post as I see it has a smiley in the pattern, I can't see its exact contents, but I'm guessing it's (?:( in plain text, i.e. a non-capturing subpattern directly followed by a capturing subpattern. So I would expect the part around there to be similar to this

$pattern = '#([\'"])[^\1]+?\1#';

Match either ' or ", then match as few characters as possible until the same quote character shows up again, and include that character as well. One catch: inside a character class, \1 is not treated as a back reference (PCRE reads it as an octal character code), so [^\1] does not by itself exclude the quote character. It is the ungreedy +? followed by the \1 outside the class that stops the match at the first closing quote.

<div style="blabla" id='12' name="mydiv">

In the line above, that pattern would match these strings
"blabla"
'12'
"mydiv"
(including the ' or " characters; I'm not using them to tell you we're talking about strings). Also note that the character class in your pattern contains three characters, not two: ' " and an ordinary SPACE character, which is hard to spot in this font.

The + is called a quantifier (one or more characters, greedy by default, i.e. go on as far as possible while the rest of the pattern still matches). When ? follows a quantifier, it changes the quantifier from greedy to ungreedy, i.e. match as few characters as possible while the rest of the pattern still matches.
And this part of the pattern doesn't just end with the same character that opened it (', " or SPACE); it has |, which as discussed before means either one part OR the other part. The other part is one or more characters that are neither whitespace (the \s) nor a , (comma), which is what picks up unquoted values such as nc=00000001.
And this is the reason for the greediness modifier. With a greedy +, the quoted part would run on to the last matching quote character in the string, because [^\2] does not stop at the quote. With +?, it matches until the first closing quote, not the last one.

([^\s,]+) says don't capture anything that's whitespace or a , with a + sign (one or more)

Once again, the + is not a character being matched, it's the amount of characters that should be matched (as many as possible, but must be one or more).
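
Here is a small sketch of what the ungreedy +? buys you. The $txt below is a shortened, made-up header value in the same format as yours, and both patterns only look at username to keep things short. The only difference between the two calls is +? versus +.

```php
<?php
// Shortened, hypothetical header value in the same format as $txt.
$txt = 'username="guest",realm="Restricted area",nc=00000001';

// Ungreedy +? (as in the manual's pattern): stops at the first closing quote.
preg_match('@username=(?:([\'"])([^\2]+?)\2|([^\s,]+))@', $txt, $lazy);

// Greedy +: runs on to the last matching quote in the string.
preg_match('@username=(?:([\'"])([^\2]+)\2|([^\s,]+))@', $txt, $greedy);

echo $lazy[2], "\n";    // guest
echo $greedy[2], "\n";  // guest",realm="Restricted area
```

The greedy result swallows the next key as well, which is why the pattern in the manual needs the ? after the +.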

foreach ($matches as $m) {
    $data[$m[1]] = $m[3] ? $m[3] : $m[4];
    unset($needed_parts[$m[1]]);
}
return $needed_parts ? false : $data;
}

Due to the PREG_SET_ORDER, the way things are inserted into $matches differs from default.
Each time the whole pattern matches, an array is added to $matches with all data for this particular match.
So $matches[0] contains everything from the first match, $matches[1] from the second and so on.
$matches[0][0] contains the string matching the WHOLE pattern the first time, $matches[1][0] does the same for the second match and so on.

$matches[0][1] contains the string matching the first capturing subpattern the first time the whole pattern matched, $matches[1][1] the same thing for the second complete match and so on.

So $m[1] contains the word matched from $keys: nonce, nc etc.
So the array $data gets something added to $data['nonce'], $data['nc'] etc.
The thing added comes from the 3rd capturing subpattern if it contains anything, else from the 4th capturing subpattern.
$needed_parts['nonce'] will be removed in case nonce matched etc. If anything is left in $needed_parts afterwards, a required part was missing, so the function returns false; otherwise it returns $data. You could also use $needed_parts to display which parts were missing.
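
Putting it together, here is a sketch of the same function run against the header value you printed. I have assumed the smiley stands for (?:( and left out the debug print and the SPACE in the character class; otherwise the logic is the one from your post.

```php
<?php
// Same logic as http_digest_parse() from the question, without the debug print.
function http_digest_parse($txt)
{
    $needed_parts = array('nonce'=>1, 'nc'=>1, 'cnonce'=>1, 'qop'=>1,
                          'username'=>1, 'uri'=>1, 'response'=>1);
    $data = array();
    $keys = implode('|', array_keys($needed_parts));

    preg_match_all('@(' . $keys . ')=(?:([\'"])([^\2]+?)\2|([^\s,]+))@',
                   $txt, $matches, PREG_SET_ORDER);

    foreach ($matches as $m) {
        // $m[1] = key, $m[3] = quoted value, $m[4] = unquoted value
        $data[$m[1]] = $m[3] ? $m[3] : $m[4];
        unset($needed_parts[$m[1]]);   // this part has been seen
    }
    // Anything still left in $needed_parts was missing from the header.
    return $needed_parts ? false : $data;
}

// Header value as printed in the question.
$txt = 'username="guest",realm="Restricted area",nonce="4b20d54ab440a",uri="/http.php",'
     . 'cnonce="e6fd095f85a80f1e68f3c2685119b35c",nc=00000001,'
     . 'response="ebaa40b07e3da56e89b048a9766fd4db",qop="auth",'
     . 'opaque="cdce8a5c95a1427d74df7acbf41c9ce0"';

$data = http_digest_parse($txt);
print_r($data);   // seven entries: username, nonce, uri, cnonce, nc, response, qop

// A header with parts missing makes the function return false.
var_dump(http_digest_parse('username="guest"'));   // bool(false)
```

Note that realm and opaque are skipped because they are not in $keys, and nc is the one value that comes from $m[4] since it has no quotes.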


dudelondon

Thanks johanafm. You have given a very clear description of the above code. You explained it in detail and, I have to say, in a very organised way. Thanks for that. I will make sure that next time my question is a bit more like yours.

🙂 salute