Alternative to url encode

LiamBailey · Mar 1, 2008

Hi Guys,

I don't know if this forum is only for questions, but it has been so helpful to me that, when one of the most helpful posters, laserlight, suggested urlencode to me for generating urls from article titles, I thought I would share my better alternative, as I hate all those %20's in my urls.

I post my article into a html form,

and then on my action page (form action="action_page"), after the article is inserted into the DB, for SEO purposes I assign all the HTML for the article page to a variable, with the PHP post variables inside the correct tags, i.e.

<title>$title</title> etc.

Then I write it to a file using fopen() and fwrite(). Here is how I use the article title, divided by hyphens, as the file name. I have included eregi() calls to remove apostrophes, colons and other characters not suitable for urls from my titles.

$splitname = split(" ",$title);
$counttitle = count($splitname);
$f = 0; while ($f < $counttitle) { $worker = substr($splitname[$f],-1); if (!eregi("[a-z0-9]",$worker)) { $splitname[$f] = substr($splitname[$f],0,-1); } $worker2 = substr($splitname[$f],-2,1); if (!eregi("[a-z0-9]",$worker2)) { $splitname[$f] = substr($splitname[$f],0,-2); $splitname[$f] .= $worker; } $filename .= $splitname[$f]; $filename .= "-"; $f++;};
$smalltitle = strtolower($filename);
$fintitle = substr($smalltitle,0,-1);
$test = substr($fintitle,-1);
if ($test == "-") { $fintitle = substr($smalltitle,0,-2); }//Make sure the last charachter isn't a hyphen//
$finfolder = substr($smallfolder,0,-1);
$file = "/files/home3/warpages/";
$file .= $finfolder; // I used the same while loop to put dashes between the section name, also selected on the html form//
$file .= "/";
$file .= $fintitle;
$file .= ".php";
$create = fopen("$file","w");
fwrite($create,"$write");
fclose($create);
chmod("$file",0755);

Might be a lot of code, but for me it beats urlencode to have nice neat hyphen-separated urls. And I use the same code to link to the articles when I pull them back out of the database. Like I say, lotta lotta code, but it floats my boat.

laserlight · Mar 1, 2008

I have moved this thread to Code Critique.

I thought I would share my better alternative, as I hate all those %20's in my urls.

Actually, urlencode() will not give you %20 but + for spaces. rawurlencode() will give you %20 for spaces.
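A quick way to see the difference, as a minimal sketch (the sample title is made up for illustration and does not come from the original script):

```php
<?php
// Hypothetical sample title, not taken from the thread.
$title = "Today's news: what now?";

// urlencode() uses the form-encoding convention: spaces become "+".
$formEncoded = urlencode($title);

// rawurlencode() uses percent-encoding throughout: spaces become "%20".
$rawEncoded = rawurlencode($title);

echo $formEncoded, "\n"; // Today%27s+news%3A+what+now%3F
echo $rawEncoded, "\n";  // Today%27s%20news%3A%20what%20now%3F
```

Either way the punctuation is percent-encoded, so neither function produces the hyphen-separated style on its own.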

Might be a lot of code, but for me it beats urlencode to have nice neat hyphen-separated urls. And I use the same code to link to the articles when I pull them back out of the database. Like I say, lotta lotta code, but it floats my boat.

Unfortunately, it has a few problems:

The code is poorly formatted. There is no indentation and you have a whole lot of statements on a single line, making it difficult to debug and edit.
There are undefined variables in the code. Of course, it is obvious that one must define $title to some suitable value. However, there are other undefined variables. You should test your code with error_reporting set to E_ALL and display_errors set to On.
It is not clear how your alternative is better. You should provide some example inputs and the expected outputs, and explain the advantages of your solution and how it might react if, say, the title contains hyphens.
It is not obvious how your solution is generally applicable. Sure, you open $file, but $file contains a file path that is specific to your code. I suggest that you write a function that takes $title as the argument and returns a filename formatted according to your specifications. Then you can add the file path to this filename and write it. With this function, readers can more clearly see how to use your solution in their own code.
You use eregi() where ctype_alnum() will do. In fact, preg_match() should be preferred to eregi(), if regex is required.
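To make the function suggestion concrete, here is one possible shape for it. This is only a sketch under stated assumptions: the name title_to_slug(), the exact rules, and the placeholder directory are all hypothetical, and it is not a drop-in replacement for the posted loop, which only strips punctuation near the end of each word.

```php
<?php
// Hypothetical helper: the name and the rules are assumptions,
// not the original poster's code.
function title_to_slug($title)
{
    // Lowercase first so the pattern only has to deal with a-z and 0-9.
    $slug = strtolower($title);

    // Drop apostrophes so "today's" becomes "todays" rather than "today-s".
    $slug = str_replace("'", '', $slug);

    // Collapse every run of other non-alphanumeric characters into one hyphen.
    $slug = preg_replace('/[^a-z0-9]+/', '-', $slug);

    // Remove any hyphen left at the start or end.
    return trim($slug, '-');
}

// The directory is site-specific, so it stays outside the function.
// '/path/to/articles/' is a placeholder, not a real path.
$file = '/path/to/articles/' . title_to_slug("Today's News: What Now?") . '.php';
echo $file, "\n"; // /path/to/articles/todays-news-what-now.php
```

Keeping the path out of the function means the same call can build both the filename and the link.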

LiamBailey · Mar 1, 2008

Thanks for your input, laserlight. I detect from your post that I may have offended you; I didn't mean to. When I said you have been most helpful I was also going to add that you were very knowledgeable, but I didn't want to go overboard. I have a deep respect for your knowledge in this field and know I am nowhere close to your level of knowledge. In that vein, having written this script and having it work in the way I want it to made me really proud, and I thought maybe someone else might find it in some way helpful. I am currently using this and it works fine. Now to try and answer some of your critique.

I can't argue with your indentation critique, I never use indentation. I am bad for it, I have a full time job as a writer, two websites to write content for and a 16 month old baby, I am often in an awful rush.

$title is a post_var from my form.

If the title contained hyphens it wouldn't cause a problem, the word would simply be hyphenated in the url and filename. As I am using the same code to generate the filename and the link it matches up all the time.

As for it being specific, this is used each time I post an article, to create a single, different file each time. I am not opening this up to public posting, so I am debugging as I go. To start with I didn't have the eregis, but I discovered it created a problem when I used a title with a colon and a question mark. I added the second eregi when I wanted to use an apostrophe at the second-last character of a word; the apostrophe is now removed and the word truncated, as in today's becomes todays in the filename and url.

I didn't use preg_match because this is my second site and is on a very basic hosting package. I read that preg_match is only in later versions, so I went on the safe side. I never heard of ctype_alnum.

laserlight · Mar 1, 2008

I detect from your post that I may have offended you, I didn't mean to.

You did offend me, but only because I was unable to improve your code for the reasons I outlined :p

With well written code, it is easier to spot places where it can be improved. But if I cannot even run your code, I have no hope of testing any improvement I might suggest.

I can't argue with your indentation critique, I never use indentation. I am bad for it, I have a full time job as a writer, two websites to write content for and a 16 month old baby, I am often in an awful rush.

More haste, less speed, as the saying goes. If you write unreadable code because you are in a rush, you will be even more pressed for time when you need to debug and maintain the code.

$title is a post_var from my form.

I get the warnings:

Notice: Undefined variable: filename

Notice: Undefined variable: smallfolder

The $filename problem is easy to solve: you are appending to the variable before defining it, so it probably should be initialised to an empty string.
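A minimal sketch of that fix, with full error reporting switched on. The sample title is made up, and explode() stands in for split() here only because the delimiter is a plain string; treat both as assumptions rather than the original code.

```php
<?php
// Report everything while developing, as suggested earlier in the thread.
error_reporting(E_ALL);
ini_set('display_errors', '1');

// Sample input; in the original script this comes from the form post.
$title = 'An example title';

// Initialise the accumulator before the loop appends to it.
$filename = '';
foreach (explode(' ', $title) as $word) {
    $filename .= $word . '-';
}

echo $filename, "\n"; // An-example-title-
```

With $filename initialised, the first notice goes away; $smallfolder still needs a definition of its own.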

If the title contained hyphens it wouldn't cause a problem, the word would simply be hyphenated in the url and filename. As I am using the same code to generate the filename and the link it matches up all the time.

As I said, you should provide examples. This will help readers understand what your code is expected to do.

As for it being specific, this is used each time I post an article, to create a single, different file each time. I am not opening this up to public posting, so I am debugging as I go. To start with I didn't have the eregis, but I discovered it created a problem when I used a title with a colon and a question mark. I added the second eregi when I wanted to use an apostrophe at the second-last character of a word; the apostrophe is now removed and the word truncated, as in today's becomes todays in the filename and url.

Of course some parts of your code will be specific to what you are trying to do. But you said that your solution is an alternative to url encoding, so clearly some parts of your solution must be generic. I am suggesting that you place this generic portion of your code into a function.

I didn't use preg_match because this is my second site and is on a very basic hosting package. I read that preg_match is only in later versions, so I went on the safe side. I never heard of ctype_alnum.

preg_match and the other PCRE regex functions have been available since PHP 4, and the extension has been enabled by default since PHP 4.2.0. Likewise, ctype_alnum (as in, check alphanumeric) and other Character Type functions have been enabled by default since PHP 4.2.0.
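For comparison, here is roughly how the last-character test could look with either function. It is a sketch only, and the sample word is made up for illustration.

```php
<?php
// Sample word with trailing punctuation (made up for illustration).
$word = 'news:';
$last = substr($word, -1);

// Character-type check: true only if every character is a letter or digit.
$viaCtype = ctype_alnum($last);

// PCRE counterpart of the eregi("[a-z0-9]", ...) test;
// the i modifier makes the match case-insensitive.
$viaPcre = preg_match('/[a-z0-9]/i', $last) === 1;

var_dump($viaCtype, $viaPcre); // both false for ":"
```

For a single-character test like this, ctype_alnum() avoids the regex engine altogether.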

LiamBailey · Mar 1, 2008

Thanks again laserlight, I will try and be more thoughtful before posting any code for critique in future.

I apologise, I had asked a question in this post, but I will use the correct place in the forum to do so.

laserlight · Mar 1, 2008

I will try and be more thoughtful before posting any code for critique in future.

Sure, but remember, if you are looking for feedback, you might get some. So just incorporate the feedback appropriately.

About preg_match, why is it better than eregi?

If you take a look at the PHP manual's entry for ereg(), you will find:

Note: preg_match(), which uses a Perl-compatible regular expression syntax, is often a faster alternative to ereg().

Aside from that, the POSIX extended regex functions are not binary safe, unlike the PCRE functions.

And I am thinking of writing a search script for my DB, and the problem I had for a while trying to escape characters using

str_replace("[<p/>]","[<p/>]",$string),

is how to get it to search for an entire string instead of any character in the string.

This you should ask in the PHP Help forums. I am not sure what you mean though, so perhaps when you post for help you might want to clarify.

I am looking forward to your revised version of your alternative to url encode.

Weedpacket · Mar 2, 2008

laserlight wrote:
Aside from that, the POSIX extended regex functions are not binary safe, unlike the PCRE functions.

Furthermore, if I'm not out of date on PHP 6 development, ereg and its related functions (eregi, ereg_replace, eregi_replace, and split), are being moved out of the base PHP distribution and into PECL as a separately-distributed extension.

Also the PCRE library is still being actively developed both in terms of performance and syntax enhancements.

Other points:

$f = 0;
while($f < $counttitle)
{
 // ....
  $f++;
}

Could be simplified to

for($f=0; $f<$counttitle; $f++)
{
  //...
}

"$foo"

is almost always equivalent to

$foo

but without the extra steps of creating a new string and interpolating the variable's value into it.
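A small sketch of why it is "almost always" rather than "always" (the variable names are just for illustration):

```php
<?php
$foo = '/this/is/something';
$bar = "$foo";

// For a string, the quoted form is just a copy with the same value.
var_dump($foo === $bar); // bool(true)

// The "almost" matters when the variable is not a string:
// interpolation converts it, so the type changes.
$count = 42;
var_dump("$count" === $count); // bool(false): "42" is a string, 42 is an int
var_dump("$count" === '42');   // bool(true)
```

So fopen($file, "w") and chmod($file, 0755) would behave the same as the quoted versions in the posted code, since $file is already a string.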

And it doesn't seem to do the right thing with

$title = "This is a title that is going to <be> !*/encoded/*!";

LiamBailey · Mar 2, 2008

I'm not sure what you mean with the variable in quotes being the same as a variable without quotes.

But for your example

$title = "This is a title that is going to <be> this-is-a-title-that-is-going-to-be.php

I originally had for loops, but my script was timing out on test runs so I changed it, then realised it was another part of the script that was at fault, but as it was working I didn't change it back.

Weedpacket · Mar 2, 2008

LiamBailey wrote:
I'm not sure what you mean with the variable in quotes being the same as a variable without quotes.

$foo = "/this/is/something";
$bar = "$foo";

What is the difference between $foo and $bar?

but as it was working i didn't change it back

Really? I got for $filename

/files/home3/warpages//this-is-a-title-that-is-going-to-<be-!*/encoded!.php

Is that intended?

LiamBailey · Mar 2, 2008

Sorry, I didn't realise <be> and !*/encoded/*! were actually part of your title.

Can you tell me why anyone would use <> around a part of their title?

Or have a string like your encoded in a title?

I think you are misunderstanding the purpose of my script. It does not perform the exact same purpose as urlencode(), but for my purpose it works. I am not intending to make this public, so I know my titles will be valid. This is to reduce my time working on the site: I just write the articles, enter them into my CMS HTML form, and my script does everything else for me. If people wanted to use it for public purposes then perhaps they could validate the title with preg_match somewhere at the beginning of the script.

laserlight · Mar 2, 2008

It does not perform the exact same purpose as urlencode(), but for my purpose it works. I am not intending to make this public, so I know my titles will be valid. This is to reduce my time working on the site: I just write the articles, enter them into my CMS HTML form, and my script does everything else for me.

You know, if all this is so very specific to your needs and you refuse to divulge the exact pre-conditions and post-conditions of your script, or to respond to suggestions for improvement, then why did you even bother to "share" it at all?

If people wanted to use it for public purposes then perhaps they could validate the title with preg_match somewhere at the beginning of the script.

You have not said what a valid title is. Besides, relying on user input to be correct is bad practice. Maybe, for now, you are both the user and the developer, so you know what kind of input is valid. But later on, things change, and because you made no attempt to future-proof your code, you will run into problems.
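As an illustration of checking the input up front, here is a sketch of a validation step. The rule itself is an assumption, since the thread never defines a valid title, and is_valid_title() is a hypothetical name.

```php
<?php
// Hypothetical rule: what counts as a valid title is an assumption here,
// since the thread never defines it.
function is_valid_title($title)
{
    // Letters, digits, spaces and a few punctuation marks, 1 to 100 characters.
    // \z anchors at the very end, so a trailing newline is rejected too.
    return preg_match('/^[A-Za-z0-9 \'":?,.!-]{1,100}\z/', $title) === 1;
}

var_dump(is_valid_title("Today's News: What Now?"));                             // bool(true)
var_dump(is_valid_title('This is a title that is going to <be> !*/encoded/*!')); // bool(false)
```

Rejecting unexpected titles before the file is written means a stray < or / never reaches the file path.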
