Hi all,
Thanks again for reading my hideously irritating posts, which usually get solved very quickly, followed by me shouting "d'oh".

So, at the moment I am assisting someone with a Joomla front end modification (because we all know that Joomla is awesome, but its front end is hideous), and part of this is building articles from an RSS feed. Basically, a script grabs the feed contents and performs a variety of operations on them to populate the table.

However, one problem I keep running into is creating the alias. The alias is basically the text following the id of the article.

I.e.

example.com/news/14-[B]boobs-are-everywhere[/B]/

Now, I am using the title of the article to create the alias which can look anything like this:

On the 5th October we will be holding a gala event, dont forget 5/10/2010

to

Bob and daves extravaganza

A few str_replace() calls will get rid of the obvious problems, for example replacing spaces with - and removing , . ( ) and so on.

The problem is that no matter what I do, someone will find an inventive way of putting bad characters into the title, and my patience is wearing thin. So I suppose the question is: instead of a blacklist, can I have a whitelist-style system using a regular expression?

I.e. remove all characters within $str that are not a-z or space?

Any help will be more than welcome.

Neil

    If you really wish to keep only a-z and space

    $string = preg_replace('#[^a-z ]#', '', $string);
    

    But A-Z will then also be removed. The POSIX character classes [:alpha:] or [:alnum:] may be worth considering (see the PCRE character classes documentation):

    $string = preg_replace('#[^[:alnum:] ]#', '', $string);
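
The whitelist idea above can be taken one step further to produce the whole alias in one place. This is only a minimal sketch, not Joomla's own routine: the function name make_alias() is hypothetical, and it assumes plain ASCII titles and that digits should be kept.

```php
<?php
// Hypothetical helper (not part of Joomla): build a URL alias from a title
// using a whitelist instead of a growing list of str_replace() calls.
function make_alias(string $title): string
{
    // Lowercase first so that A-Z is kept rather than stripped.
    $alias = strtolower($title);
    // Whitelist: drop everything that is not a-z, 0-9, space or hyphen.
    $alias = preg_replace('#[^a-z0-9 -]#', '', $alias);
    // Collapse each run of spaces and hyphens into a single hyphen.
    $alias = preg_replace('#[ -]+#', '-', $alias);
    // Remove any hyphen left at either end.
    return trim($alias, '-');
}

echo make_alias('Bob and daves extravaganza'), "\n"; // bob-and-daves-extravaganza
```

Lowercasing before filtering avoids the A-Z problem mentioned above, and collapsing runs of separators keeps a title such as "this is $ thing" from ending up with a double hyphen.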
    

      Remember to check for duplicates among the ids generated this way.

      this is $ thing
      this is @ thing

      Both will result in the same id after removing $ and @.
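
One possible way to handle such collisions is to append a counter until the alias is unused. The sketch below is an assumption-laden illustration: unique_alias() is a hypothetical name, and the $existing array stands in for whatever lookup against the articles table would be used in practice.

```php
<?php
// Hypothetical helper: return $alias unchanged if it is free, otherwise
// append -2, -3, ... until it no longer clashes with a known alias.
// $existing is an assumed stand-in for aliases already in the database.
function unique_alias(string $alias, array $existing): string
{
    $candidate = $alias;
    $n = 2;
    while (in_array($candidate, $existing, true)) {
        $candidate = $alias . '-' . $n;
        $n++;
    }
    return $candidate;
}

// "this is $ thing" and "this is @ thing" both reduce to the same alias,
// so the second one picks up a numeric suffix.
echo unique_alias('this-is-thing', ['this-is-thing']), "\n"; // this-is-thing-2
```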

        If it is just to build a URL, you could simply urlencode() it and not worry about what characters they enter.

          Wasn't Joomla supposed to take care of SEO-friendly URLs?
          Besides, how do you permit "On the 5th October" or "5/10/2010" in your URL?
          I'd go with urlencode, too.
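
For comparison, here is a small sketch of what the urlencode() suggestion produces for the sample titles from this thread. The helper name encoded_segment() is hypothetical and only wraps the built-in call; the result is safe to place in a URL but keeps every character in escaped form, so it is not as readable as a hyphenated alias.

```php
<?php
// Hypothetical wrapper around PHP's built-in urlencode(), shown only to
// illustrate its output: spaces become "+", other unsafe characters "%XX".
function encoded_segment(string $title): string
{
    return urlencode($title);
}

echo encoded_segment('Bob and daves extravaganza'), "\n"; // Bob+and+daves+extravaganza
echo encoded_segment('5/10/2010'), "\n";                  // 5%2F10%2F2010
```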

            I am just back to working on this project now and will report back with findings.

            nevvermind: Joomla DOES deal with friendly URLs when it creates the articles itself. I am parsing a Vimeo RSS feed to populate the database, so I basically have to mimic what Joomla does, i.e. grabbing the title:

            neilmasters can not code php to save his life on 16/08/2010 @ university

            And making the alias (used for friendly URLs) into:

            neilmasters-can-not-code-php-to-save-his-life < truncated
            or
            neilmasters-can-not-code-php-to-save-his-life-on-16082010-university

            Joomla: amazing database and admin system, but its front end is absolutely shocking. In the last year I have probably done close to 30 Joomla-based sites and I have given up entirely on its front end; it is easier and faster to code the front end yourself. But it is only now that I have started to look more closely at optimising the methods I have been using, hence replacing the continuous str_replace() calls with a regular expression (or urlencode() as suggested).
