Remove non-standard characters from string

  1. #1
    Junior Member
    Join Date
    Dec 2007
    Posts
    4

    Remove non-standard characters from string

    Hi

    I have two questions:

    1. How do you remove non-standard characters from a string?
    2. How do you remove all HTML from a string?

    For example let's say you have the following string:

    $str = "Check this out <a href=�http://www.somewebsite.com�>Somewebsite</a>, this is a great website";

    How do you remove characters such as "�" from this string, as well as the html code?

    Thank you

  2. #2
    Senior Member wilku's Avatar
    Join Date
    Oct 2003
    Location
    Warsaw, Poland
    Posts
    723
    1. Allowing only alphanumeric data:
    PHP Code:
    $output = preg_replace("/[^A-Za-z0-9]/", "", $input);
    2. Stripping html and php tags: strip_tags
    Remove tags first and then any remaining non-alphanumeric characters.

    EDIT:
    PHP Code:
    //this may be better and includes punctuation as well
    $output = preg_replace("/[^[:alnum:][:punct:]]/", "", $input);
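    A minimal sketch of the two steps above, assuming the goal is to keep plain ASCII text. The helper name clean_text() is hypothetical, and the pattern also keeps spaces, since the character class above would strip them. The range \x20-\x7E covers the same ASCII letters, digits and punctuation, plus the space:

```php
<?php
// Hypothetical helper, not from the thread: strip tags, then keep printable ASCII only.
function clean_text(string $input): string
{
    // Step 1: remove HTML and PHP tags.
    $text = strip_tags($input);

    // Step 2: drop every byte outside printable ASCII (space through tilde).
    // Full stops and other punctuation survive; curly quotes and the like do not.
    return preg_replace('/[^\x20-\x7E]/', '', $text) ?? '';
}

// \u{201C} and \u{201D} are curly quotes (the escape syntax needs PHP 7 or later).
$str = "Check this out <a href=\"http://www.somewebsite.com\">Somewebsite</a>, this is a \u{201C}great\u{201D} website.";

echo clean_text($str), "\n";
// Check this out Somewebsite, this is a great website.
```

    Because the pattern has no /u modifier it works byte by byte, so a multi-byte UTF-8 character is removed completely rather than partially.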
    Last edited by wilku; 12-23-2007 at 06:19 AM.
    Wilku <><

  3. #3
    High Energy Magic Dept. NogDog's Avatar
    Join Date
    Aug 2006
    Location
    Ankh-Morpork
    Posts
    13,949
    Depending on exactly what you want to do, you might also want to look into htmlentities() and htmlspecialchars().
    "Please give us a simple answer, so that we don't have to think, because if we think, we might find answers that don't fit the way we want the world to be." ~ from Nation, by Terry Pratchett

    "But the main reason that any programmer learning any new language thinks the new language is SO much better than the old one is because hes a better programmer now!" ~ http://www.oreillynet.com/ruby/blog/...ck_to_p_1.html


    eBookworm.us

  4. #4
    Junior Member
    Join Date
    Dec 2007
    Posts
    4
    Thanks for your reply, wilku. The output now seems to include ASCII codes in place of the characters that were taken out, such as:

    "60 a href http www somewebsite com 62 somewebsite 60 a 62"

    Is there a way to remove these?

    Furthermore, how do I prevent it from removing full stops?

    Thanks again.

  5. #5
    High Energy Magic Dept. NogDog's Avatar
    Join Date
    Aug 2006
    Location
    Ankh-Morpork
    Posts
    13,949
    I would just do:
    PHP Code:
    echo htmlentities(strip_tags($str), ENT_QUOTES); 
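    A sketch for the case where the stored text already contains numeric character references such as &#60; instead of real angle brackets. That is an assumption, based on the "60 ... 62" remnants reported earlier in the thread. In that case strip_tags() has no tags to find until the references are decoded. The helper name to_plain_text() is hypothetical:

```php
<?php
// Hypothetical helper: decode existing references, strip tags, then re-escape.
function to_plain_text(string $stored): string
{
    // Turn references such as &#60; back into the characters they stand for.
    $decoded = html_entity_decode($stored, ENT_QUOTES | ENT_HTML401, 'UTF-8');

    // With real angle brackets restored, strip_tags() can recognise the markup.
    $plain = strip_tags($decoded);

    // Escape whatever is left for output, as in the line above.
    return htmlentities($plain, ENT_QUOTES, 'UTF-8');
}

$stored = 'Check this out &#60;a href="http://www.somewebsite.com"&#62;Somewebsite&#60;/a&#62;, this is a great website';

echo to_plain_text($stored), "\n";
// Check this out Somewebsite, this is a great website
```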

  6. #6
    Junior Member
    Join Date
    Dec 2007
    Posts
    4
    Quote Originally Posted by NogDog
    I would just do:
    PHP Code:
    echo htmlentities(strip_tags($str), ENT_QUOTES); 

    Thanks for this. However, now the output resembles this:

    " & # 60;a href=�http://www.somewebsite.com�& # 62;Somewebsite&# 60;/a &# 62"

    (I've added spaces otherwise the characters here get modified by this forum script)

    What I would like is for it to take out all these types of characters (such as "&#60" as well as HTML references) and keep the output clean. Any suggestions?
    Last edited by drpol; 12-23-2007 at 10:38 AM.

  7. #7
    PHP Witch laserlight's Avatar
    Join Date
    Apr 2003
    Location
    Singapore
    Posts
    13,564
    What I would like is for it to take out all these types of characters (such as "&#60" as well as HTML references) and keep the output clean. Any suggestions?
    What characters do you want to allow? Remove everything else.

    Actually, why do you want to do this? Removing characters can be harmful to the data.
    Use Bazaar for your version control system
    Read the PHP Spellbook
    Learn How To Ask Questions The Smart Way

  8. #8
    Junior Member
    Join Date
    Dec 2007
    Posts
    4
    Quote Originally Posted by laserlight
    What characters do you want to allow? Remove everything else.
    I would like to remove non-standard characters such as "& # 60;a" and "�"

    Actually, why do you want to do this? Removing characters can be harmful to the data.
    I am removing these characters only to generate valid XML output (I do not manipulate the data in the database in which the data is stored).

  9. #9
    PHP Witch laserlight's Avatar
    Join Date
    Apr 2003
    Location
    Singapore
    Posts
    13,564
    I would like to remove non-standard characters such as "& # 60;a" and "?"
    You did not answer my question.
    I asked you what you wanted to keep, not what you wanted to remove.

    I am removing these characters only to generate valid XML output (I do not manipulate the data in the database in which the data is stored).
    It sounds like you do not actually want to remove these characters. Functions like htmlspecialchars() and htmlentities() should be what you want since they substitute the special characters with their escape sequences.
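    A small sketch of how the two functions differ, assuming PHP 5.4 or later (for the ENT_XML1 flag) and UTF-8 input. The sample string is made up. htmlspecialchars() escapes only the characters that are special in markup, while htmlentities() also produces named HTML entities such as &eacute;, which an XML parser does not know unless a DTD declares them:

```php
<?php
// Made-up sample text; \u{e9} is an e with an acute accent (needs PHP 7 or later).
$plain = strip_tags("Fish & chips at the <b>caf\u{e9}</b>");

// Escapes only &, <, > and quotes, so the accented letter stays as it is.
$forXml = htmlspecialchars($plain, ENT_QUOTES | ENT_XML1, 'UTF-8');

// Also converts the accented letter into a named HTML entity.
$forHtml = htmlentities($plain, ENT_QUOTES | ENT_HTML401, 'UTF-8');

echo $forXml, "\n";   // Fish &amp; chips at the café
echo $forHtml, "\n";  // Fish &amp; chips at the caf&eacute;
```

    For XML output the first form is usually the safer choice, provided the XML declaration names the same encoding as the text.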

  10. #10
    High Energy Magic Dept. NogDog's Avatar
    Join Date
    Aug 2006
    Location
    Ankh-Morpork
    Posts
    13,949
    Perhaps your problems could be addressed via the use of CDATA tags in your XML?
    PHP Code:
    $str = '<![CDATA[' . strip_tags($str) . ']]>';
    Also, make sure that the encoding attribute of your <?xml?> declaration matches the character encoding of the source text. For example, if the text comes from an input form on a web page that uses UTF-8, the resulting XML document should begin with:
    Code:
    <?xml version="1.0" encoding="UTF-8"?>
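    A minimal sketch that puts the CDATA wrapper and the declaration together. The helper name cdata_section() and the <item> element are hypothetical. Splitting any "]]>" found in the text is an addition that the thread does not mention; without it such a sequence would end the CDATA section early:

```php
<?php
// Hypothetical helper: wrap text in a CDATA section.
function cdata_section(string $text): string
{
    // A literal "]]>" cannot appear inside CDATA, so split it across two sections.
    $safe = str_replace(']]>', ']]]]><![CDATA[>', $text);

    return '<![CDATA[' . $safe . ']]>';
}

$str = 'Check this out <a href="http://www.somewebsite.com">Somewebsite</a>, this is a great website';

// The declared encoding must match the real encoding of $str (UTF-8 assumed here).
$xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
     . '<item>' . cdata_section(strip_tags($str)) . '</item>';

echo $xml, "\n";
```

    CDATA only prevents the parser from reading the text as markup. It does not fix bytes that are invalid in the declared encoding, so the encoding check above still applies.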

  11. #11
    Junior Member
    Join Date
    Apr 2008
    Posts
    1
    <?php echo htmlspecialchars($string); ?> produces valid XML output.

  12. #12
    PHP Witch laserlight's Avatar
    Join Date
    Apr 2003
    Location
    Singapore
    Posts
    13,564
    Sorry to say, B1sh0p, but you were beaten to that suggestion by yours truly more than three months ago. Kindly do not resurrect old threads without good reason.
