Trying to access difficult website with CURL or simple_html_dom

  1. #1
    Junior Member | Join Date: Aug 2014 | Posts: 4

    I have twice posted a job on Freelancer for someone to write a script that performs a simple query on a website, but despite several attempts nobody has succeeded. I'm not sure why it is so difficult. I just want to pass some search values from my PHP code to the script and get a response.

    The website is https://familyhistory dot bdm dot nsw dot gov dot au/lifelink/familyhistory/search?0 (replace dots)

    I need to be able to search births, deaths or marriages (selected from a dropdown). I will supply a first name and surname (e.g. John Smith), a date range and the registration number (Regn no.). I simply want a response of 0 (not found) or 1 (found).

    (To test it, search deaths for John Smith in any year from 1800 to 1983. You have to set the date dropdown to Yes and enter a date range, e.g. 01 01 1920 to 31 12 1920.)

    Can anyone explain why the site is so difficult to access? I believe it uses $_POST, session variables and cookies. I tried disabling cookies in my browser and the search still works, so cookies can presumably be ignored.
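    For anyone trying the plain-HTTP route described above, here is a minimal sketch of the request side. The thread itself is about PHP with cURL; Python's standard library is used here only so the example is self-contained. It encodes the search values as a form POST body and keeps a cookie jar so that any session cookie set by an initial GET would be sent back with the POST. Every form field name below (register, firstName, surname, dateFrom, dateTo, regnNo) is a hypothetical placeholder, not the site's real parameter name; the real names would have to be read from the page or the browser's network inspector. As the replies below note, the site depends on JavaScript, so a plain POST like this may not be enough on its own.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def build_search_payload(register, first_name, surname, date_from, date_to, regn_no=""):
    """Encode the search values as an application/x-www-form-urlencoded body.

    All field names here are HYPOTHETICAL placeholders; the real form's
    parameter names must be taken from the site itself.
    """
    fields = {
        "register": register,      # hypothetical: "births", "deaths" or "marriages"
        "firstName": first_name,   # hypothetical
        "surname": surname,        # hypothetical
        "dateFrom": date_from,     # hypothetical, e.g. "01/01/1920"
        "dateTo": date_to,         # hypothetical, e.g. "31/12/1920"
        "regnNo": regn_no,         # hypothetical registration-number field
    }
    return urllib.parse.urlencode(fields).encode("ascii")


def make_session_opener():
    """Return an opener whose cookie jar keeps cookies between requests,
    so a session cookie set by a first GET is sent with a later POST."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_post_request(url, payload):
    """Build (but do not send) a form POST request."""
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def found_flag(result_count):
    """Map a number of matching records to the 0 (not found) / 1 (found) answer."""
    return 1 if result_count > 0 else 0
```

    To send it, one would call opener.open(url) first to pick up any session cookie, then opener.open(build_post_request(url, payload)). Counting the matches in the returned page depends on the site's markup, which is not known here, so that step is left to found_flag's caller.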

  2. #2
    dalecosp (Settled 4 red convertible) | Join Date: Jul 2002 | Location: Accelerating Windows at 9.81 m/s.... | Posts: 7,715
    Quote Originally Posted by pm1306
    ... I have tried disabling cookies in my browser and the search still works so cookies can presumably be ignored.
    As an experiment, try disabling JavaScript ;-)

    I wouldn't say it's impossible, but having done this sort of thing before @work, I can tell you it isn't necessarily a walk in the park. It may also be against the site's terms of service (TOS); I'm not a lawyer and haven't even looked to see whether they *have* a TOS, but that's a consideration with some sites.

    Does the organization have an API available?

  3. #3
    Junior Member | Join Date: Aug 2014 | Posts: 4
    Disabling JavaScript prevents the website from working: the initial dropdown no longer works.

  4. #4
    laserlight (PHP Witch) | Join Date: Apr 2003 | Location: Singapore | Posts: 13,564
    The right approach is to contact the site owners (in this case it seems to be your government, specifically a department of the civil service) and request access to an API that will let you perform the search properly and efficiently. If no such API exists, perhaps one can be agreed upon and implemented.

    Quote Originally Posted by pm1306
    Disabling javascript prevents the website from working. The initial dropdown does not work.
    That partly answers the question of "why the site is so difficult to access" via a script: it is not simply a matter of mimicking the form and submitting it, since the JavaScript on the site must be taken into account. There does not appear to be cross-site request forgery (CSRF) protection, but if there were (e.g., a token "hidden" within the JavaScript), then you would also need to fetch the page first and parse out the CSRF token (not necessarily difficult, but definitely inefficient).
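    The "fetch the page and parse for the CSRF token" step can be sketched with Python's standard-library HTML parser. This assumes the token is delivered in an <input type="hidden"> element in the served HTML; a token generated by JavaScript at runtime would not be visible to this kind of parser at all. The field name csrf_token used in the example below is hypothetical.

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collect name -> value pairs from <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.hidden = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; attribute values may be None.
        if tag != "input":
            return
        attributes = dict(attrs)
        input_type = (attributes.get("type") or "").lower()
        name = attributes.get("name")
        if input_type == "hidden" and name:
            self.hidden[name] = attributes.get("value") or ""


def hidden_fields(html):
    """Return all hidden form fields (e.g. a CSRF token) found in an HTML page."""
    parser = HiddenInputCollector()
    parser.feed(html)
    parser.close()
    return parser.hidden
```

    For example, a page containing <input type="hidden" name="csrf_token" value="abc123"/> yields {"csrf_token": "abc123"}. Those values would then be merged into the form body and sent back using the same cookie jar as the GET that fetched the page, since a token is usually tied to the session.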

  5. #5
    Junior Member | Join Date: Aug 2014 | Posts: 4
    It took about 10 years of requests from thousands of users to get them to implement the current "enhanced" version of their website (launched 2 months ago), which is full of bugs and thoroughly user-unfriendly. It's a typical government system designed by committee: for example, to search a single year you now have to click a dropdown and then type the year twice, whereas in the previous system you just entered the year once.

    The chances of getting them to implement an API are less than zero.

  6. #6
    Senior Member
    Join Date
    Apr 2003
    Location
    Silver Lake
    Posts
    4,886
    In that case, you'll need to examine the AJAX calls (if any) and the JavaScript that runs on the page to figure out how the site works. You should also know that intentionally violating a website's terms of service can be considered fraud and therefore criminal behavior, so read the terms of service and make sure you are not in violation.

  7. #7
    Junior Member | Join Date: Aug 2014 | Posts: 4
    There are no terms of service that would prevent a query from being submitted by a PHP script rather than typed in by a user in a browser. The data is in the public domain and is not subject to copyright.

    If I had the ability to "explore all the AJAX calls and JavaScript", I would not have posted the job several times on Freelancer or here. :-)
