Hello dear PHP experts,

Today I do not (!) have a PHP question, but a Perl one.

The goal: I want to run a search that finds all URLs containing the following term: /participants-database/
Unfortunately, this does not work:


#!C:\Perl\bin\perl 

use strict; # You always want to include both strict and warnings 
use warnings; 


use LWP::Simple; 
use LWP::UserAgent; 
use HTTP::Request; 
use HTTP::Response; 
use HTML::LinkExtor; 

# There was no reason for this to be in a BEGIN block (and there 
# are a few good reasons for it not to be) 
open my $file1,"+>>", ("links.txt"); 
select($file1);   

#The Url I want it to start at; 
# Note that I've made this an array, @urls, rather than a scalar, $URL 
#my @urls = (' $url =~ s$||;'); 
my $urls =~ ('s|/participants-database$||'); 
my %visited;  # The % sigil indicates it's a hash 
my $browser = LWP::UserAgent->new(); 
$browser->timeout(5); 

while (@urls) { 
  my $url = shift @urls; 
  # Skip this URL and go on to the next one if we've 
  # seen it before 
  next if $visited{$url}; 

  my $request = HTTP::Request->new(GET => $url); 
  my $response = $browser->request($request); 

  # No real need to invoke printf if we're not doing 
  # any formatting 
  if ($response->is_error()) {print $response->status_line, "\n";} 
  my $contents = $response->content(); 

  # Now that we've got the url's content, mark it as 
  # visited 
  $visited{$url} = 1; 

  my ($page_parser) = HTML::LinkExtor->new(undef, $url); 
  $page_parser->parse($contents)->eof; 
  my @links = $page_parser->links; 

  foreach my $link (@links) { 
	print "$$link[2]\n"; 
	push @urls, $$link[2]; 
  } 
  sleep 60; 
}



Any ideas?

my $url =~s|/bar$||;

Well, I tried to leave out the "my";
that was a mistake.

The "my" causes a new $url to be created.

What we want is to modify the old $url.

But unfortunately this does not work.
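The point about "my" can be shown with a small sketch. The URL here is made up for illustration. Writing `my $url =~ s|/bar$||;` declares a brand-new, undefined `$url` and runs the substitution on that, so the declaration and the substitution need to be separate steps:

```perl
use strict;
use warnings;

# Hypothetical URL, used only for illustration.
my $url = 'http://example.com/foo/bar';

# Copy first, then strip the trailing /bar from the copy.
(my $trimmed = $url) =~ s|/bar$||;

# Modify the existing variable in place: no "my" in front.
$url =~ s|/bar$||;

print "$url\n";       # http://example.com/foo
print "$trimmed\n";   # http://example.com/foo
```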

    With the caveat that I haven't had to do anything with Perl in well over a decade...

    You say you want to match, but your code is doing replacement. You say you want to "match" strings in an array containing /participants-database/, but you're using a regular expression that only matches strings ending with /participants-database (count the '/' characters).
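To make the difference concrete, here is a minimal sketch. The sample URLs and the helper names `contains_term` and `ends_with_term` are made up for illustration. The first pattern tests whether the term appears anywhere with both slashes, and the second only matches when the string ends with it:

```perl
use strict;
use warnings;

# Hypothetical helpers: each returns 1 on a match, 0 otherwise.
sub contains_term  { return $_[0] =~ m{/participants-database/} ? 1 : 0 }
sub ends_with_term { return $_[0] =~ m{/participants-database$} ? 1 : 0 }

# Hypothetical sample URLs.
my @samples = (
    'http://example.com/plugins/participants-database/readme.txt',
    'http://example.com/participants-database',
    'http://example.com/about',
);

for my $u (@samples) {
    printf "%s contains=%d ends=%d\n", $u, contains_term($u), ends_with_term($u);
}
```

Neither helper modifies the string; `m{...}` only tests it, whereas `s|...||` would rewrite it.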

    Since the code doesn't work, and I don't know what the random business with "s|/bar||" is supposed to be about, I'll ignore both and write a line that will "do a search to find out all urls that contains the following term: /participants-database/".

    my $participants_databases = grep(/\/participants-database\//, @urls);
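One thing to watch with the line above: assigning `grep` to a scalar puts it in scalar context, where it returns the number of matches rather than the matching elements. A short sketch of both forms, assuming a made-up `@urls` list:

```perl
use strict;
use warnings;

# Hypothetical list of URLs, for illustration only.
my @urls = (
    'http://example.com/plugins/participants-database/',
    'http://example.com/participants-database',
    'http://example.org/wp/participants-database/readme.txt',
);

# Scalar context: grep returns how many elements matched.
my $count = grep { m{/participants-database/} } @urls;

# List context: grep returns the matching elements themselves.
my @matches = grep { m{/participants-database/} } @urls;

print "count: $count\n";    # count: 2
print "$_\n" for @matches;
```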

      Hello dear Weedpacket,

      Many thanks for the quick reply. Great to hear from you. I am very glad that you have a Perl background.

      This should be the formal replacement, the generalized expression:

      my $url =~s|/bar$||;
      

      In fact, you're right: the term "/bar/" should be replaced with /participants-database/

      The goal: I want to find all resources and places on the internet that contain this term.

      So I will use this line:

      my $participants_databases = grep(/\/participants-database\//, @urls);


      I will try it out later this weekend.

      Many, many thanks so far.

      Great forum here, I really love it.

        Hello and good day,

        I got the following errors:

        martin@linux-jnmx:~/perl> perl wc1.pl
        Global symbol "@urls" requires explicit package name at wc1.pl line 21.
        Global symbol "@urls" requires explicit package name at wc1.pl line 26.
        Global symbol "@urls" requires explicit package name at wc1.pl line 27.
        Global symbol "@urls" requires explicit package name at wc1.pl line 50.
        Execution of wc1.pl aborted due to compilation errors.

        with the following code:

        #!C:\Perl\bin\perl
        
        use strict; # You always want to include both strict and warnings
        use warnings;
        
        
        use LWP::Simple;
        use LWP::UserAgent;
        use HTTP::Request;
        use HTTP::Response;
        use HTML::LinkExtor;
        
        # There was no reason for this to be in a BEGIN block (and there
        # are a few good reasons for it not to be)
        open my $file1,"+>>", ("links.txt");
        select($file1);  
        
        #The Url I want it to start at;
        # Note that I've made this an array, @urls, rather than a scalar, $URL
        #my @urls = (' $url =~ s$||;');
        my $participants_databases = grep(/\/participants-database\//, @urls);
        my %visited;  # The % sigil indicates it's a hash
        my $browser = LWP::UserAgent->new();
        $browser->timeout(5);
        
        while (@urls) {
          my $url = shift @urls;
          # Skip this URL and go on to the next one if we've
          # seen it before
          next if $visited{$url};
        
          my $request = HTTP::Request->new(GET => $url);
          my $response = $browser->request($request);
        
          # No real need to invoke printf if we're not doing
          # any formatting
          if ($response->is_error()) {print $response->status_line, "\n";}
          my $contents = $response->content();
        
          # Now that we've got the url's content, mark it as
          # visited
          $visited{$url} = 1;
        
          my ($page_parser) = HTML::LinkExtor->new(undef, $url);
          $page_parser->parse($contents)->eof;
          my @links = $page_parser->links;
        
          foreach my $link (@links) {
        	print "$$link[2]\n";
        	push @urls, $$link[2];
          }
          sleep 60;
        }
        

        I guess I have to think the code over!?

          while (@urls) {

          You haven't defined @urls anywhere.

            Hello dear Weedpacket,

            Many, many thanks. I will have a closer look at this later this weekend.

            Many thanks for the continued help. I will try to fix this so that I get the script up and running.

            Greetings

              Hello all,

              I've messed up the code a bit:

              I ran the grep on @urls, which was not yet defined at all.

              I need to define something like the following:

              my @in_urls = ('bob', 'joe' );

              By the way, I'm not quite sure about the while loop. Do we need the while loop?

              foreach my $url ( @urls ) {

                  # do stuff with the URL

              }
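On the while-versus-foreach question, here is a rough sketch under stated assumptions. The original loop pushes newly found links onto `@urls` while it runs, and the Perl documentation warns that `foreach` can misbehave when elements are added to or removed from the array inside the loop body, so a `while` loop with `shift` is the safer shape for a growing queue. The `%links_on` hash is a made-up stand-in for the real HTTP fetch and `HTML::LinkExtor` step, and the seed URL is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical link graph standing in for real fetches:
# page URL => links found on that page.
my %links_on = (
    'http://example.com/' => [
        'http://example.com/plugins/participants-database/',
        'http://example.com/about',
    ],
    'http://example.com/about' => [ 'http://example.com/' ],
    'http://example.com/plugins/participants-database/' => [
        'http://example.com/plugins/participants-database/readme.txt',
    ],
);

my @urls = ('http://example.com/');   # seed list, declared before the loop
my %visited;
my @hits;

while (@urls) {
    my $url = shift @urls;
    next if $visited{$url}++;         # skip pages we have already seen

    # Record URLs that contain the search term.
    push @hits, $url if $url =~ m{/participants-database/};

    # In the real script these links would come from HTML::LinkExtor.
    push @urls, @{ $links_on{$url} || [] };
}

print "$_\n" for @hits;
```

Declaring `@urls` with `my` before the loop is also what removes the "Global symbol "@urls" requires explicit package name" errors under `use strict`.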

              I need to get it to work.
