Hi guys.

I have some random questions related to the topic of 'files' and 'directories'.

I have just finished reading a chapter on the topic of 'files' and 'directories', from a php book I am in the process of reading.

There were a few things during the reading of the chapter that were slightly unclear to me and I have a number of questions (below) that I need some help understanding:

1) In php, what does 'end-of-line characters' mean? Here is an extract from the book I am reading (and I don't understand their meaning behind 'end-of-line characters'):

As with other flags in PHP you can combine any of these flags with the bitwise OR operator (see Chapter 3
for details). For example, the following code looks for a file in the include path and, when found, reads
the file, ignoring any empty lines in the file:
$lines = file( "myfile.txt", FILE_USE_INCLUDE_PATH | FILE_SKIP_EMPTY_LINES );
As with fopen(), you can also use file() to fetch files on a remote host:
$lines = file( "http://www.example.com/index.html" );
foreach ( $lines as $line ) echo $line . "<br />";
A related function is file_get_contents(). This does a similar job to file(), but it returns the
file contents as a single string, rather than an array of lines. The end-of-line characters are included in
the string:
$fileContents = file_get_contents( "myfile.txt" );

... I don't get it! Surely the end-of-line characters are the last few characters in the string?

2) In reference to folders, what is the difference between:

./

and

/

............ because they both seem to do the same thing from what I can see. Is there a difference?

3) Below is an extract from my book, I find very confusing:

You can also use the fileperms() function to return an integer representing the permissions that are set
on a file or directory. For example, to print the octal value of the permissions on a file you might use:
chmod( "myfile.txt", 0644 );
echo substr( sprintf( "%o", fileperms( "myfile.txt" ) ), -4 ); // Displays "0644"
(The call to substr() is used to return just the last four digits, because the other octal digits in the
returned value aren't relevant.)

The thing I do not understand is why (above) they say that substr() is used to return the last four digits, because the other octal digits in the returned value aren’t relevant.

Octal numbers are only 3 digits long aren't they? So how does substr (above) manage to extract 4 digits (which should be impossible, since the conversion specification converted the argument to an octal number)?

4) Below is another extract from the book:

File and directory modes only work on UNIX systems such as Linux and Mac OS; they have no effect
when used on Windows machines.

... What I want to know is: can I type up permissions on my windows operating system and upload them to a linux server, and will they still work?

5) Below is a snippet of code from the book I am reading:

$filename = preg_replace( "/[^A-Za-z0-9_\- ]/", "", $filename );

In the book it says that the regular expression (above) strips all the characters from the filename except letters,
digits, underscores, hyphens, and spaces.

What I need to know is: what part of:

/[^A-Za-z0-9_\- ]/

..... communicates "except for"?

and why does the dash need to be escaped?

Paul.

    Paul help!;11051437 wrote:

    Hi guys.
    ... I don't get it! Surely the end-of-line characters are the last few characters in the string?

    EOL characters are the ones that get entered when you hit 'enter' or 'return' to start a new line. A multi-line file has them after every line, not only at the very end of the string. They are what separates this line...

    ...from this line. They are the New Line (\n) and Carriage Return (\r) characters, ASCII codes 10 and 13. In PHP, you can get them into a string in a number of ways:

    // specify a scalar string value using the backslash to escape
    $newline = "\n";
    $carriage_return = "\r";
    
    // using the chr function
    $newline = chr(10);
    $carriage_return = chr(13);
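To tie this back to the book extract, here is a minimal sketch of where the EOL characters end up. It assumes a scratch file made with tempnam() standing in for the book's myfile.txt, and the two lines of text in it are made up for illustration.

```php
<?php
// Hypothetical scratch file standing in for the book's myfile.txt
$path = tempnam(sys_get_temp_dir(), 'eol');
file_put_contents($path, "first line\nsecond line\n");

$contents = file_get_contents($path);            // one string, newlines kept inside it
$lines    = file($path);                         // array; each element still ends with "\n"
$trimmed  = file($path, FILE_IGNORE_NEW_LINES);  // array with the trailing "\n" removed

var_dump(substr_count($contents, "\n")); // int(2): one newline per line, not just one at the end
var_dump($lines[0] === "first line\n");  // bool(true)
var_dump($trimmed[0] === "first line");  // bool(true)

unlink($path); // clean up the scratch file
```

So the EOL characters sit between the lines inside the single string, which is all the book means by "included in the string".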
    
    Paul help!;11051437 wrote:

    2) In reference to folders, what is the difference between:
    ./
    and
    /

    Yes they are totally different. Note that these are linux file path conventions. Windows file paths are quite different. The period followed by a slash means the current working directory. Whenever you run a script or open a terminal window on a linux machine, there's this idea that you have a current working directory. You can change that working directory using the cd command.

    The slash all by itself refers to the absolute root of your file system. This means the lowest, most basic level of your file system. The folder that contains all other folders. If you start a path with just the slash (no period first) then you are specifying an absolute path from the root of your file system. This is helpful if you need to specify some file in a totally different location. This approach has its advantages and disadvantages that you'll just have to learn over time.

    cd /home/sneakyimp # changes working directory to my home folder
    ls ./ # lists contents of current working dir, which is /home/sneakyimp
    
    ls / #lists the contents of the file system root
    cd /var/www #changes my working directory to the /var/www folder
    
    ls . # lists the current working directory
    cd . # this doesn't really do anything
    
    ls .. # two dots refers to the parent of the current working directory. I.e., "up"
    cd .. # this goes "up" a directory. E.g., from /var/www to /var
    

    Changing your current working directory is helpful because it means you don't have to type the really long absolute paths to the files you are working with. You can refer to files in the cwd without typing their whole path. You'll note that the real difference is whether the path starts with a slash or a period. A single period (.) refers to the cwd. Two periods (..) refers to the parent of the cwd.
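The same ideas can be checked from PHP rather than the shell. This is only a sketch: it assumes the system temp directory exists and is readable, and it uses chdir(), getcwd() and realpath() to show what "./", "../" and "/" resolve to.

```php
<?php
// Assumption: any existing directory will do; the system temp dir is used here.
chdir(sys_get_temp_dir());
$cwd = getcwd();

$dot    = realpath('./');   // the current working directory
$dotdot = realpath('../');  // the parent of the current working directory
$root   = realpath('/');    // the file system root

var_dump($dot === $cwd);             // bool(true)
var_dump($dotdot === dirname($cwd)); // bool(true)
echo $root, "\n";                    // "/" on Unix-like systems
```

Only a path that starts with a slash ignores the current working directory; "./" and "../" change meaning every time you chdir().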

    Paul help!;11051437 wrote:

    The thing I do not understand is why (above) they say that substr() is used to return the last four digits, because the other octal digits in the returned value aren’t relevant.

    Octal numbers are only 3 digits long aren't they? So how does substr (above) manage to extract 4 digits (which should be impossible, since the conversion specification converted the argument to an octal number)?

    Just like decimal or binary numbers, octal numbers can be arbitrarily long. You can't express 1,000,000,000 with just 3 octal digits. I would encourage you to run the command without the substr bit to see what you get:

    echo sprintf( "%o", fileperms("/home/sneakyimp/my_file.txt") ); // I get 100644
    

    I'm not sure what the starting 10 are for myself. I suspect it has something to do with fileperms or sprintf. Try reading the docs and you may find a clue. I do know what the 0644 means just because I'm familiar with *nix-style file permissions. The 6 means that the owner of the file can read and write it. The next 4 means that the group assigned to the file can read it. The final 4 means that any user on the machine can read the file.
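The reading of 6, 4 and 4 above can be sketched in code. permString() below is a hypothetical helper written for this post, not a built-in PHP function; it assumes the usual Unix convention that read is 4, write is 2 and execute is 1 within each octal digit.

```php
<?php
// Hypothetical helper: turn the low nine permission bits into "rw-r--r--" form.
function permString(int $mode): string
{
    $out = '';
    foreach ([6, 3, 0] as $shift) {          // owner, group, other
        $bits = ($mode >> $shift) & 07;      // isolate one octal digit (three bits)
        $out .= ($bits & 04) ? 'r' : '-';    // 4 = read
        $out .= ($bits & 02) ? 'w' : '-';    // 2 = write
        $out .= ($bits & 01) ? 'x' : '-';    // 1 = execute
    }
    return $out;
}

echo permString(0644), "\n"; // rw-r--r--
echo permString(0755), "\n"; // rwxr-xr-x
```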

    Paul help!;11051437 wrote:

    File and directory modes only work on UNIX systems such as Linux and Mac OS; they have no effect
    when used on Windows machines.

    ... What I want to know is: can I type up permissions on my windows operating system and upload them to a linux server, and will they still work?

    Hmm. I almost never work with PHP on Windows so I'm no authority, but I would say NO: you have to deal with Windows and Linux file permissions separately because they are pretty different. Certain file-related functions like file_exists or file_get_contents and so on should work on both systems, but Windows paths (e.g., C:\windows\httpd) are very different from Linux file paths (/var/www). Writing code that works on both systems is entirely feasible, but takes a bit more work. I'm not exactly sure what you mean when you say "can I type up permissions."
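As a rough idea of what "a bit more work" can look like, here is a sketch that builds a path without hard-coding the separator and records whether chmod-style modes are worth setting. The names 'data' and 'notes.txt' are made up, and PHP_OS_FAMILY assumes PHP 7.2 or later.

```php
<?php
// 'data' and 'notes.txt' are hypothetical names used only for illustration.
$path = __DIR__ . DIRECTORY_SEPARATOR . 'data' . DIRECTORY_SEPARATOR . 'notes.txt';
echo $path, "\n";

// PHP_OS_FAMILY (PHP 7.2+) is a string such as "Linux", "Darwin" or "Windows".
echo PHP_OS_FAMILY, "\n";

// Unix-style modes only carry their full meaning on Unix-like systems,
// so a portable script might skip or adapt its chmod() calls elsewhere.
$chmodIsMeaningful = PHP_OS_FAMILY !== 'Windows';
var_dump($chmodIsMeaningful);
```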

    Paul help!;11051437 wrote:

    What I need to know is: what part of:
    /[^A-Za-z0-9_\- ]/
    ..... communicates "except for"?

    and why does the dash need to be escaped?

    PCRE functions like preg_replace are tremendously powerful and I love them. The price for that power is that you have to learn the very peculiar language and syntax of Regular Expressions. These are a bit of a mind-bender, but if you start learning them, it can be quite rewarding. It's sort of like the Kung Fu which doesn't help at first, but once you master it after years of getting your ass kicked, you become the new sensei.

    The meaning of characters in a regular expression depends on the context where they appear. In your regular expression, the square brackets [] indicate "here is a group of characters" and within that context the ^ char says "exclude the following characters within this square bracket section". The dash must be escaped when you put it within a square bracket section because the dash has a specific meaning in that particular context. You see the other dashes in A-Z and a-z and 0-9, so you might realize that a dash in square brackets indicates that a range is being expressed. The dash in A-Z means "all chars between A and Z." If you want to express an actual dash char and not a range, you have to "escape" the dash with a backslash. This is all based on some elaborate sequence of syntax rules and stuff and just reading about it will probably bore you to death. Try experimenting.

    In effect, your preg_replace command says replace every character except for those described within your angle brackets with the empty string.

      And, before the question gets asked, just as "/" is the root of the system, and "./" is the current directory ... "../" is the current directory's parent directory. 😉

        sneakyimp wrote:

        I'm not sure what the starting 10 are for myself

        "Regular file". If it had been a directory, say, you'd have got '40' and '12' from a symlink (what Windows calls a "soft link").
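A sketch of how those leading digits can be separated from the permission digits. It assumes the conventional Unix masks (0170000 for the file type, 07777 for the permission bits) and works on hard-coded mode values rather than a real file, so it runs the same way anywhere.

```php
<?php
// The value sneakyimp reported for a regular file with mode 0644.
$mode = 0100644;

$type = $mode & 0170000; // file type bits (the leading "10")
$perm = $mode & 07777;   // permission bits (the trailing "0644")

printf("%o %04o\n", $type, $perm);        // 100000 0644
var_dump($type === 0100000);              // bool(true): regular file
var_dump((040755 & 0170000) === 0040000); // bool(true): a mode starting with 40 is a directory
```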

          sneakyimp;11051457 wrote:

          In effect, your preg_replace command says replace every character except for those described within your angle brackets with the empty string.

          I mean to say "within your square brackets".

          To clarify:

          /[^A-Za-z0-9_\- ]/
          

          The slashes at beginning and end are there for some strange historical reason, like a delimiter of some kind. You can also use other chars like @ and # but the one at the beginning has to match the one at the end. You can also put modifiers after the one at the end which will affect what gets matched by the pattern.
          The first square bracket ([) says "the next character must match one in this list"
          The ^ char says "surprise! i actually want it NOT to match any of the following chars in this list"
          A-Z says all chars between A and Z are in my char list
          a-z says all chars between lowercase a and z are in my char list
          0-9 says digits are in my char list
          the underscore adds the underscore char to the list
          the dash preceded by a backslash adds the dash char to the list
          there is a space in there too which adds the space char to the list
          the closing square bracket (]) closes the list of chars

          So preg_replace will take any section of your string that DOES NOT match that list (remember we had the ^ char in there) and replace it with the empty string.
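Here is the pattern in action, as a small experiment. The sample filename is made up; the second call drops the ^ to show how the meaning flips.

```php
<?php
// Made-up filename containing some characters we do not want.
$filename = "my file_v2 (final)!.txt";

// With ^: remove everything that is NOT a letter, digit, underscore, dash or space.
$clean = preg_replace('/[^A-Za-z0-9_\- ]/', '', $filename);
echo $clean, "\n";   // my file_v2 finaltxt

// Without ^: the opposite, remove every character that IS in the list.
$inverse = preg_replace('/[A-Za-z0-9_\- ]/', '', $filename);
echo $inverse, "\n"; // ()!.
```

Note that the dot is stripped too, so the extension separator is lost; the pattern only keeps what is listed inside the square brackets.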

            25 days later

            Thanks guys.

            I just need to ask something else in regards to the answer 'sneaky imp' gave for my question 'number 3'.

            What I need to know is: why must the conversion specification be octal?

            Is it because - if I used a decimal as the conversion specification it would not be capable of being arbitrarily long?
            If the conversion specification was a decimal, would it cut off the starting '0'?

            Paul.

              Paul help! wrote:

              What I need to know is: why must the conversion specification be octal?

              Is it because - if I used a decimal as the conversion specification it would not be capable of being arbitrarily long?
              If the conversion specification was a decimal, would it cut off the starting '0'?

              It need not be octal, but the thing is that because Unix file permissions involve setting bits for user, group, and other, octal is sensible, i.e., at a glance you can determine who has what permission bits set. Hexadecimal would be pointless because only three bits (read, write, execute) are involved per user/group/other.

                To paraphrase what laserlight said, the use of octal makes it easier to specify 3 bits (2³=8, so we use base-8 or "octal"). If you tried to use decimal numbers I don't even know if it would work, but I can tell you it would be tricky to try and convert decimal (i.e., base-10) numbers to specify which bits are on and which are off.

                I.e., it's a lot easier to convert base-8 numbers to base-2 numbers than it is to convert base-10 numbers to base-2 numbers.

                  By 3 bits, do you mean the 3 digits:

                  644

                  in the permissions:

                  0644

                  ?

                    No, being octal, each digit represents three bits.

                      ok.

                      the only thing I do not understand is sneakyimp's post:

                      To paraphrase what laserlight said, the use of octal makes it easier to specify 3 bits (2³=8, so we use base-8 or "octal"). If you tried to use decimal numbers I don't even know if it would work, but I can tell you it would be tricky to try and convert decimal (i.e., base-10) numbers to specify which bits are on and which are off.

                      I.e., it's a lot easier to convert base-8 numbers to base-2 numbers than it is to convert base-10 numbers to base-2 numbers.

                      I need to ask, what is meant by:

                      2³=8

                      And I don't understand why he brings up how it is tricky to try and convert decimal (i.e., base-10) numbers to specify which bits are on and which are off.
                      When does this happen? does the server automatically convert 'base 8' to 'base 2'?

                        Two to the power of three equals eight. If he'd used PHP 5.6 syntax: 2**3==8.

                        Paul help! wrote:

                        When does this happen?

                        In the programmer's brain. It's easier to look at "0666" and recognise it as meaning "I have read and write access, my group has read and write access, and the world has read and write access" than to look at the decimal number 438 and recognise it as meaning the same thing. In the former case there is a direct one-one correlation between digits and permissions, in the latter there is no such correlation.
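A quick way to see that one-to-one correlation, as a sketch: each octal digit of 0666 is printed next to its three bits, and then the same number is printed from its decimal form, where no digit lines up with any group of bits.

```php
<?php
$mode = 0666;

// Each octal digit maps directly onto three bits (read, write, execute).
$rows = [];
foreach (str_split(decoct($mode)) as $digit) {
    $rows[] = sprintf('%s -> %03b', $digit, (int) $digit);
}
echo implode("\n", $rows), "\n"; // three lines, each "6 -> 110"

// The decimal form is the same number, but its digits 4, 3, 8 tell you nothing directly.
printf("%d -> %b\n", 438, 438);  // 438 -> 110110110
```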

                          but what I am asking is, does the server automatically convert 'base 8' to 'base 2'?

                            Paul help! wrote:

                            does the server automatically convert 'base 8' to 'base 2'?

                            0644 is an integer. The leading 0 is a prefix that indicates that this integer is represented in base eight. Therefore, in decimal, this integer would be represented as 420. But in both cases, the number is the same: only the representation of it has changed. Now, current computer hardware technology uses bits to represent numbers. Therefore, whether you represent the number as 0644 or 420 in your code, at some point the machine would see it in its binary representation, since the PHP interpreter would invoke system functions and pass along the number to them.
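To illustrate "only the representation has changed", here is a sketch that writes the same integer four ways. The variable names are arbitrary, and the binary literal assumes PHP 5.4 or later.

```php
<?php
$octal   = 0644;         // leading 0: octal literal
$decimal = 420;          // plain decimal literal
$hex     = 0x1A4;        // leading 0x: hexadecimal literal
$binary  = 0b110100100;  // leading 0b: binary literal (PHP 5.4+)

// All four are the same integer once PHP has parsed them.
var_dump($octal === $decimal && $decimal === $hex && $hex === $binary); // bool(true)

echo $octal, "\n";                         // 420 (echo always prints decimal)
printf("%o %d %x %b\n", 420, 420, 420, 420); // 644 420 1a4 110100100
```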

                              Thanks. Things seem clearer now.

                              So permissions are generally specified using the octal number system?

                                Paul help! wrote:

                                So permissions are generally specified using the octal number system?

                                Traditional Unix file permissions. There are of course other permission systems that might not involve setting bits corresponding to permissions for different sets of users.

                                  thanks.

                                  so when you say:

                                  0644 is an integer. The leading 0 is a prefix that indicates that this integer is represented in base eight.

                                  the only thing I do not understand is why in the following code:

                                  chmod( "myfile.txt", 0644 );
                                  echo substr( sprintf( "%o", fileperms( "myfile.txt" ) ), -4 ); // Displays "0644"

                                  ... does the author think that it's a good idea to change 0644 to an octal number when it already appears to be octal in the first place?

                                    Paul help! wrote:

                                    ... does the author think that it's a good idea to change 0644 to an octal number when it already appears to be octal in the first place?

                                    If you read the PHP manual entry on fileperms, you will see that it returns an int. So, if you try to print it as per normal, you will print it in the usual decimal representation, e.g., 420 instead of 0644 (actually, it might not be 420 as additional permission-related bits may be involved and returned along with the basic permission bits that were used for chmod).

                                    Hence, the idea is to format it into a numeric string in octal representation, then only take the last four characters, discarding any additional data that is not needed here.
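A sketch of those three steps on a real file. It assumes a Unix-like system (on Windows the mode bits behave differently) and uses a scratch file from tempnam() instead of the book's myfile.txt; the clearstatcache() call is a precaution so fileperms() does not report a cached value.

```php
<?php
// Assumes a Unix-like system; the scratch file stands in for myfile.txt.
$path = tempnam(sys_get_temp_dir(), 'perm');
chmod($path, 0644);
clearstatcache();             // make sure fileperms() sees the new mode

$raw   = fileperms($path);    // an int, including file type bits
$octal = sprintf('%o', $raw); // the same int formatted as an octal string
$last4 = substr($octal, -4);  // keep only the permission digits

echo $raw, "\n";    // decimal, e.g. 33188 for a regular file
echo $octal, "\n";  // e.g. 100644
echo $last4, "\n";  // 0644

unlink($path);      // clean up the scratch file
```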

                                      so when you told me that the leading zero in: 0644, in the following line:

                                      chmod( "myfile.txt", 0644 );

                                      .. indicates that this integer is represented in base eight.

                                      Is this leading zero (0) the reason why the following code:

                                      chmod( "myfile.txt", 0644 );
                                      echo substr( sprintf( "%o", fileperms( "myfile.txt" ) ), -4 );

                                      outputs:

                                      "0644"

                                      ... and not a completely different set of numbers?

                                        Um, I don't know what to make of that question. The answer must be "yes" since 0644 != 644. Unless chmod's second argument is pointless, there must be some kind of difference.

                                        Perhaps you should run this PHP script:

                                        <?php
                                        echo 0644;

                                        You might want to read up on number representations, i.e., to understand the place-value system (and hence binary, octal and hexadecimal), especially to dispel your false notion that "Octal numbers are only 3 digits long". You should also read up on bitwise operations, so as to understand what this 0644 business is all about.
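To make the 0644 versus 644 difference concrete, here is a slightly longer version of that script, offered as a sketch. It only prints numbers and does not touch any file.

```php
<?php
$withZero = 0644; // octal literal
$without  = 644;  // decimal literal

echo $withZero, "\n";     // 420
echo $without, "\n";      // 644

// Formatted as octal, decimal 644 is a completely different mode.
printf("%o\n", $without); // 1204

// octdec() converts a string of octal digits into the matching integer.
var_dump($withZero === octdec('644')); // bool(true)
```

So passing 644 without the leading zero to chmod() would ask for mode 1204, not 0644.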
