
PHP Creating a file scanner in PHP

Ghost

Today I will teach you how to create a file scanner with PHP. You will be able to search every file for a specific value, such as a string of text. Perhaps you are looking for every file that contains your email address so you can change it... This would work nicely for that.
You can use PHP's scandir() function to list the files in a directory. It lists folders too, so you can write a script that recursively explores every folder and sub-folder on the system, no matter how deeply they are nested. That is the focus of this tutorial. We will start with a single folder and set up our logic. Then we will move the code into a function so we can search the file system recursively, opening each file and looking for our search text.

Using scandir() to scan a folder
PHP:
<?php
    $find = scandir("path/to/folder");
    $num = count($find);
?>

With just this simple code, we now have an array called $find that holds the names of all the files and folders in that directory.
However, scandir() also returns two special entries, "." and "..". The double dot (..) represents the directory above the one being scanned: if we scan path/to/folder, the double dot inside it means path/to, one directory up. The single dot (.) represents the scanned directory itself, path/to/folder/.
Because of this, we need to filter out the dot and double dot so our script does not analyze them. Once we start searching folders recursively, following either entry would send the scanner back into folders it has already visited, over and over.

PHP:
<?php
$folder = "./";
$find = array_diff(scandir($folder), array(".", ".."));
$num = count($find);
print_r($find);
?>

We can do this with array_diff(), which compares two arrays. The first array is the result of our scandir(), and the second array holds the single and double dots. array_diff() returns every value of the first array that does not appear in the second, so $find now holds all the files and directories without the dots.

For testing purposes, I have set up the following structure. All of my PHP code is contained inside 'start.php':
[Screenshot: test folder structure]


When I use print_r($find), I can see this printed out on the screen:
Array ( [2] => another-random-file.html [3] => fakefolder [4] => randomfile.md [5] => start.php )

As you can see, we have everything we need, but we still need to tell files and folders apart. When we find a folder, our recursive function will search inside it too, so you can run this code from the root of your site (e.g. public_html) and find every file that contains the search text you are looking for. One thing worth noting is that the array keys now start at 2 and end at 5: array_diff() keeps the original keys, so removing the dots (keys 0 and 1) leaves a gap. We use array_values() to renumber the keys 0-3, for 4 total items in this case, which the for loop below relies on.
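Here is a small sketch of that key behaviour that doesn't touch the disk. A hard-coded list stands in for the scandir() output shown above (an assumption for illustration), and we compare the keys before and after array_values(). A for loop starting at $x = 0 would hit a missing key on the filtered array, which is why the reindexing matters.

```php
<?php
// Hard-coded stand-in for the scandir() result above ("." and ".." sort first here).
$listing = array(".", "..", "another-random-file.html", "fakefolder", "randomfile.md", "start.php");

// array_diff() drops the dot entries but keeps the original keys 2, 3, 4 and 5.
$filtered = array_diff($listing, array(".", ".."));

// array_values() renumbers the remaining entries starting from 0.
$reindexed = array_values($filtered);

print_r(array_keys($filtered));  // the keys 2, 3, 4, 5
print_r(array_keys($reindexed)); // the keys 0, 1, 2, 3
var_dump(isset($filtered[0]));   // bool(false): index 0 no longer exists before reindexing
echo $reindexed[0], "\n";        // another-random-file.html
```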

PHP:
<?php
$path = "./"; // starting folder
$find = array_values(array_diff(scandir($path), array(".", "..")));
$num = count($find);
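for($x = 0; $x < $num; $x++){
    // prepend $path: is_file() and is_dir() resolve a bare name against the current working directory
    if(is_file($path . $find[$x])){
        echo $find[$x] . " is a file<br>";
    } else if(is_dir($path . $find[$x])){
        echo $find[$x] . " is a directory<br>";
    }
}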
?>

As you can see, we have now done a few things.
First, I renamed the $folder variable to $path, so that it better represents what it holds. We're still using "./", which PHP resolves against the current working directory; when the script runs through a web server, that is usually the folder the script is located in. I applied array_values() to our $find array, then started a basic loop that goes through each result. For now the script just tells us whether each result is a file or a directory (we are about to remove those echos), and from there we can do more. Let's first set up the logic for files: read each one and look for our search text.

is_file() tells us whether an entry is a regular file, and is_dir() tells us whether it is a directory. You can add one more else branch to catch anything that is neither, which has helped me in the past when a coding error fed the script something unexpected. It's up to you!
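Here is a sketch of the file/directory check with that optional final else branch. classifyEntries() is a hypothetical helper, not part of the tutorial's code. It joins each name to the folder path before testing it, because is_file() and is_dir() resolve a bare name against the current working directory, and it labels anything that is neither (a broken symbolic link, for example) as "other".

```php
<?php
// Hypothetical helper: label every entry of $path as "file", "directory" or "other".
function classifyEntries(string $path): array {
    $entries = @scandir($path);          // @ suppresses the warning for a missing/unreadable folder
    if ($entries === false) {
        return array();
    }
    $result = array();
    foreach (array_diff($entries, array(".", "..")) as $name) {
        $full = rtrim($path, "/") . "/" . $name;   // test the full path, not the bare name
        if (is_file($full)) {
            $result[$name] = "file";
        } else if (is_dir($full)) {
            $result[$name] = "directory";
        } else {
            // neither: e.g. a symbolic link whose target no longer exists
            $result[$name] = "other";
        }
    }
    return $result;
}
```

Calling print_r(classifyEntries(__DIR__)) would list each entry in the script's folder together with its type.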


Search file content lines using fgets() with PHP
[CODE lang="php" highlight="2,6-9"]<?php
$searchText = "test"; // $path, $find and $num come from the previous snippet
$found = array();
for($x = 0; $x < $num; $x++){
    if(is_file($path . $find[$x])){
        $f = @fopen($path . $find[$x], "r"); // use @ to suppress warnings
        if($f){ // file opened successfully
            while(($bufferContent = fgets($f)) !== false){ // read the file line by line instead of all at once
                if(strpos($bufferContent, $searchText) !== false){ // the search text is somewhere in this line
                    // found the search text
                    echo "found in " . htmlspecialchars($bufferContent) . "<br>"; // escape: file contents may contain HTML
                    $found[$path][] = $find[$x]; // add to our array of files containing search text
                }
            }
            if(!feof($f)){ // fgets() stopped before reaching the end of the file
                echo "Error: unexpected failure with fgets()!<br>";
            }
            fclose($f);
        }
    }
}
?>[/CODE]

So we now have quite a few additions. I used the variable $searchText to store our search value of "test". This means that any file containing "test" anywhere (including things like "THIStest", which has no space before it) will be added to our $found array. In this version, a file is added once for every line that matches; we fix that in the next step.
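The strict `!== false` comparison in that code matters. strpos() returns the position of the first match, and a match at the very start of a line is position 0, which a loose comparison treats the same as false. A short sketch:

```php
<?php
// strpos() returns the match position, or false when there is no match.
$line = "test results for today";

var_dump(strpos($line, "test"));                // int(0): match at the very start
var_dump(strpos($line, "test") == false);       // bool(true): the loose check misreads the match as "not found"
var_dump(strpos($line, "test") !== false);      // bool(true): the strict check sees the match
var_dump(strpos("THIStest", "test") !== false); // bool(true): plain substring match, no word boundary needed
var_dump(strpos($line, "nope"));                // bool(false): genuinely not found
```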

We start by using fopen() to open the file, with the @ symbol to suppress any warnings if there's a problem. We then check that it opened correctly, and if so we use fgets() to fetch the file one line at a time. This is generally a better way to search files than loading the entire contents and searching them, because only the current line (plus a small read buffer) is kept in memory. It also lets us skip a lot of work: if a file has "test" on 100 different lines, we don't have to keep reading once we find the first occurrence. Additionally, it would let us save progress on a per-file / per-line basis (we won't do that in this tutorial). For example, we could "pause" the search of a file and continue later, starting at whatever line we paused on.
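The "pause and resume" idea can be sketched with ftell() and fseek(): remember the byte offset where the last read stopped, and seek back to it on the next run. searchChunk() and its $maxLines limit are hypothetical, not part of the tutorial's code, and storing the offset between requests (in a session, database or file) is left out.

```php
<?php
// Hypothetical helper: read at most $maxLines lines of $file, starting at byte $offset.
// Returns whether $needle was found and, if the search stopped early, the byte
// offset to resume from on the next call (null once the search is finished).
function searchChunk(string $file, string $needle, int $offset, int $maxLines): array {
    $f = @fopen($file, "r");
    if ($f === false) {
        return array("found" => false, "resume" => null);
    }
    fseek($f, $offset);                       // jump to where the previous chunk stopped
    for ($i = 0; $i < $maxLines; $i++) {
        $line = fgets($f);
        if ($line === false) {                // end of file (or read error): nothing left to search
            fclose($f);
            return array("found" => false, "resume" => null);
        }
        if (strpos($line, $needle) !== false) {
            fclose($f);
            return array("found" => true, "resume" => null);
        }
    }
    $resume = ftell($f);                      // byte offset of the next unread line
    fclose($f);
    return array("found" => false, "resume" => $resume);
}

// Usage: keep calling until a match is found or the file is exhausted.
// $offset = 0;
// do {
//     $r = searchChunk("randomfile.md", "test", $offset, 100);
//     $offset = $r["resume"];
// } while (!$r["found"] && $offset !== null);
```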

However, the code above only checks whether the text was found; it keeps reading the file after a match. In the next code block we stop reading and skip to the next file as soon as we find one. Check it out!
We also need to save the path of each matching file somewhere, so we can find it later. We do this in the $found array.

[CODE lang="php" highlight="14-18,20-22"]<?php
$path = "./"; // starting folder
$searchText = "test";
$find = array_values(array_diff(scandir($path), array(".", "..")));
$num = count($find);
$found = array();
for($x = 0; $x < $num; $x++){
    if(realpath($path) . "/" . $find[$x] == __FILE__){ continue; } // this entry is the running script (start.php), no need to search it
    if(is_file($path . $find[$x])){
        $f = @fopen($path . $find[$x], "r"); // use @ to suppress warnings
        if($f){ // file opened successfully
            $thisfound = false; // set to true when we stop early on a match, so the feof() check below doesn't report a false error
            while(($bufferContent = fgets($f)) !== false){ // read the file line by line
                if(strpos($bufferContent, $searchText) !== false){ // the search text is somewhere in this line
                    $found[$path][] = $find[$x]; // add to our array of files containing search text
                    $thisfound = true;
                    break; // stop reading this file, we already found a match
                }
            }
            if(!feof($f) && $thisfound == false){ // no match and not at the end of the file: fgets() failed, part of the file went unsearched
                echo "Error: unexpected failure with fgets() on $find[$x]!<br>";
            }
            fclose($f);
        }
    } else if(is_dir($path . $find[$x])){
        echo $find[$x] . " is a directory<br>";
    }
}
?>[/CODE]
So, I have now added quite a few more pieces of logic. At the beginning of our loop, we use realpath() to get the full system path of the directory we are searching. We then append the current result ($find[$x]) and check whether it matches __FILE__, the full path of the running script. If it does, our script has found itself. I have start.php in the directory I am searching, and that line with continue; basically says "Hey start.php, there is no need to search yourself! Move on to the next file!"

I then finished the logic for moving on from a file once the search text is found, so we don't keep reading it after a match. For this script, we don't care how many times the search text appears in a file, only whether it does or does not. So I set $thisfound to true when a line contains the search text, and reset it to false for each new file. The reason is the feof() check below our while loop: feof() tells us whether we have reached the end of the file. Because we skip to the next file as soon as the search text is found, files containing it never reach their "end of file", and that check would report an error for every one of them. The $thisfound variable lets us tell our script "Hey, we purposely skipped to the next file without reaching the end of this one, so don't display an error!"
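If the search is moved into its own function, returning as soon as a match is found removes the need for the $thisfound flag: the function leaves before the feof() check, so that check only runs when the loop ended without a match. A sketch under that assumption; fileContains() is a hypothetical name, not part of the tutorial's code.

```php
<?php
// Hypothetical helper: true if $needle appears in $file, false if it does not,
// null if the file could not be opened or was not read to the end.
function fileContains(string $file, string $needle): ?bool {
    $f = @fopen($file, "r");
    if ($f === false) {
        return null;
    }
    while (($line = fgets($f)) !== false) {
        if (strpos($line, $needle) !== false) {
            fclose($f);
            return true;          // the first match is enough, stop reading here
        }
    }
    $complete = feof($f);         // false here means fgets() failed before the end of the file
    fclose($f);
    return $complete ? false : null;
}
```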

I added print_r($found) at the end of my file to double-check the results. Sure enough, it correctly showed each file that had the word "test" in it, listed only once thanks to the break, even where the word had no spaces before or after it.

Now we need to make this all work recursively... so we will use a function to wrap up our file searching logic and directory searching logic...

Recursive PHP Function for File Systems
PHP:
<?php
$path = __DIR__; // starting folder
$searchText = "test";

$found = filescanner($path, $searchText);
print_r($found);

function filescanner($path, $search){
    $find = array_values(array_diff(scandir($path), array(".", "..")));
    $num = count($find);
    $found = array();
    for($x = 0; $x < $num; $x++){
        if(realpath($path) . "/" . $find[$x] == __FILE__){ continue; } // the file currently being searched is this current file... no need to search start.php
        if(is_file($path . "/" . $find[$x])){
            $f = @fopen($path . "/" . $find[$x], "r"); // use @ to suppress warnings
            if($f){ // found file
                $thisfound = false; // set to true when we stop early on a match, so the feof() check below doesn't report a false error
                while (($bufferContent = fgets($f)) !== false) { // read the file line by line
                  if(strpos($bufferContent, $search) !== false){ // the search text is somewhere in this line
                    $found[$path][] = $find[$x]; // add to our array of files containing search text
                    $thisfound = true;
                    break; // stop looping each line of file, we already found it
                  }
                }
                if (!feof($f) && $thisfound == false) { // if we didn't find a match, and didn't reach end of file, it means there is unsearched text inside file
                  echo "Error: unexpected failure with fgets() on $find[$x]!<br>";
                }
                fclose($f);
            }
        } else if(is_dir($path . "/" . $find[$x])){
            // we will search directories recursively in next step
        }
    }
    return $found;
}
?>

So, I have now put everything inside the function filescanner(), with parameters $path and $search.
I also changed our starting $path to __DIR__, the directory that start.php is located in. Unlike "./", which is resolved against the current working directory, __DIR__ always points to the script's own folder and gives us the full path, which helps with the recursion and looping later.

Everything else is the same, except it's inside our function. At the top of the page we call $found = filescanner($path, $searchText) to fetch the results. Once the directory branch is filled in (shown next), print_r($found) shows every path & file that contains our search text of 'test'. For example, here is how mine looks:
Array ( [/home/wbr/public_html/projects/filescanner] => Array ( [0] => another-random-file.html [1] => randomfile.md ) [/home/wbr/public_html/projects/filescanner/fakefolder/testinganother] => Array ( [0] => blah.md ) [/home/wbr/public_html/projects/filescanner/fakefolder/testinganother/testtttt] => Array ( [0] => hello.txt ) [/home/wbr/public_html/projects/filescanner/fakefolder] => Array ( [0] => txt.txt ) )

Let's focus on the part that handles directories we find...
PHP:
if(is_dir($path . "/" . $find[$x])){
    $new = filescanner($path . "/" . $find[$x], $search);
    if(count($new) > 0){
        $found = array_merge($found, $new);
    }
}
As you can see here, we have some very basic logic to handle coming across another folder.
We set $new to the result of calling filescanner() on the newly discovered folder, whose path we build from our current $path plus $find[$x]. Because filescanner() calls itself for every folder it finds, it handles any subfolders, then any subfolders of those subfolders, and so on. If any results were found inside the folder, we merge them ($new) into the $found array we already have. The keys of $found are folder paths, and each one holds an array of the files in that folder that match our search text. Since every folder path is unique, array_merge() never overwrites an existing entry, so our main results stay organized and we can list all matching files with the correct paths attached.

Now, at the top of my file, I am going to quickly echo out our results, using the path and files arrays...
PHP:
$path = __DIR__; // starting folder
$searchText = "test";

$found = filescanner($path, $searchText);
foreach($found as $path=>$files){
    echo "<h4>" . htmlspecialchars($path) . "</h4>"; // escape: folder and file names may contain HTML characters
    $count = count($files);
    for($x=0;$x<$count;$x++){
        echo htmlspecialchars($files[$x]) . "<br>";
    }
    echo "<hr>";
}
print_r($found); // debugging: the raw structure of the results

We basically just loop through and say that we want to show the path info in <h4> tags & then list the files below. We separate each searched directory by the <hr> tags.
Here is how it looks...
[Screenshot: search results listed under each folder path]


Easy to see, right? We can see that our recursive function was able to search through each directory, and sub-folder within, etc etc. You can see that by looking at the paths. I also double checked and sure enough, each file does in fact contain our search text of 'test' while files missing the text were not added to our results. Exactly what we want!


Convert File Path to URL
Now let's spruce it up a bit and make these files actual links!
PHP:
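foreach($found as $path=>$files){
    echo "<h4>" . htmlspecialchars($path) . "</h4>";
    $count = count($files);
    for($x=0;$x<$count;$x++){
        // strip the document root to turn the file system path into a URL path
        $linkpath = str_replace($_SERVER['DOCUMENT_ROOT'], '', $path) . "/" . $files[$x];
        $safe = htmlspecialchars($linkpath, ENT_QUOTES); // ENT_QUOTES because the href uses single quotes
        echo "<a href='$safe'>" . htmlspecialchars($files[$x]) . "</a> ($safe)<br>";
    }
    echo "<hr>";
}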

I use str_replace() to remove the document root (e.g. /path/to/user/folder/public_html) from our full path, which converts it into a link visitors can click.
I then add a slash between the folder path and the file name ($files[$x]). We end up with correct URLs to our found files for each path, as long as the scanned folders are inside the document root.
Now our page looks like this:
[Screenshot: search results with clickable file links and their URLs]


As you can see, we can now click our files and see the destination link too.

From there, you could build a file viewer with a JavaScript/HTML/CSS library that highlights syntax & helps spot errors in different types of code.
However, that deserves a tutorial of its own. This tutorial just shows how I built a script capable of searching an entire file system for any text or code you need to find or modify.
 
Wow this is awesome! Let's be honest we've all encountered the similar issue that this tool could have solved for us! :D

I'm going to follow along to this tutorial! Thanks for sharing :)
 
Yeah absolutely, I run into this issue a lot when I am looking for a specific file or image link I need to replace in a few files. I can also use it to find code I know is in an old project (that I need for something else) but can't find easily. It could also be used in reverse, to find files missing certain text, for example files that don't have <title> tags.
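Running the search "in reverse" only needs the condition flipped: collect the files where no line matched. A minimal sketch, assuming a single folder (no recursion) and a case-sensitive match, so <TITLE> would not count as <title>; filesMissing() is a hypothetical name.

```php
<?php
// Hypothetical helper: list the files directly inside $dir that do NOT contain $needle.
function filesMissing(string $dir, string $needle): array {
    $entries = @scandir($dir);
    if ($entries === false) {
        return array();
    }
    $missing = array();
    foreach (array_diff($entries, array(".", "..")) as $name) {
        $full = $dir . "/" . $name;
        if (!is_file($full)) {
            continue;                     // skip folders and anything else
        }
        $f = @fopen($full, "r");
        if (!$f) {
            continue;                     // unreadable file: can't say either way
        }
        $hit = false;
        while (($line = fgets($f)) !== false) {
            if (strpos($line, $needle) !== false) {
                $hit = true;
                break;
            }
        }
        fclose($f);
        if (!$hit) {
            $missing[] = $name;
        }
    }
    return $missing;
}

// Usage: print_r(filesMissing(__DIR__, "<title>"));
```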
 
This is core common PHP...
Sorry, let me rephrase.
I originally meant to help people with constructing their own functions or classes to search files and directories recursively.

Since writing this tutorial, I have upgraded my scanner to identify line numbers and perform search & replace.
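The upgraded scanner isn't shown in the thread, so the following is only a sketch of how the two features could be added: a line counter next to fgets() for line numbers, and str_replace() plus a write back for search & replace. Both function names are hypothetical, and replaceInFile() loads the whole file into memory, which suits typical source files but not very large ones.

```php
<?php
// Hypothetical helper: 1-based numbers of every line in $file that contains $needle.
function matchingLines(string $file, string $needle): array {
    $lines = array();
    $f = @fopen($file, "r");
    if (!$f) {
        return $lines;
    }
    $number = 0;
    while (($line = fgets($f)) !== false) {
        $number++;
        if (strpos($line, $needle) !== false) {
            $lines[] = $number;
        }
    }
    fclose($f);
    return $lines;
}

// Hypothetical helper: replace every occurrence of $search with $replace in $file.
// Returns the number of replacements (0 leaves the file untouched), or null on a read/write error.
function replaceInFile(string $file, string $search, string $replace): ?int {
    $content = @file_get_contents($file);
    if ($content === false) {
        return null;
    }
    $updated = str_replace($search, $replace, $content, $count); // $count receives the number of replacements
    if ($count > 0 && file_put_contents($file, $updated) === false) {
        return null;
    }
    return $count;
}
```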

There is a lot wrong with my filescanner here, but my intention with many of my PHP tutorials is to focus on a few key pieces of logic that can be explained line by line, and to avoid some of the more advanced or less-used classes. For example, looping, if statements, value and type checking, string searching, and the other code in my tutorials are used far more often in PHP than any single directory iterator. My hope is that this tutorial helps beginner PHP programmers transition into more advanced programming more easily than if they were also reading through the directory iterator documentation.
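For comparison, the directory iterator route uses PHP's SPL classes RecursiveDirectoryIterator and RecursiveIteratorIterator, which handle the recursion and (with the FilesystemIterator::SKIP_DOTS flag) the dot entries for you. The sketch below produces the same path => files structure as filescanner(). For brevity it reads each whole file with file_get_contents() instead of line by line, and it leaves out the __FILE__ self-check. Iteration order follows the file system rather than scandir()'s sorted order.

```php
<?php
// Same result shape as filescanner(): array(folder path => list of matching file names).
function filescannerSpl(string $root, string $search): array {
    $found = array();
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    // The default LEAVES_ONLY mode yields entries without children, i.e. the files;
    // each $file is an SplFileInfo object.
    foreach ($iterator as $file) {
        if (!$file->isFile()) {
            continue;
        }
        $content = @file_get_contents($file->getPathname());
        if ($content !== false && strpos($content, $search) !== false) {
            $found[$file->getPath()][] = $file->getFilename();
        }
    }
    return $found;
}

// Usage: print_r(filescannerSpl(__DIR__, "test"));
```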
 