I was thinking of other operating systems, actually. I saw you were changing \ to /, so I thought you'd want this to run on Windows too. I don't know if Windows allows such filenames; I'm no Windows expert. I don't know what a valid pathname on OS X is, either, or on, say, VMS or some other arcane OS. (Relying on / as the pathname delimiter likely rules out a lot of OSes, but anyway...)
I have two choices: I can become an expert on which filenames are valid, or I can set an artificial limit and only allow plain letters and numbers, which I KNOW are valid pretty much everywhere. Each approach has its pros and cons. One advantage of my method is that it lets me be lazy. But I see my regex excluded underscores, for example, which is probably overly strict.
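A minimal sketch of the whitelist approach, written in Python rather than the PHP under discussion, and assuming the rule is "ASCII letters, digits and underscores only" (i.e. with the underscore relaxation suggested above). The function name `is_safe_name` is hypothetical.

```python
import re

# Hypothetical whitelist: one or more ASCII letters, digits or underscores.
SAFE_NAME = re.compile(r"[A-Za-z0-9_]+")

def is_safe_name(name: str) -> bool:
    """Return True only if the ENTIRE name consists of whitelisted characters."""
    # fullmatch() anchors at both ends, so there is no '$ matches before a
    # trailing newline' gotcha as there would be with re.match(r"^...$").
    return SAFE_NAME.fullmatch(name) is not None

for candidate in ["about_us", "page2", "$%@#$%^", "../etc/passwd", ""]:
    print(repr(candidate), is_safe_name(candidate))
```

The appeal of this design is exactly the laziness mentioned above: rather than knowing every platform's rules, you allow only a set of characters that every platform accepts.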
Even though Unix filesystems technically allow such names, I would be wary of anyone actually naming a file $%@#$%^, if only from a readability perspective.
I only mention that because I know such a vulnerability exists in Perl: by default, the CGI module does not set the maximum size of POST data to a sane value. Or at least such a vulnerability existed a few years ago. Actually, I think it would only lead to denial-of-service attacks, not buffer overflows or anything, so now that I think about it my point is probably entirely irrelevant.

People can send large amounts of POST data, but the quantity is limited by PHP configuration settings:
post_max_size - Sets the maximum size of POST data PHP will accept before bailing
max_input_time - Maximum time PHP will spend parsing POST data
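Size directives like post_max_size accept php.ini shorthand such as `8M`, where K, M and G are case-insensitive powers of 1024 and a bare number means bytes. The Python sketch below, with hypothetical function names, converts such a value to bytes and compares a request's declared length against it. It illustrates the check, not PHP's own code.

```python
def ini_size_to_bytes(value: str) -> int:
    """Convert a php.ini-style size such as '8M' or '512k' to bytes.

    Assumes the documented shorthand: an optional K, M or G suffix
    (case-insensitive), each a power of 1024; no suffix means bytes.
    """
    value = value.strip()
    multipliers = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    suffix = value[-1:].lower()
    if suffix in multipliers:
        return int(value[:-1]) * multipliers[suffix]
    return int(value)

def post_too_large(content_length: int, post_max_size: str) -> bool:
    """Hypothetical helper: True if a declared body length exceeds the limit."""
    return content_length > ini_size_to_bytes(post_max_size)

print(ini_size_to_bytes("8M"))
print(post_too_large(50 * 1024 ** 2, "8M"))
```

With PHP's stock post_max_size of 8M, a 50 MB body exceeds the limit; in that case PHP leaves $_POST and $_FILES empty rather than handing the oversized data to the script.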
If 50MB would cause it to overflow, I would seriously lose all faith in the PHP developers. I don't think they would be foolish enough to ship open source code with that kind of overflow vulnerability.
Imagine you're including a file in $DOCUMENT_ROOT/include/samuraid. You might not want to tell people "hey look, samuraid!" It gives away your username, which may be the same username you use to log in to all sorts of other things. That's arguably not a problem with your script, only with the person who named that directory. And it could be argued that picking bad values for the keys in my array is just as bad as picking bad filenames, and that I'm only moving the same problem up a level of abstraction.

Well, the only way this script could reasonably be run is from within the web root directory anyway, so only webspace files could ever be included. As far as I can tell, there isn't a way to climb up the directory structure and get into any parent directories or files. Also, this is actually an abbreviated version of the script that I edited together quickly. The full version auto-appends the .php extension and will only include .php files. I simply posted a stripped-down version here on the assumption that anyone using it would adapt it to fit their needs.
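The behaviour described here (staying inside the web root, refusing to climb into parent directories, and auto-appending .php) might look something like the following. This is a Python sketch, not the poster's PHP; the function name `resolve_include` is hypothetical, and this version normalizes the path lexically without resolving symlinks.

```python
import os
from typing import Optional

def resolve_include(base_dir: str, requested: str) -> Optional[str]:
    """Map a requested page name to a .php path inside base_dir, or None.

    Hypothetical helper: appends '.php', normalizes the result lexically
    (collapsing '..' segments) and rejects anything that lands outside
    base_dir. Symlinks are NOT resolved here.
    """
    base = os.path.normpath(os.path.abspath(base_dir))
    candidate = os.path.normpath(os.path.join(base, requested + ".php"))
    # commonpath() equals base only if candidate is base itself or below it.
    return candidate if os.path.commonpath([base, candidate]) == base else None

print(resolve_include("/srv/www", "news"))
print(resolve_include("/srv/www", "../secret"))
```

Comparing with `os.path.commonpath` rather than a plain string prefix check matters: `/srv/www2/x.php` starts with the string `/srv/www` but is not inside that directory. The example paths assume a POSIX system.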
But I prefer not to give users enough rope to hang themselves with, especially when there's no reason you SHOULD tell people your pathnames. And abstracting the filenames at least gives you a way to make it safe without potentially breaking anything underneath, since the abstraction is entirely specific to my script. It's only security through obscurity, and you shouldn't rely ONLY on such things, but a bit of obscurity never hurt anyone.
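A sketch of the key-based abstraction, in Python with hypothetical page keys and paths: visitors only ever see the keys, and anything not in the table falls back to a default page instead of being passed to the filesystem.

```python
# Hypothetical table: public keys on the left, private paths on the right.
PAGES = {
    "home": "include/home.php",
    "news": "include/news.php",
    "contact": "include/contact.php",
}
DEFAULT_PAGE = "home"

def page_path(key: str) -> str:
    """Return the private path for a public key, falling back to the default."""
    return PAGES.get(key, PAGES[DEFAULT_PAGE])

print(page_path("news"))
print(page_path("../../etc/passwd"))
```

Because no path is ever built from user input, the only validation needed is dictionary membership, and the real directory names never appear in URLs. The trade-off is that every new page needs a table entry.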
(Could a server that supports symlinks also open up some evil possibilities? I don't know whether is_file() returns true or false for symlinks, so this may not matter either.)
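The symlink question can be tested empirically. In Python (used here as a stand-in; PHP's is_file() should be checked separately), `os.path.isfile()` follows symbolic links, so a link inside the web root that points at a file outside it passes an "is it a file" test. Resolving the path with `os.path.realpath()` before the containment check closes that gap. The helper `is_inside` and the temporary directory layout are hypothetical, and a POSIX system is assumed since `os.symlink` may need extra privileges on Windows.

```python
import os
import tempfile

def is_inside(base_dir: str, path: str) -> bool:
    """True if path, after resolving symlinks, lies within base_dir."""
    base = os.path.realpath(base_dir)
    target = os.path.realpath(path)
    return os.path.commonpath([base, target]) == base

# Throwaway layout: a 'secret' file outside the web root, plus a symlink
# inside the web root that points at it.
outside = tempfile.mkdtemp()
webroot = tempfile.mkdtemp()
secret = os.path.join(outside, "secret.txt")
with open(secret, "w") as f:
    f.write("do not serve")
link = os.path.join(webroot, "innocent.php")
os.symlink(secret, link)

print(os.path.isfile(link))      # True: isfile() follows the link
print(is_inside(webroot, link))  # False: realpath() shows where it really points
```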
My other point is that "I don't know of any way this information could be used" is limited by your knowledge; there may be ways to use it that you aren't aware of. I think it's always good to err on the side of caution. Imagine you're running another script that has a vulnerability where it lets people access files, but only if they can guess an exact arcane pathname in a certain single directory, and that it's really hard to guess those pathnames. And then you have your script, which tells people pathnames but doesn't give them any hole through which to access them. The two together could cause you problems, even if each alone is theoretically safe.
I'm sure you're aware of all of these things and I'm not telling you anything you don't already know, but I like hearing myself type. And other people may be reading.
I don't see any way people could use your script in an evil way either, given a sane web server. But I'm always paranoid (over-paranoid) about what I may be missing.
Yeah, I saw that. I misspoke.

Just as a note: my code only blocks files and directories whose names start with a "."; dots later in the filename are allowed.
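The rule described here, rejecting any path component that begins with a dot while allowing dots later in a name, could be sketched in Python as follows. The function name is hypothetical, and splitting on "/" assumes backslashes have already been converted to forward slashes, as discussed at the start of the thread.

```python
def has_no_dot_components(path: str) -> bool:
    """Reject '.', '..' and hidden names like '.htaccess' in any component.

    Dots after the first character ('news.old', 'v1.2') are allowed.
    Empty components (from '//' or a trailing '/') are ignored.
    """
    return not any(part.startswith(".") for part in path.split("/") if part)

for p in ["news.old", "docs/v1.2/intro", "../etc/passwd", "docs/.htaccess"]:
    print(p, has_no_dot_components(p))
```

One check covers both parent-directory traversal (`..`) and hidden files such as `.htaccess`, which a filter looking only for `..` would let through.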