hpr4417 :: Newest matching file
Writing a function or script to find a given file
Hosted by Dave Morriss on Tuesday, 2025-07-08. This show is flagged as Explicit and is released under a CC-BY-SA license.
Bash, function, Perl, find.
Duration: 00:22:02
Overview
Several years ago I wrote a Bash script to perform a task I need to perform almost every day - find the newest file in a series of files.
At the time I was running a camera on a Raspberry Pi which was attached to a window and viewed my back garden. I was taking a picture every 15 minutes, giving them names containing the date and time, and storing them in a directory. It was useful to be able to display the latest picture.
Since then, I have found searching for newest files useful in many contexts:
- Find the image generated by my random recipe chooser, put it in the clipboard and send it to the Telegram channel for my family.
- Generate a weather report from wttr.in and send it to Matrix.
- Find the screenshot I just made and put it in the clipboard.
Of course, I could just use the same name when writing these various files, rather than accumulating several, but I often want to look back through such collections. If I am concerned about such files accumulating in an unwanted way, I write cron scripts which run every day and delete the oldest ones.
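As an illustration, a minimal sketch of such a clean-up job is shown below. The directory, file name pattern and 30-day retention period are placeholders, not my actual settings:

#!/bin/bash
# Delete webcam images older than 30 days; intended to be run daily from cron.
# The directory and pattern are illustrative placeholders.
find "$HOME/Pictures/webcam" -maxdepth 1 -type f \
    -name 'backgarden_*.jpg' -mtime +30 -delete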
Original script
The first iteration of the script was actually written as a Bash function which was loaded at login time. The function is called newest_matching_file and it takes two arguments:
- A file glob expression to match the file I am looking for.
- An optional directory to look for the file in. If this is omitted, then the current directory will be used.
The first version of this function was a bit awkward since it used a for loop to scan the directory, using the glob pattern to find the file. Since Bash glob pattern searches will return the search pattern when they fail, it was necessary to use the nullglob option (see references) to prevent this, turning it on before the search and off afterwards.
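A minimal sketch of how that glob-and-loop approach might have looked follows. This is a reconstruction for illustration, not the original function, and it assumes nullglob is off beforehand:

function newest_matching_file_glob {
    local glob_pattern=${1-}
    local dir=${2:-$PWD}
    local newest file

    # Turn nullglob on so a failed match expands to nothing rather than
    # the literal pattern, then turn it off again afterwards
    shopt -s nullglob
    for file in "$dir"/$glob_pattern; do
        # Keep this file if it is newer than the best candidate so far
        [[ -f $file && ( -z $newest || $file -nt $newest ) ]] && newest=$file
    done
    shopt -u nullglob

    [[ -n $newest ]] && printf '%s\n' "$newest"
}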
This technique was replaced later with a pipeline using the find command.
Improved Bash script
The version using find is what I will explain here.
function newest_matching_file {
    local glob_pattern=${1-}
    local dir=${2:-$PWD}

    # Argument number check
    if [[ $# -eq 0 || $# -gt 2 ]]; then
        echo 'Usage: newest_matching_file GLOB_PATTERN [DIR]' >&2
        return 1
    fi

    # Check the target directory
    if [[ ! -d $dir ]]; then
        echo "Unable to find directory $dir" >&2
        return 1
    fi

    local newest_file

    # shellcheck disable=SC2016
    newest_file=$(find "$dir" -maxdepth 1 -name "$glob_pattern" \
        -type f -printf "%T@ %p\n" | sort | sed -ne '${s/.\+ //;p}')

    # Use printf instead of echo in case the file name begins with '-'
    [[ -n $newest_file ]] && printf '%s\n' "$newest_file"

    return 0
}
The function is in the file newest_matching_file_1.sh, and it's loaded ("sourced", or declared) like this:

. newest_matching_file_1.sh

The '.' is a short-hand version of the command source.
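Once the file has been sourced, the shell's type builtin can be used to confirm that the function is now defined in the current session:

type newest_matching_file    # reports that it is a shell function and shows its body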
I actually have two versions of this function, with the second one using a regular expression, which the find command is able to search with, but I prefer this one.
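For illustration only, such a regular-expression variant could build its search around find's -regex test rather than -name. The sketch below is my reconstruction of that idea, not the actual second function; $regex_pattern is a hypothetical variable, and note that GNU find's -regex matches the whole path, hence the leading '.*/':

    newest_file=$(find "$dir" -maxdepth 1 -regextype posix-extended \
        -regex ".*/${regex_pattern}" -type f -printf "%T@ %p\n" \
        | sort | sed -ne '${s/.\+ //;p}')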
Explanation
- The first two lines beginning with local define variables local to the function, holding the arguments. The first, glob_pattern, is expected to contain something like screenshot_2025-04-*.png. The second will hold the directory to be scanned, or if omitted, will be set to the current directory.
- Next, an if statement checks that there are the right number of arguments, aborting if not. Note that the echo command writes to STDERR (using '>&2'), the error channel.
- Another if statement checks that the target directory actually exists, and aborts if not.
- Another local variable newest_file is defined. It's good practice not to create global variables in functions since they will "leak" into the calling environment.
- The variable newest_file is set to the result of a command substitution containing a pipeline (a small worked example of the whole pipeline appears after this list):
  - The find command searches the target directory:
    - Using -maxdepth 1 limits the search to the chosen directory and does not descend into sub-directories.
    - The search pattern is defined by -name "$glob_pattern".
    - Using -type f limits the search to files.
    - The -printf "%T@ %p\n" argument returns the file's last modification time ('%T@') as the number of seconds since the Unix epoch. This number is larger the newer the file is. It is followed, after a space, by the full path to the file ('%p'), and a newline.
  - The matching file names are sorted. Because each is preceded by a numeric time value, they are sorted in ascending time order: oldest first, newest last.
  - Finally sed is used to return the last file in the sorted list with the program '${s/.\+ //;p}':
    - The -n option ensures that only lines which are explicitly printed will be shown.
    - The sed program looks for the last line (using '$'). When found, the leading numeric time is removed with 's/.\+ //' and the result is printed (with 'p').
  - The end result will either be the path to the newest file or nothing (because there was no match).
- The expression '[[ -n $newest_file ]]' will be true if the $newest_file variable is not empty, and if that is the case, the contents of the variable will be printed on STDOUT; otherwise nothing will be printed.
- Note that the function returns 1 (false) if there is a failure, and 0 (true) if all is well. A null return is regarded as success.
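To make the pipeline's behaviour concrete, here is a small self-contained demonstration. The directory and file names are made up for illustration, and GNU find and touch are assumed:

# Create two test files with known modification times
mkdir -p /tmp/nmf_demo
touch -d '2025-04-01 10:00' /tmp/nmf_demo/pic_2025-04-01.png
touch -d '2025-04-02 10:00' /tmp/nmf_demo/pic_2025-04-02.png

# The pipeline prints the path of the newest matching file
find /tmp/nmf_demo -maxdepth 1 -name 'pic_*.png' -type f -printf "%T@ %p\n" \
    | sort | sed -ne '${s/.\+ //;p}'
# Output: /tmp/nmf_demo/pic_2025-04-02.png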
Script update
While editing the audio for this show I realised that there is a flaw in the Bash function newest_matching_file. This is in the sed script used to process the output from find.

The sed commands used in the script delete all characters up to a space, assuming that this is the only space in the last line. However, if the file name itself contains spaces, this will not work because regular expressions in sed are greedy. What is deleted in this case is everything up to and including the last space.
I created a directory called tests and added the following files:
'File 1 with spaces.txt'
'File 2 with spaces.txt'
'File 3 with spaces.txt'
I then ran the find command as follows:
$ find tests -maxdepth 1 -name 'File*' -type f -printf "%T@ %p\n" | sort | sed -ne '${s/.\+ //;p}'
spaces.txt
I adjusted the sed call to sed -ne '${s/[^ ]\+ //;p}'. This uses the regular expression:

s/[^ ]\+ //

This now specifies that what is to be removed is every non-space character up to and including the first space. The result is:
$ find tests -maxdepth 1 -name 'File*' -type f -printf "%T@ %p\n" | sort | sed -ne '${s/[^ ]\+ //;p}'
tests/File 3 with spaces.txt
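In the function itself, the corresponding change is to the sed expression inside the command substitution; the relevant line becomes:

    newest_file=$(find "$dir" -maxdepth 1 -name "$glob_pattern" \
        -type f -printf "%T@ %p\n" | sort | sed -ne '${s/[^ ]\+ //;p}')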
This change has been propagated to the copy on GitLab.
Usage
This function is designed to be used in commands or other scripts.
For example, I have an alias defined as follows:
alias copy_screenshot="xclip -selection clipboard -t image/png -i \$(newest_matching_file 'Screenshot_*.png' ~/Pictures/Screenshots/)"
This uses xclip to load the latest screenshot into the clipboard, so I can paste it into a social media client, for example.
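Another hypothetical use, along the lines of the webcam pictures mentioned in the overview (the directory, file pattern and image viewer here are illustrative, not my actual setup):

#!/bin/bash
# Display the newest webcam picture, if there is one.
# Assumes newest_matching_file_1.sh can be found on the shell's search path.
. newest_matching_file_1.sh
latest=$(newest_matching_file 'backgarden_*.jpg' ~/Pictures/webcam)
[[ -n $latest ]] && feh "$latest"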
Perl alternative
During the history of this family of scripts I wrote a Perl version. This was originally because the Bash function gave problems when run under the Bourne shell, and I was using pdmenu a lot, which internally runs scripts under that shell.
#!/usr/bin/env perl
use v5.40;
use open ':std', ':encoding(UTF-8)'; # Make all IO UTF-8
use Cwd;
use File::Find::Rule;
#
# Script name
#
( my $PROG = $0 ) =~ s|.*/||mx;
#
# Use a regular expression rather than a glob pattern
#
my $regex = shift;
#
# Get the directory to search, defaulting to the current one
#
my $dir = shift // getcwd();
#
# Have to have the regular expression
#
die "Usage: $PROG regex [DIR]\n" unless $regex;
#
# Collect all the files in the target directory without recursing. Include the
# path and let the caller remove it if they want.
#
my @files = File::Find::Rule->file()
    ->name(qr/$regex/)
    ->maxdepth(1)
    ->in($dir);
die "Unsuccessful search\n" unless @files;
#
# Sort the files by ascending modification time, youngest first
#
@files = sort {-M($a) <=> -M($b)} @files;
#
# Report the one which sorted first
#
say $files[0];
exit;
Explanation
- This is a fairly straightforward Perl script, run from an executable file with a shebang line at the start indicating what is to be used to run it: perl.
- The preamble defines the Perl version to use, and indicates that UTF-8 (character sets like Unicode) will be acceptable for reading and writing.
- Two modules are required:
  - Cwd: provides functions for determining the pathname of the current working directory.
  - File::Find::Rule: provides tools for searching the file system (similar to the find command, but with more features).
- Next the variable $PROG is set to the name under which the script has been invoked. This is useful when giving a brief summary of usage.
- The first argument is then collected (with shift) and placed in the variable $regex.
- The second argument is optional, but if omitted, is set to the current working directory. We see the use of shift again, but if this returns nothing (is undefined), the '//' operator invokes the getcwd() function to get the current working directory.
- If the $regex variable is not defined, then die is called to terminate the script with an error message.
- The search itself is invoked using File::Find::Rule and the results are added to the array @files. The multi-line call shows several methods being called in a "chain" to define the rules and invoke the search:
  - file(): sets up a file search
  - name(qr/$regex/): a rule which applies a regular expression match to each file name, rejecting any that do not match
  - maxdepth(1): a rule which prevents the search from descending below the top level into sub-directories
  - in($dir): defines the directory to search (and also begins the search)
- If the search returns no files (the array is empty), the script ends with an error message.
- Otherwise the @files array is sorted by comparing modification times, reordering it so that the "youngest" (newest) file is sorted first. The -M operator returns a file's age in days since it was last modified, so the newest file has the smallest value. The <=> operator compares its two numeric operands, returning -1, 0 or 1 depending on whether the left operand is less than, equal to or greater than the right one; it is most useful in the Perl sort function.
- Finally, the newest file is reported.
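As a quick check, the script can be run against the tests directory created earlier. Here I assume it has been saved as an executable file called newest_matching_file.pl; the name is just for illustration:

$ ./newest_matching_file.pl 'File.*\.txt' tests
tests/File 3 with spaces.txt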
Usage
This script can be used in almost the same way as the Bash variant. The difference is that the pattern used to match files is a Perl regular expression. I keep this script in my ~/bin directory, so it can be invoked just by typing its name. I also maintain a symlink called nmf to save typing!
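For instance, the setup could look something like this (the script file name is a placeholder; the actual file in ~/bin may be named differently):

# Install the script into ~/bin and create the short symlink
cp newest_matching_file.pl ~/bin/newest_matching_file.pl
chmod +x ~/bin/newest_matching_file.pl
ln -s ~/bin/newest_matching_file.pl ~/bin/nmf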
The above example, using the Perl version, would be:
alias copy_screenshot="xclip -selection clipboard -t image/png -i \$(nmf 'Screenshot_.*\.png' ~/Pictures/Screenshots/)"
In regular expressions '.*' means "any character zero or more times". The '.' in '\.png' is escaped because we need an actual dot character.
Conclusion
The approach in both cases is fairly simple. Files matching a pattern are accumulated, in the Bash case including the modification time. The files are sorted by modification time and the newest one is the answer. The Bash version has to remove the modification time before printing.
This algorithm could be written in many ways. I will probably try rewriting it in other languages in the future, to see which one I think is best.
References
- Glob expansion:
- HPR shows covering glob expansion:
- GitLab repository holding these files: