hpr4417 :: Newest matching file
Writing a function or script to find a given file
Hosted by Dave Morriss on Tuesday, 2025-07-08. This show is flagged as Explicit and is released under a CC-BY-SA license.
Bash, function, Perl, find.
Duration: 00:22:02
Overview
Several years ago I wrote a Bash script to perform a task I need to perform almost every day - find the newest file in a series of files.
At the time I was running a camera on a Raspberry Pi which was attached to a window and viewed my back garden. I was taking a picture every 15 minutes, giving them names containing the date and time, and storing them in a directory. It was useful to be able to display the latest picture.
Since then, I have found searching for newest files useful in many contexts:
- Find the image generated by my random recipe chooser, put it in the clipboard and send it to the Telegram channel for my family.
- Generate a weather report from wttr.in and send it to Matrix.
- Find the screenshot I just made and put it in the clipboard.
Of course, I could just use the same name when writing these various files, rather than accumulating several, but I often want to look back through such collections. If I am concerned about such files accumulating in an unwanted way, I write cron scripts which run every day and delete the oldest ones.
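As an illustration, a minimal sketch of such a clean-up job is shown below. The directory, file name pattern and 30-day retention period are placeholders, not my actual settings:

#!/bin/bash
# Delete webcam images older than 30 days; intended to be run daily from cron.
# The directory and pattern are illustrative placeholders.
find "$HOME/Pictures/webcam" -maxdepth 1 -type f \
    -name 'backgarden_*.jpg' -mtime +30 -delete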
Original script
The first iteration of the script was actually written as a Bash function which was loaded at login time. The function is called newest_matching_file and it takes two arguments:
- A file glob expression to match the file I am looking for.
- An optional directory to look for the file in. If this is omitted, then the current directory will be used.
The first version of this function was a bit awkward since it used a for loop to scan the directory, using the glob pattern to find the file. Since Bash glob pattern searches will return the search pattern when they fail, it was necessary to use the nullglob option (see references) to prevent this, turning it on before the search and off afterwards.
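A minimal sketch of how that glob-and-loop approach might have looked follows. This is a reconstruction for illustration, not the original function, and it assumes nullglob is off beforehand:

function newest_matching_file_glob {
    local glob_pattern=${1-}
    local dir=${2:-$PWD}
    local newest file

    # Turn nullglob on so a failed match expands to nothing rather than
    # the literal pattern, then turn it off again afterwards
    shopt -s nullglob
    for file in "$dir"/$glob_pattern; do
        # Keep this file if it is newer than the best candidate so far
        [[ -f $file && ( -z $newest || $file -nt $newest ) ]] && newest=$file
    done
    shopt -u nullglob

    [[ -n $newest ]] && printf '%s\n' "$newest"
}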
This technique was replaced later with a pipeline using the find command.
Improved Bash script
The version using find is what I will explain here.
function newest_matching_file {
    local glob_pattern=${1-}
    local dir=${2:-$PWD}

    # Argument number check
    if [[ $# -eq 0 || $# -gt 2 ]]; then
        echo 'Usage: newest_matching_file GLOB_PATTERN [DIR]' >&2
        return 1
    fi

    # Check the target directory
    if [[ ! -d $dir ]]; then
        echo "Unable to find directory $dir" >&2
        return 1
    fi

    local newest_file

    # shellcheck disable=SC2016
    newest_file=$(find "$dir" -maxdepth 1 -name "$glob_pattern" \
        -type f -printf "%T@ %p\n" | sort | sed -ne '${s/.\+ //;p}')

    # Use printf instead of echo in case the file name begins with '-'
    [[ -n $newest_file ]] && printf '%s\n' "$newest_file"

    return 0
}
The function is in the file newest_matching_file_1.sh, and it's loaded ("sourced", or declared) like this:

. newest_matching_file_1.sh

The '.' is a short-hand version of the command source.
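Once the file has been sourced, the shell's type builtin can be used to confirm that the function is now defined in the current session:

type newest_matching_file    # reports that it is a shell function and shows its body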
I actually have two versions of this function, with the second one using a regular expression, which the find command is able to search with, but I prefer this one.
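For illustration only, such a regular-expression variant could build its search around find's -regex test rather than -name. The sketch below is my reconstruction of that idea, not the actual second function; $regex_pattern is a hypothetical variable, and note that GNU find's -regex matches the whole path, hence the leading '.*/':

    newest_file=$(find "$dir" -maxdepth 1 -regextype posix-extended \
        -regex ".*/${regex_pattern}" -type f -printf "%T@ %p\n" \
        | sort | sed -ne '${s/.\+ //;p}')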
Explanation
- The first two lines beginning with local define variables local to the function, holding the arguments. The first, glob_pattern, is expected to contain something like screenshot_2025-04-*.png. The second will hold the directory to be scanned, or if omitted, will be set to the current directory.
- Next, an if statement checks that there are the right number of arguments, aborting if not. Note that the echo command writes to STDERR (using '>&2'), the error channel.
- Another if statement checks that the target directory actually exists, and aborts if not.
- Another local variable newest_file is defined. It's good practice not to create global variables in functions since they will "leak" into the calling environment.
- The variable newest_file is set to the result of a command substitution containing a pipeline (a small worked example of the whole pipeline appears after this list):
  - The find command searches the target directory:
    - Using -maxdepth 1 limits the search to the chosen directory and does not descend into sub-directories.
    - The search pattern is defined by -name "$glob_pattern".
    - Using -type f limits the search to files.
    - The -printf "%T@ %p\n" argument returns the file's last modification time ('%T@') as the number of seconds since the Unix epoch. This number is larger the newer the file is. It is followed, after a space, by the full path to the file ('%p'), and a newline.
  - The matching file names are sorted. Because each is preceded by a numeric time value, they are sorted in ascending time order: oldest first, newest last.
  - Finally sed is used to return the last file in the sorted list with the program '${s/.\+ //;p}':
    - The -n option ensures that only lines which are explicitly printed will be shown.
    - The sed program looks for the last line (using '$'). When found, the leading numeric time is removed with 's/.\+ //' and the result is printed (with 'p').
  - The end result will either be the path to the newest file or nothing (because there was no match).
- The expression '[[ -n $newest_file ]]' will be true if the $newest_file variable is not empty, and if that is the case, the contents of the variable will be printed on STDOUT; otherwise nothing will be printed.
- Note that the function returns 1 (false) if there is a failure, and 0 (true) if all is well. A null return is regarded as success.
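To make the pipeline's behaviour concrete, here is a small self-contained demonstration. The directory and file names are made up for illustration, and GNU find and touch are assumed:

# Create two test files with known modification times
mkdir -p /tmp/nmf_demo
touch -d '2025-04-01 10:00' /tmp/nmf_demo/pic_2025-04-01.png
touch -d '2025-04-02 10:00' /tmp/nmf_demo/pic_2025-04-02.png

# The pipeline prints the path of the newest matching file
find /tmp/nmf_demo -maxdepth 1 -name 'pic_*.png' -type f -printf "%T@ %p\n" \
    | sort | sed -ne '${s/.\+ //;p}'
# Output: /tmp/nmf_demo/pic_2025-04-02.png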
Script update
While editing the audio for this show I realised that there is a flaw in the Bash function newest_matching_file. This is in the sed script used to process the output from find.

The sed commands used in the script delete all characters up to a space, assuming that this is the only space in the last line. However, if the file name itself contains spaces, this will not work because regular expressions in sed are greedy. What is deleted in this case is everything up to and including the last space.
I created a directory called tests and added the following files:
'File 1 with spaces.txt'
'File 2 with spaces.txt'
'File 3 with spaces.txt'
I then ran the find command as follows:
$ find tests -maxdepth 1 -name 'File*' -type f -printf "%T@ %p\n" | sort | sed -ne '${s/.\+ //;p}'
spaces.txt
I adjusted the sed call to sed -ne '${s/[^ ]\+ //;p}'. This uses the regular expression:

s/[^ ]\+ //

This now specifies that what is to be removed is every non-space character up to and including the first space. The result is:
$ find tests -maxdepth 1 -name 'File*' -type f -printf "%T@ %p\n" | sort | sed -ne '${s/[^ ]\+ //;p}'
tests/File 3 with spaces.txt
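In the function itself, the corresponding change is to the sed expression inside the command substitution; the relevant line becomes:

    newest_file=$(find "$dir" -maxdepth 1 -name "$glob_pattern" \
        -type f -printf "%T@ %p\n" | sort | sed -ne '${s/[^ ]\+ //;p}')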
This change has been propagated to the copy on GitLab.
Usage
This function is designed to be used in commands or other scripts.
For example, I have an alias defined as follows:
alias copy_screenshot="xclip -selection clipboard -t image/png -i \$(newest_matching_file 'Screenshot_*.png' ~/Pictures/Screenshots/)"
This uses xclip to load the latest screenshot into the clipboard, so I can paste it into a social media client, for example.
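Another hypothetical use, along the lines of the webcam pictures mentioned in the overview (the directory, file pattern and image viewer here are illustrative, not my actual setup):

#!/bin/bash
# Display the newest webcam picture, if there is one.
# Assumes newest_matching_file_1.sh can be found on the shell's search path.
. newest_matching_file_1.sh
latest=$(newest_matching_file 'backgarden_*.jpg' ~/Pictures/webcam)
[[ -n $latest ]] && feh "$latest"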
Perl alternative
During the history of this family of scripts I wrote a Perl version. This was originally because the Bash function gave problems when run under the Bourne shell, and I was using pdmenu a lot, which internally runs scripts under that shell.
#!/usr/bin/env perl
use v5.40;
use open ':std', ':encoding(UTF-8)'; # Make all IO UTF-8
use Cwd;
use File::Find::Rule;
#
# Script name
#
( my $PROG = $0 ) =~ s|.*/||mx;
#
# Use a regular expression rather than a glob pattern
#
my $regex = shift;
#
# Get the directory to search, defaulting to the current one
#
my $dir = shift // getcwd();
#
# Have to have the regular expression
#
die "Usage: $PROG regex [DIR]\n" unless $regex;
#
# Collect all the files in the target directory without recursing. Include the
# path and let the caller remove it if they want.
#
my @files = File::Find::Rule->file()
    ->name(qr/$regex/)
    ->maxdepth(1)
    ->in($dir);
die "Unsuccessful search\n" unless @files;
#
# Sort the files by ascending modification time, youngest first
#
@files = sort {-M($a) <=> -M($b)} @files;
#
# Report the one which sorted first
#
say $files[0];
exit;
Explanation
- This is a fairly straightforward Perl script, run from an executable file with a shebang line at the start indicating what is to be used to run it: perl.
- The preamble defines the Perl version to use, and indicates that UTF-8 (character sets like Unicode) will be acceptable for reading and writing.
- Two modules are required:
  - Cwd: provides functions for determining the pathname of the current working directory.
  - File::Find::Rule: provides tools for searching the file system (similar to the find command, but with more features).
- Next the variable $PROG is set to the name under which the script has been invoked. This is useful when giving a brief summary of usage.
- The first argument is then collected (with shift) and placed in the variable $regex.
- The second argument is optional, but if omitted, is set to the current working directory. We see the use of shift again, but if this returns nothing (is undefined), the '//' operator invokes the getcwd() function to get the current working directory.
- If the $regex variable is not defined, then die is called to terminate the script with an error message.
- The search itself is invoked using File::Find::Rule and the results are added to the array @files. The multi-line call shows several methods being called in a "chain" to define the rules and invoke the search:
  - file(): sets up a file search
  - name(qr/$regex/): a rule which applies a regular expression match to each file name, rejecting any that do not match
  - maxdepth(1): a rule which prevents the search from descending below the top level into sub-directories
  - in($dir): defines the directory to search (and also begins the search)
- If the search returns no files (the array is empty), the script ends with an error message.
- Otherwise the @files array is sorted by comparing modification times, reordering it so that the "youngest" (newest) file is sorted first. The -M operator returns a file's age in days since it was last modified, so the newest file has the smallest value. The <=> operator compares its two numeric operands, returning -1, 0 or 1 depending on whether the left operand is less than, equal to or greater than the right one; it is most useful in the Perl sort function.
- Finally, the newest file is reported.
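As a quick check, the script can be run against the tests directory created earlier. Here I assume it has been saved as an executable file called newest_matching_file.pl; the name is just for illustration:

$ ./newest_matching_file.pl 'File.*\.txt' tests
tests/File 3 with spaces.txt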
Usage
This script can be used in almost the same way as the Bash variant. The difference is that the pattern used to match files is a Perl regular expression. I keep this script in my ~/bin directory, so it can be invoked just by typing its name. I also maintain a symlink called nmf to save typing!
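For instance, the setup could look something like this (the script file name is a placeholder; the actual file in ~/bin may be named differently):

# Install the script into ~/bin and create the short symlink
cp newest_matching_file.pl ~/bin/newest_matching_file.pl
chmod +x ~/bin/newest_matching_file.pl
ln -s ~/bin/newest_matching_file.pl ~/bin/nmf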
The above example, using the Perl version, would be:
alias copy_screenshot="xclip -selection clipboard -t image/png -i \$(nmf 'Screenshot_.*\.png' ~/Pictures/Screenshots/)"
In regular expressions '.*' means "any character zero or more times". The '.' in '\.png' is escaped because we need an actual dot character.
Conclusion
The approach in both cases is fairly simple. Files matching a pattern are accumulated, in the Bash case including the modification time. The files are sorted by modification time and the newest one is the answer. The Bash version has to remove the modification time before printing.
This algorithm could be written in many ways. I will probably try rewriting it in other languages in the future, to see which one I think is best.
References
- Glob expansion:
- HPR shows covering glob expansion:
- GitLab repository holding these files: