hpr4227 :: Introduction to jq - part 3
More filters
Hosted by Dave Morriss on Tuesday, 2024-10-15 is flagged as Explicit and is released under a CC-BY-SA license.
JSON, JavaScript Object Notation, jq, jq filter, jq language.
The show is available on the Internet Archive at: https://archive.org/details/hpr4227
Listen in ogg, spx, or mp3 format.
Duration: 00:25:53
general.
Overview
In this episode we will continue looking at basic filters. Then we will start looking at the feature that makes jq very powerful: the ability to transform JSON from one form to another. In essence, we can read and parse JSON and then construct an alternative form.
More basic filters
Array/String Slice: .[<number>:<number>]
This filter allows parts of JSON arrays or strings to be extracted.
The first number is the index of the first element (or character) to include, counting from zero. The second number is an ending index, but it means "up to but not including". If the first index is omitted, the slice starts at the beginning of the string or array; if the second index is omitted, it runs to the end.
This example shows using an array and extracting part of it:
$ x="[$(seq -s, 1 10)]"
$ echo "$x"
[1,2,3,4,5,6,7,8,9,10]
$ echo "$x" | jq -c '.[3:6]'
[4,5,6]
Here we use the seq command to generate the numbers 1-10 separated by commas, and wrap them in square brackets to make a JSON array. Feeding this to jq on its standard input with the slice request '.[3:6]' results in a sub-array from element 3 (containing value 4) up to but not including element 6 (containing 7). Note that the '-c' option generates compact output, as we discussed in the last episode.
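The forms with an omitted index, described above, are not demonstrated in the original examples. Here is a small sketch of them, rebuilding the same 1-10 test array (the variable name x is reused from above):

```shell
#!/usr/bin/env bash
# Rebuild the same test array as above: [1,2,3,4,5,6,7,8,9,10]
x="[$(seq -s, 1 10)]"

# First index omitted: from the start, up to but not including index 3
echo "$x" | jq -c '.[:3]'      # [1,2,3]

# Second index omitted: from index 7 to the end
echo "$x" | jq -c '.[7:]'      # [8,9,10]

# The same rules apply to strings
echo '"Hacker Public Radio"' | jq '.[:6]'    # "Hacker"
```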
For a string, the idea is similar, as in:
$ echo '"Hacker Public Radio"' | jq '.[7:10]'
"Pub"
Notice that we provide the JSON string quotes inside the single quotes following echo. The filter '.[7:10]' starts from element 7 (letter "P") up to but not including element 10 (letter "l").
Both of the numbers may be negative, meaning that they are offsets from the end of the array or string.
So, using '.[-7:-4]' in the array example gives the same result as '.[3:6]', as do '.[3:-4]' and '.[-7:6]'. This example uses the x variable created earlier:
$ for f in '.[-7:-4]' '.[3:6]' '.[3:-4]' '.[-7:6]'; do
> echo "$x" | jq -c "$f"
> done
[4,5,6]
[4,5,6]
[4,5,6]
[4,5,6]
Similarly, using '.[-12:-9]' gives the same result as '.[7:10]' when used with the string:
$ echo '"Hacker Public Radio"' | jq '.[-12:-9]'
"Pub"
As a point of interest, I wrote a little Bash loop to show the positive and negative offsets of the characters in the test string, just to help me visualise them. See footnote 1 for details.
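As a stand-in for the footnote's loop, here is a hypothetical version (my own sketch, not the author's code). It prints each character of the test string alongside its positive offset and the matching negative offset, which is simply the positive offset minus the string length:

```shell
#!/usr/bin/env bash
# Hypothetical helper, not the loop from the footnote: list every character
# of the test string with its positive and negative offsets.
s="Hacker Public Radio"
len=${#s}                      # 19 characters

for ((i = 0; i < len; i++)); do
    # Negative offset = positive offset - string length
    printf '%3d %4d  %s\n' "$i" "$((i - len))" "${s:i:1}"
done
```

In the output, offsets 7 to 9 (-12 to -10) hold "P", "u" and "b", which is why both '.[7:10]' and '.[-12:-9]' select "Pub".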
Finally, here is how to get the last character of the example string using positive and negative offsets:
$ echo '"Hacker Public Radio"' | jq '.[18:]'
"o"
$ echo '"Hacker Public Radio"' | jq '.[-1:]'
"o"
Array/Object Value Iterator: .[]
This filter generates values by iterating through an array or an object. It is similar to the .[index] syntax we have already seen, but it returns all of the elements rather than just one:
$ arr='["Kohinoor","plastered","downloadable"]'
$ echo "$arr" | jq '.[]'
"Kohinoor"
"plastered"
"downloadable"
The strings in the array are returned as separate JSON values, not as an array. This is because .[] is an iterator: it produces a stream of values, each of which can be passed on to other filters.
It can also be used to iterate over values in an object:
$ obj='{"name": "Hacker Public Radio", "type": "Podcast"}'
$ echo "$obj" | jq '.[]'
"Hacker Public Radio"
"Podcast"
This iterator does not work on other data types, just arrays and objects. An alternative form, .[]?, exists which ignores such errors. First, the plain iterator applied to a boolean value:
$ echo "true" | jq '.[]'
jq: error (at <stdin>:1): Cannot iterate over boolean (true)
Ignoring errors with .[]? (there is no output and no error message):
$ echo "true" | jq '.[]?'
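Because the command above prints nothing, the effect of .[]? is easier to see with several inputs at once. jq reads a stream of JSON values, so this sketch (with made-up test data) feeds it an array, a boolean and an object in one go:

```shell
#!/usr/bin/env bash
# Three JSON values on one input line: an array, a boolean and an object
input='[1,2] true {"a":3}'

# .[]? iterates over the array and the object and silently skips the
# boolean, printing 1, 2 and 3 on separate lines.
echo "$input" | jq '.[]?'
```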
Using multiple filters
There are two operators that can be placed between filters to combine their effects: the comma (',') and the pipe ('|').
Comma operator
The comma (',') operator allows you to combine multiple filters. As we already know, the jq program feeds the input it receives on standard input or from a file into whatever filter it is given. So far we have only seen a single filter being used.
With the comma operator, the same input is fed to each of the filters separated by commas, in left to right order. The result is the concatenation of the outputs of all of these filters.
For example, if we take the output from the HPR stats page, which was mentioned in part 1 of this series of shows, and store it in a file called stats.json, we can view two separate parts of the JSON like this:
$ curl -s https://hub.hackerpublicradio.org/stats.json -O
$ jq '.shows , .queue' stats.json
{
"total": 4756,
"twat": 300,
"hpr": 4456,
"duration": 7640311,
"human_duration": "0 Years, 2 months, 29 days, 10 hours, 18 minutes and 31 seconds"
}
{
"number_future_hosts": 6,
"number_future_shows": 18,
"unprocessed_comments": 0,
"submitted_shows": 0,
"shows_in_workflow": 51,
"reserve": 20
}
This applies the filter .shows (an object identifier-index filter, see part 2), which returns the value stored under that key, then applies the filter .queue to the same input, which returns the corresponding JSON object.
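The figures in stats.json change every day, so here is a self-contained sketch using a small made-up object in its place. It shows that every filter after a comma receives the same input, that the outputs appear in left-to-right order, and that the same filter may appear more than once:

```shell
#!/usr/bin/env bash
# A tiny made-up object standing in for stats.json
data='{"shows": {"total": 10}, "queue": {"reserve": 2}}'

# All three filters see the whole object; their outputs are concatenated.
echo "$data" | jq -c '.queue, .shows, .queue'
# {"reserve":2}
# {"total":10}
# {"reserve":2}
```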
Pipe operator
The pipe ('|') operator combines filters by feeding the output of the left-hand filter into the right-hand one. This is analogous to the way the same symbol works in the Unix shell.
For example, if we extract the 'shows' object from stats.json, we can then extract the value of the 'total' key as follows:
$ jq '.shows | .total' stats.json
4756
Interestingly, chaining two object identifier-index filters gives the same output:
$ jq '.shows.total' stats.json
4756
(Note: to answer the question in the audio, the two chained filters can also be written as '.shows .total', with a space between them.)
We will see the pipe operator being used in many instances in upcoming episodes.
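A pipe is especially handy after an iterator, because the filter on its right is run once for every value the iterator produces. A minimal sketch with invented data:

```shell
#!/usr/bin/env bash
# An array of small objects (invented data for illustration)
hosts='[{"name": "Ann", "shows": 3}, {"name": "Bob", "shows": 5}]'

# .[] emits each object in turn, and '| .name' is applied to each one
echo "$hosts" | jq '.[] | .name'
# "Ann"
# "Bob"

# The same result written without an explicit pipe
echo "$hosts" | jq '.[].name'
```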
Parentheses
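It is possible to use parentheses in filter expressions in a similar way to using them in arithmetic, where they group parts together and can change the normal order of operations. They can be used in other contexts too. The example below is a simple arithmetic one: without parentheses the division 2 / 2 is evaluated first, giving 4756 + 1; with them, the addition happens first, giving 4758 / 2.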
$ jq '.shows.total + 2 / 2' stats.json
4757
$ jq '(.shows.total + 2) / 2' stats.json
2379
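One of those other contexts is combining the comma and the pipe. In jq the pipe binds more loosely than the comma, so '.a, .b | .c' means '(.a, .b) | .c'. This sketch, with made-up data, shows how adding parentheses changes the result:

```shell
#!/usr/bin/env bash
data='{"a": {"c": 1}, "b": {"c": 2}}'

# Without parentheses: (.a, .b) | .c, so .c is taken from both objects
echo "$data" | jq -c '.a, .b | .c'
# 1
# 2

# With parentheses: only .b is piped into .c; .a is output unchanged
echo "$data" | jq -c '.a, (.b | .c)'
# {"c":1}
# 2
```

The same rule explains the country examples that follow, where '.[42] | .name.common , .capital.[]' feeds the selected country to both filters.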
Examples
Finding country data #1
Here we are using a file called countries.json obtained from the GitHub project listed below. This file is around 39,000 lines long, so it is not being distributed with the show. However, it's quite interesting and you are encouraged to grab a copy and experiment with it.
I will show ways in which the structure can be examined and reported with jq in a later show, but for now here is an example of extracting data:
$ jq '.[42] | .name.common , .capital.[]' countries.json
"Switzerland"
"Bern"
- The file contains an array of country objects; the one with index 42 is Switzerland.
- The name of the country is in an object called "name", with the common name in a keyed field called "common", thus the filter .name.common.
- The country object also has a key called "capital", whose value is an array containing the name (or names) of the capital city (or cities). The filter .capital.[] obtains and displays the contents of the array.
- Note that we used a comma operator between the filters. Because the pipe binds less tightly than the comma, the country object selected by .[42] is fed to both .name.common and .capital.[].
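Since countries.json isn't distributed with the show, here is a sketch using a two-element stand-in array. Only the keys used above ("name", "common" and "capital") are assumed; the real file has many more fields per country. It uses .capital[], which means the same as .capital.[]; as far as I know, jq releases before 1.7 accept only the form without the second dot:

```shell
#!/usr/bin/env bash
# Invented stand-in for countries.json, with just the fields used above
countries='[
  {"name": {"common": "Norway"},      "capital": ["Oslo"]},
  {"name": {"common": "Switzerland"}, "capital": ["Bern"]}
]'

# Index 1 here plays the role of index 42 in the real file
echo "$countries" | jq '.[1] | .name.common, .capital[]'
# "Switzerland"
# "Bern"
```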
Finding country data #2
Another search of the countries.json file, this time looking at the languages spoken. There is an object called "languages" which contains abbreviated language names as keys and full names as values:
$ jq '.[42] | .name.common , .capital.[] , .languages' countries.json
"Switzerland"
"Bern"
{
"fra": "French",
"gsw": "Swiss German",
"ita": "Italian",
"roh": "Romansh"
}
- Using the filter .languages we get the whole object; using the iterator .[] instead, we get just the values:
$ jq '.[42] | .name.common , .capital.[] , .languages.[]' countries.json
"Switzerland"
"Bern"
"French"
"Swiss German"
"Italian"
"Romansh"
- This has some shortcomings: the output is just a list of strings, and the language codes (the keys) are lost. We need the construction capabilities of jq to generate more meaningful output.
Next episode
In the next episode we will look at construction - how new JSON output data can be generated from input data.
Links
- jq:
- Test data sources:
- Previous episodes: