Avro Command Line Tool
In the last few examples, we saw how to use the Avro package to encode and decode data in a Python application. It is also possible to use a command line tool for testing and validation purposes.
In this part, we will explore the avro CLI tool and use it to interact with data.
Note: Each language implementation comes with its own CLI tool. In this part, we will be using the avro tool made available as part of the avro Python package.
The Avro Python CLI helps you write and read Avro encoded data to and from binary files. It has two primary options to interact with encoded data:
cat: to decode and view data in the file
write: to encode and add data to a file
The --help option displays the full usage:
!avro --help
Let us explore the write option first and understand how to encode and write data to a file with the CLI.
Schema
The write option accepts three parameters:
schema: contains the writer's schema in JSON format. A sample Person schema, stored in person-schema.json for example, would look like this:
%cat person-schema.json
input-type: specifies whether the input data file is in JSON or CSV format. When not explicitly specified, the input type is detected from the file extension. A sample input file would look like this:
%cat people.json
output: contains the name of the file to which the encoded data is written
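To try the write command, both input files need to exist. As a minimal sketch, the Python snippet below creates them. Only email and firstName are field names taken from this tutorial; the namespace, the lastName and age fields, the sample values, and the helper name write_sample_files are hypothetical. It also assumes that the CLI's JSON reader expects one JSON object per line.

```python
import json
from pathlib import Path

# Hypothetical Person schema: only "email" and "firstName" appear in this
# tutorial; the namespace and the other fields are illustrative assumptions.
PERSON_SCHEMA = {
    "type": "record",
    "name": "Person",
    "namespace": "example.avro",
    "fields": [
        {"name": "firstName", "type": "string"},
        {"name": "lastName", "type": "string"},
        {"name": "email", "type": "string"},
        {"name": "age", "type": "int"},
    ],
}

# Made-up sample records that conform to the schema above.
PEOPLE = [
    {"firstName": "Ada", "lastName": "Lovelace", "email": "ada@example.com", "age": 36},
    {"firstName": "Alan", "lastName": "Turing", "email": "alan@example.com", "age": 41},
]


def write_sample_files(directory):
    """Write person-schema.json and people.json into `directory`."""
    directory = Path(directory)
    schema_path = directory / "person-schema.json"
    people_path = directory / "people.json"
    schema_path.write_text(json.dumps(PERSON_SCHEMA, indent=2))
    # Assumption: the input file holds one JSON object per line.
    people_path.write_text("\n".join(json.dumps(p) for p in PEOPLE) + "\n")
    return schema_path, people_path
```

Calling write_sample_files(".") from the notebook places both files next to it, ready to be passed to the CLI.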
You can invoke the write command easily with these artifacts now:
!avro write --schema person-schema.json --input-type json -o people-through-cli.avro people.json
Data in people.json is encoded and written into people-through-cli.avro. To verify, let us use the cat command and read the contents:
!avro cat people-through-cli.avro
Note: If you don't supply the --output/-o option, the tool writes the binary-encoded data directly to the terminal (standard output).
Additional capabilities
Apart from letting you view the data in an encoded file, the Avro CLI also provides options to narrow down the result set.
Filtering:
You can filter records with simple boolean expressions on fields:
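To illustrate what such an expression looks like: in the Python CLI, filtering is exposed through the --filter option of cat, and the expression refers to the current record as r (the option's help text gives r['age']>1 as an example). The sketch below mimics that behaviour on plain dictionaries rather than calling the tool. The records, the age field, and the helper filter_records are hypothetical, and evaluating the expression with eval is an assumption about how such a filter could be applied.

```python
def filter_records(records, expression):
    """Keep records for which `expression` is truthy, with the record bound to `r`."""
    # Compile once; eval runs arbitrary code, so only pass trusted expressions.
    code = compile(expression, "<filter>", "eval")
    return [r for r in records if eval(code, {}, {"r": r})]


# Hypothetical decoded records, shaped like the output of `avro cat`.
people = [
    {"firstName": "Ada", "email": "ada@example.com", "age": 36},
    {"firstName": "Alan", "email": "alan@example.com", "age": 41},
]

# Presumed CLI equivalent (assumption):
#   avro cat --filter "r['age'] > 40" people-through-cli.avro
over_40 = filter_records(people, "r['age'] > 40")
print(over_40)
```

Only the record for Alan satisfies the expression, so it is the only one kept.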
View Writer's schema:
You can view the schema with which the data was encoded by passing the print-schema option to cat:
!avro cat --print-schema people-through-cli.avro
Format decoded data
You can use the format option to format decoded data as JSON, CSV, or pretty-printed JSON.
!avro cat --format json-pretty people-through-cli.avro
View selected fields
You can restrict the fields in the results with the fields option:
!avro cat --fields email,firstName people-through-cli.avro
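As a rough picture of what field selection does to each decoded record, the sketch below projects plain dictionaries onto a comma-separated list of field names. It does not call the CLI; the helper select_fields and the sample record are hypothetical.

```python
def select_fields(records, fields):
    """Project each record onto the comma-separated field names in `fields`."""
    names = [name.strip() for name in fields.split(",")]
    return [{name: record[name] for name in names} for record in records]


# Hypothetical decoded record with one field that will be dropped.
decoded = [{"firstName": "Ada", "email": "ada@example.com", "age": 36}]
print(select_fields(decoded, "email,firstName"))
```

The projected record keeps only email and firstName, in the order the names were listed.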