Avro Command Line Tool


Last Updated on Sep 28, 2022

In the last few examples, we saw how to use the Avro package to encode and decode data in a Python application. For quick testing and validation, however, a command line tool is often simpler.

In this part, we will explore the avro CLI tool and use it to write and read encoded data.

Note: Each language implementation of Avro ships its own CLI tool. In this part, we use the avro tool that comes with the avro Python package.

The Avro Python CLI lets you write Avro-encoded data to binary files and read it back. It has two subcommands for working with encoded data:

  • cat to decode and view the data in a file
  • write to encode data and write it to a file

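The --help option displays the full usage. The commands in this part are run from a Jupyter notebook, hence the leading !; omit it when running them in a regular shell: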

!avro --help
Usage: avro cat|write [options] FILE [FILE...]

Display/write for Avro files

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit

  cat options:
    -n COUNT, --count=COUNT
                        number of records to print
    -s SKIP, --skip=SKIP
                        number of records to skip
    -f FORMAT, --format=FORMAT
                        record format
    --header            print CSV header
    --filter=FILTER     filter records (e.g. r['age']>1)
    --print-schema      print schema
    --fields=FIELDS     fields to show, comma separated (show all by default)

  write options:
    --schema=SCHEMA     schema file (required)
    --input-type=INPUT_TYPE
                        input file(s) type (json or csv)
    -o OUTPUT, --output=OUTPUT
                        output file
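The cat options -s/--skip and -n/--count select a slice of the decoded records. The following is a minimal Python sketch of that behaviour on records that are already plain dictionaries. It assumes skip is applied before count. `take_records` is a hypothetical helper, not part of the avro package.

```python
from itertools import islice


def take_records(records, skip=0, count=None):
    """Hypothetical helper: drop the first `skip` records, then return at most `count`.

    Assumes -s/--skip is applied before -n/--count; count=None means "no limit".
    Works on any iterable of decoded records (lists or iterators).
    """
    stop = None if count is None else skip + count
    return list(islice(records, skip, stop))
```

Under that assumption, `take_records(records, skip=1, count=1)` corresponds to `avro cat -s 1 -n 1 FILE` and returns only the second record.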

Let us explore the write command first and see how to encode data and write it to a file with the CLI.

Schema

The write command takes three options, followed by the input data file(s):

  • schema: the path to a file containing the writer's schema in JSON format (this option is required). A sample Person schema, stored in person-schema.json for example, would look like this:
!cat person-schema.json
{
    "type": "record",
    "name": "Person",
    "fields": [
        { "name": "identifier", "type": "int" },
        { "name": "firstName", "type": "string" },
        { "name": "lastName", "type": ["null", "string"], "default": null },
        { "name": "email", "type": "string" },
        { "name": "interests", "type": { "type": "array", "items": "string" } }
    ]
}
  • input-type: specifies whether the input data file is in JSON or CSV format. When it is not specified explicitly, the input type is detected from the file extension. A sample JSON input file, with one record per line, would look like this:
!cat people.json
{"identifier": 1,"firstName": "John","lastName": "Doe","email": "[email protected]","interests": ["Hunting", "Foraging", "Cooking"]}
{"identifier": 2,"firstName": "Jane","lastName": "Doe","email": "[email protected]","interests": ["Hygiene", "Hiking"]}
  • output: the name of the file to which the encoded data is written
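When --input-type is omitted, the tool picks the input type from the file extension. The JSON sample above also holds one record per line. The following Python sketch covers both steps. It assumes only .json and .csv extensions are recognised, matching the "json or csv" choices in the help text, and that JSON input is read line by line. The helper names are hypothetical, and this is not the avro package's own code.

```python
import json
import os


def guess_input_type(path):
    """Hypothetical helper: map a .json/.csv extension (case-insensitive) to an input type."""
    ext = os.path.splitext(path)[1].lower()
    return {".json": "json", ".csv": "csv"}.get(ext)  # None if the extension is unrecognised


def read_json_records(text):
    """Parse JSON input laid out like people.json: one record object per non-blank line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```

A file named people.txt would get `None` here, so in this sketch you would have to pass the type explicitly, as --input-type does.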

With the schema and input files in place, you can invoke the write command:

!avro write --schema person-schema.json --input-type json -o people-through-cli.avro people.json

Data in people.json is encoded and written into people-through-cli.avro. To verify, let us use the cat command and read the contents:

!avro cat people-through-cli.avro
{"identifier": 1, "firstName": "John", "lastName": "Doe", "email": "[email protected]", "interests": ["Hunting", "Foraging", "Cooking"]}
{"identifier": 2, "firstName": "Jane", "lastName": "Doe", "email": "[email protected]", "interests": ["Hygiene", "Hiking"]}

Note: If you don't supply the --output/-o option, the tool writes the binary-encoded data to standard output, so it is dumped straight into the terminal (or notebook cell).

Additional capabilities

Apart from viewing the data in an encoded file, the cat command offers options to narrow down and shape the output.

Filtering

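You can filter records with a boolean expression over their fields. The expression uses Python syntax, with r standing for the current record (as in the help text's r['age']>1 example):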

!avro cat people-through-cli.avro --filter "r['email']=='[email protected]'"
{"identifier": 1, "firstName": "John", "lastName": "Doe", "email": "[email protected]", "interests": ["Hunting", "Foraging", "Cooking"]}
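The help text's example r['age']>1 suggests that the filter string is a Python expression evaluated once per record, with r bound to that record. The following is a minimal sketch of that idea on plain dictionaries. The eval-based evaluation is an assumption, and `filter_records` is a hypothetical name.

```python
def filter_records(records, expression):
    """Yield the records for which `expression` is truthy, with `r` bound to each record.

    Assumption: --filter behaves like a Python expression evaluated per record.
    eval() can run arbitrary code, so only pass trusted expressions.
    """
    code = compile(expression, "<filter>", "eval")  # compile once, evaluate per record
    for r in records:
        if eval(code, {}, {"r": r}):
            yield r
```

Because the expression is ordinary Python in this sketch, conditions such as `r['identifier'] > 1 and r['lastName'] == 'Doe'` combine as you would expect.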

View writer's schema

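The --print-schema option of cat displays the writer's schema, that is, the schema with which the data was encoded. It is read from the file itself, because Avro data files store the schema in their header: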

!avro cat --print-schema people-through-cli.avro
{
    "type": "record",
    "name": "Person",
    "fields": [
        {
            "type": "int",
            "name": "identifier"
        },
        {
            "type": "string",
            "name": "firstName"
        },
        {
            "type": [
                "null",
                "string"
            ],
            "name": "lastName",
            "default": null
        },
        {
            "type": "string",
            "name": "email"
        },
        {
            "type": {
                "type": "array",
                "items": "string"
            },
            "name": "interests"
        }
    ]
}

Format decoded data

The --format (-f) option renders decoded records as JSON, CSV, or pretty-printed JSON (json-pretty, used below):

!avro cat --format json-pretty people-through-cli.avro
{
    "email": "[email protected]",
    "firstName": "John",
    "identifier": 1,
    "interests": [
        "Hunting",
        "Foraging",
        "Cooking"
    ],
    "lastName": "Doe"
}
{
    "email": "[email protected]",
    "firstName": "Jane",
    "identifier": 2,
    "interests": [
        "Hygiene",
        "Hiking"
    ],
    "lastName": "Doe"
}
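The following Python sketch shows the three renderings, applied to records already decoded into dicts. The sorted keys mirror the alphabetical key order in the json-pretty output above. Several details are assumptions: the indent width, the column order and str() conversion of nested values such as interests in CSV, and rejecting --header outside CSV. The real tool's output may differ on these points. `format_records` is a hypothetical name.

```python
import csv
import io
import json


def format_records(records, fmt="json", header=False):
    """Render decoded records (dicts) as 'json', 'json-pretty' or 'csv' text."""
    if header and fmt != "csv":
        # Assumed behaviour: the help text describes --header as a CSV-only option
        raise ValueError("header applies only to csv output")
    if fmt == "json":
        return "\n".join(json.dumps(r) for r in records)
    if fmt == "json-pretty":
        # sort_keys reproduces the alphabetical key order; indent=4 is an assumption
        return "\n".join(json.dumps(r, indent=4, sort_keys=True) for r in records)
    if fmt == "csv":
        out = io.StringIO()
        writer = None
        for r in records:
            if writer is None:
                # Column order is taken from the first record's keys
                writer = csv.DictWriter(out, fieldnames=list(r), lineterminator="\n")
                if header:
                    writer.writeheader()
            writer.writerow(r)  # nested values (e.g. lists) are written via str()
        return out.getvalue()
    raise ValueError(f"unknown format: {fmt}")
```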

View selected fields

The --fields option restricts the output to a comma-separated list of fields:

!avro cat --fields email,firstName people-through-cli.avro
{"email": "[email protected]", "firstName": "John"}
{"email": "[email protected]", "firstName": "Jane"}
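Selecting fields amounts to projecting each decoded record onto the listed names. The following is a minimal sketch on plain Python dicts. Keeping the listed order is an assumption, consistent with email appearing before firstName in the output above. `select_fields` is a hypothetical helper. This sketch simply raises KeyError for an unknown field name; how the real tool reports that case is not covered here.

```python
def select_fields(records, fields):
    """Keep only the comma-separated field names, in the order they are listed.

    An empty or None `fields` keeps every field, like the tool's default.
    """
    names = [name.strip() for name in (fields or "").split(",") if name.strip()]
    for r in records:
        yield {name: r[name] for name in names} if names else dict(r)
```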

© 2022 Ambitious Systems. All Rights Reserved.