
UnpackAvro

Unpacks the first selected embedded-schema Avro object container file into schema, container metadata, and decoded record content.

Parameters

| Name | Description | Allowed Values | Required | Default |
|---|---|---|---|---|
| containerMetadataFilenameSuffix | Suffix used when includeContainerMetadataFile is true | string | | .container-metadata.json |
| contentIndexes | List of content indexes to include or exclude | integer (list) | | |
| contentSuffix | Suffix used when recordOutputMode is NDJSON | string | | .content.ndjson |
| contentTags | List of content tags to include or exclude, matching any | string (list) | | |
| excludeContentIndexes | Exclude specified content indexes | boolean | | false |
| excludeContentTags | Exclude specified content tags | boolean | | false |
| excludeFilePatterns | Exclude specified file patterns | boolean | | false |
| excludeMediaTypes | Exclude specified media types | boolean | | false |
| filePatterns | List of file patterns to include or exclude, supporting wildcards (*) | string (list) | | |
| includeContainerMetadataFile | Whether to write container metadata as a JSON content | boolean | | true |
| includeSchemaFile | Whether to write the Avro schema as a JSON content | boolean | | false |
| individualRecordSeparator | Separator inserted before the numbered file index when recordOutputMode is INDIVIDUAL_JSON | string | | . |
| individualRecordSuffix | Suffix used for each numbered record file when recordOutputMode is INDIVIDUAL_JSON | string | | .json |
| jsonArraySuffix | Suffix used when recordOutputMode is JSON_ARRAY | string | | .content.json |
| mediaTypes | List of media types to consider, supporting wildcards (*) | string (list) | | |
| recordOutputMode | How decoded Avro records should be written | NDJSON, JSON_ARRAY, INDIVIDUAL_JSON | | NDJSON |
| retainExistingContent | Retain the existing content | boolean | | false |
| schemaFilenameSuffix | Suffix used when includeSchemaFile is true | string | | .schema.json |

Input

Content

Input content to act on may be selected (or inversely selected using the exclude parameters) with contentIndexes, mediaTypes, and/or filePatterns. Selected content must be an Avro object container file with an embedded schema.

If multiple contents are selected, only the first selected content is unpacked. To unpack multiple Avro files independently, split the DeltaFile before this action.

This action does not support raw binary Avro that relies on an external schema, or Confluent-style wire formats that carry a schema registry identifier instead of a full embedded Avro container header.
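As a rough pre-check, the supported and unsupported framings can often be told apart from the first few bytes. The sketch below is illustrative only and is not part of this action: `classify_avro_payload` is a hypothetical helper. It relies on two facts, that Avro object container files begin with the four bytes `Obj` followed by `0x01`, and that the Confluent wire format begins with a zero byte followed by a 4-byte schema ID. Raw binary Avro can also start with a zero byte, so the second result is only a hint.

```python
AVRO_CONTAINER_MAGIC = b"Obj\x01"  # 4-byte magic at the start of an Avro object container file


def classify_avro_payload(data: bytes) -> str:
    """Return a rough guess at how an Avro payload is framed (hypothetical helper)."""
    if data[:4] == AVRO_CONTAINER_MAGIC:
        # Embedded-schema container file: the only framing this action accepts.
        return "object-container"
    if len(data) >= 5 and data[0] == 0:
        # Zero magic byte plus 4-byte schema ID suggests the Confluent wire format,
        # but raw binary Avro may also begin with 0x00, so this is only a hint.
        return "possibly-confluent-wire"
    return "unknown"
```

Only content classified as `object-container` would be a candidate for this action; anything else is expected to produce an error.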

Output

Content

The first selected Avro content is replaced by derived contents unless retainExistingContent is true.

Additional selected or unselected content is passed through unchanged.

By default, decoded record content and container metadata are written.

When includeSchemaFile is true, the schema is written to <base>.schema.json.

When includeContainerMetadataFile is true, container metadata is written to <base>.container-metadata.json as a flat JSON map of custom Avro header metadata keys to Base64-encoded values.
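A downstream consumer would need to Base64-decode each value to recover the original header bytes. The following is a minimal sketch, assuming standard (not URL-safe) Base64; the function name `decode_container_metadata` and the sample key `writer.app` are hypothetical and used only for illustration.

```python
import base64
import json


def decode_container_metadata(json_text: str) -> dict[str, bytes]:
    """Decode a flat JSON map of header keys to Base64 strings into raw bytes."""
    raw = json.loads(json_text)
    # Assumption: values use the standard Base64 alphabet.
    return {key: base64.b64decode(value) for key, value in raw.items()}


# Hypothetical container-metadata content with one custom header entry.
sample = json.dumps({"writer.app": base64.b64encode(b"ingest-v2").decode("ascii")})
decoded = decode_container_metadata(sample)
```

Values are kept as bytes here because Avro header metadata values are byte arrays and are not guaranteed to be valid text.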

Decoded Avro records are written in one of three forms:

  • a single newline-delimited JSON content when recordOutputMode is NDJSON
  • a single JSON array content when recordOutputMode is JSON_ARRAY
  • multiple numbered JSON contents when recordOutputMode is INDIVIDUAL_JSON

The default numbered file pattern is <base>.1.json, <base>.2.json, etc.
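The difference between the three modes can be sketched as follows. This is an illustration of the output shapes, not the action's implementation: `render_records` is a hypothetical function, records are assumed to be already decoded into plain dictionaries, and details such as a trailing newline in NDJSON output are not specified here.

```python
import json


def render_records(records: list[dict], mode: str = "NDJSON") -> list[str]:
    """Return the text of each output content for a given recordOutputMode."""
    if mode == "NDJSON":
        # One content, one JSON document per line.
        return ["\n".join(json.dumps(r) for r in records)]
    if mode == "JSON_ARRAY":
        # One content holding a single JSON array.
        return [json.dumps(records)]
    if mode == "INDIVIDUAL_JSON":
        # One content per record.
        return [json.dumps(r) for r in records]
    raise ValueError(f"unknown recordOutputMode: {mode}")
```

With two records, NDJSON and JSON_ARRAY each yield one content, while INDIVIDUAL_JSON yields two.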

The DeltaFile metadata is also updated with avroRecordCount, and with avroCodec when present in the Avro container header.

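New content names are built from the input content name with a trailing '.avro' removed when present. The configured suffix is then applied: schemaFilenameSuffix, containerMetadataFilenameSuffix, contentSuffix, jsonArraySuffix, or the numbered naming built from individualRecordSeparator and individualRecordSuffix.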
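The naming rules can be sketched as below, using the default suffixes from the parameter table. `derived_names` is a hypothetical function; the order of the returned names and the case-sensitive match on '.avro' are assumptions made for the example, not documented behavior.

```python
def derived_names(
    input_name: str,
    record_count: int,
    mode: str = "NDJSON",
    include_schema: bool = False,
    include_container_metadata: bool = True,
) -> list[str]:
    """Build output content names from an input name (illustrative sketch)."""
    # Strip a trailing '.avro' when present (assumed case-sensitive).
    base = input_name.removesuffix(".avro")
    names = []
    if include_schema:
        names.append(base + ".schema.json")  # schemaFilenameSuffix default
    if include_container_metadata:
        names.append(base + ".container-metadata.json")  # containerMetadataFilenameSuffix default
    if mode == "NDJSON":
        names.append(base + ".content.ndjson")  # contentSuffix default
    elif mode == "JSON_ARRAY":
        names.append(base + ".content.json")  # jsonArraySuffix default
    elif mode == "INDIVIDUAL_JSON":
        # individualRecordSeparator "." + 1-based index + individualRecordSuffix ".json"
        names.extend(f"{base}.{i}.json" for i in range(1, record_count + 1))
    else:
        raise ValueError(f"unknown recordOutputMode: {mode}")
    return names
```

For example, `derived_names("events.avro", 2)` gives `events.container-metadata.json` and `events.content.ndjson`, matching the default settings.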

Errors

  • On selected content that is not a readable embedded-schema Avro object container file
