Combining the Jackson Streaming API with ObjectMapper for parsing JSON
The Jackson Streaming API allows us to parse huge JSON documents without loading their whole content into memory at once. It is the most efficient way to process JSON content, with the lowest memory and processing overhead, but that efficiency comes at a cost: it is not the most convenient way to process JSON content.
In this post we’ll see how to take advantage of the Jackson Streaming API without losing the powerful data binding capabilities provided by ObjectMapper.
This post is heavy on examples, and the whole code is available on GitHub.
Introduction
For demonstration purposes, suppose we want to parse a JSON array in which each element represents a contact:
[
  {
    "id": 1,
    "firstName": "John",
    "lastName": "Doe",
    "emails": [
      "[email protected]"
    ],
    "createdDateTime": "2019-08-19T20:30:00Z"
  },
  {
    "id": 2,
    "firstName": "Jane",
    "lastName": "Poe",
    "emails": [
      "[email protected]",
      "[email protected]"
    ],
    "createdDateTime": "2019-08-19T20:45:00Z"
  }
]
Each contact can be mapped to an instance of Contact, which is defined as follows (the @Data annotation comes from Lombok and generates the getters and setters used in the examples):
@Data
public class Contact {
    private Integer id;
    private String firstName;
    private String lastName;
    private List<String> emails;
    private OffsetDateTime createdDateTime;
}
In most applications, we can take advantage of the data binding capabilities provided by ObjectMapper and parse the array with the following code:
ObjectMapper mapper = new ObjectMapper();
mapper.registerModule(new JavaTimeModule());
List<Contact> contacts = mapper.readValue(json, new TypeReference<List<Contact>>() {});
However, in situations where the array may hold a couple of million elements, we may not be able to keep all the data in memory. In that case we need to fall back to the Jackson Streaming API.
The Jackson Streaming API was inspired by StAX, an event-based API for processing XML documents. Unlike StAX, Jackson uses the term token instead of event, which better reflects the JSON structure.
The main types of the Jackson Streaming API are:
Type | Description |
---|---|
JsonParser | Logical cursor for iterating over tokens, providing low level JSON reader capabilities |
JsonGenerator | Low level JSON writer |
JsonFactory | Factory for creating instances of JsonParser and JsonGenerator |
When using streaming, content has to be read (or written) in exactly the order in which the input comes in (or the output goes out). Random access is only provided by the data binding and tree model APIs, both of which use the streaming API under the hood for reading and writing JSON documents.
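To make the contrast concrete, here is a minimal sketch of random access through the tree model. The class and method names are hypothetical, and the input is a trimmed-down version of the contacts array. Note that this convenience requires the whole document to be held in memory, which is exactly what streaming avoids.

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.IOException;

// Hypothetical helper class, used only for illustration
class RandomAccessWithTree {

    // Loads the whole document as a tree, then jumps straight to one element.
    // Convenient, but the entire array is held in memory.
    static String firstNameAt(String json, int index) throws IOException {
        ObjectMapper mapper = new ObjectMapper();
        JsonNode root = mapper.readTree(json);
        return root.get(index).get("firstName").asText();
    }

    public static void main(String[] args) throws IOException {
        String json = "[{\"firstName\":\"John\"},{\"firstName\":\"Jane\"}]";
        System.out.println(firstNameAt(json, 1)); // prints Jane
    }
}
```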
JsonParser
JsonParser is used to parse JSON content into tokens along with their associated data. It is the lowest level of read access to JSON content in Jackson.
To iterate over the stream of tokens, the application advances the cursor by calling the nextToken() method. To access the data and properties of the token the cursor points to, the application calls one of the accessor methods (such as getText() or getIntValue()), which always refer to the currently pointed-to token.
The JsonParser only keeps track of the data that the cursor currently points to (plus a little bit of context information, such as nesting and input line numbers).
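The cursor model is easier to picture when the tokens are listed one by one. The following is a small sketch, with a hypothetical class name and a made-up input document, that walks a JSON string and records each token together with its text where it has one.

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, used only for illustration
class TokenDump {

    // Returns one entry per token: the token type plus its text, if any
    static List<String> describeTokens(String json) throws IOException {
        List<String> lines = new ArrayList<>();
        JsonFactory factory = new JsonFactory();
        try (JsonParser parser = factory.createParser(json)) {
            JsonToken token;
            // nextToken() returns null once the input is exhausted
            while ((token = parser.nextToken()) != null) {
                if (token == JsonToken.FIELD_NAME || token.isScalarValue()) {
                    lines.add(token.name() + " " + parser.getText());
                } else {
                    // Structural tokens (start/end of objects and arrays)
                    lines.add(token.name());
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        describeTokens("{\"id\":1,\"emails\":[\"a\"]}").forEach(System.out::println);
    }
}
```

For the input above, the list starts with START_OBJECT, followed by FIELD_NAME id and VALUE_NUMBER_INT 1, and ends with END_ARRAY and END_OBJECT.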
Parsing JSON with JsonParser
Let’s see how to parse the JSON document shown above with JsonParser:
private void parseJson(InputStream is) throws IOException {
    // Create a factory for creating a JsonParser instance
    JsonFactory jsonFactory = new JsonFactory();
    // Create a JsonParser instance
    try (JsonParser jsonParser = jsonFactory.createParser(is)) {
        // Check the first token
        if (jsonParser.nextToken() != JsonToken.START_ARRAY) {
            throw new IllegalStateException("Expected content to be an array");
        }
        // Iterate over the tokens until the end of the array
        while (jsonParser.nextToken() != JsonToken.END_ARRAY) {
            // Read a contact and do something with it
            Contact contact = readContact(jsonParser);
            doSomethingWithContact(contact);
        }
    }
}

private Contact readContact(JsonParser jsonParser) throws IOException {
    // Check the first token
    if (jsonParser.currentToken() != JsonToken.START_OBJECT) {
        throw new IllegalStateException("Expected content to be an object");
    }
    Contact contact = new Contact();
    // Iterate over the properties of the object
    while (jsonParser.nextToken() != JsonToken.END_OBJECT) {
        // Get the current property name
        String property = jsonParser.getCurrentName();
        // Move to the corresponding value
        jsonParser.nextToken();
        // Evaluate each property name and extract the value
        switch (property) {
            case "id":
                contact.setId(jsonParser.getIntValue());
                break;
            case "firstName":
                contact.setFirstName(jsonParser.getText());
                break;
            case "lastName":
                contact.setLastName(jsonParser.getText());
                break;
            case "emails":
                List<String> emails = readEmails(jsonParser);
                contact.setEmails(emails);
                break;
            case "createdDateTime":
                contact.setCreatedDateTime(OffsetDateTime.parse(jsonParser.getText()));
                break;
            default:
                // Unknown properties are ignored; skipChildren() also skips
                // nested objects and arrays so the loop stays in sync
                jsonParser.skipChildren();
                break;
        }
    }
    return contact;
}

private List<String> readEmails(JsonParser jsonParser) throws IOException {
    // Check the first token
    if (jsonParser.currentToken() != JsonToken.START_ARRAY) {
        throw new IllegalStateException("Expected content to be an array");
    }
    List<String> emails = new ArrayList<>();
    // Iterate over the tokens until the end of the array
    while (jsonParser.nextToken() != JsonToken.END_ARRAY) {
        // Add each element of the array to the list of emails
        emails.add(jsonParser.getText());
    }
    return emails;
}
This is an efficient way to parse JSON content in terms of memory consumption and processing overhead. But, as we can see, it’s not convenient: it’s verbose, repetitive and tedious to write.
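Hand-written parsing also has to deal with properties it does not know about, and the tricky case is a value that is itself an object or an array. The sketch below, with a hypothetical class name and a made-up input, reads only the id property and relies on JsonParser.skipChildren() to jump over everything else. For scalar values skipChildren() does nothing, and for START_OBJECT or START_ARRAY it advances to the matching end token.

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;

import java.io.IOException;

// Hypothetical helper class, used only for illustration
class SkipUnknownProperties {

    // Reads only "id" from a JSON object; returns -1 if the property is absent
    static int readId(String json) throws IOException {
        JsonFactory factory = new JsonFactory();
        try (JsonParser parser = factory.createParser(json)) {
            if (parser.nextToken() != JsonToken.START_OBJECT) {
                throw new IllegalStateException("Expected content to be an object");
            }
            int id = -1;
            while (parser.nextToken() != JsonToken.END_OBJECT) {
                String property = parser.getCurrentName();
                // Move to the corresponding value
                parser.nextToken();
                if ("id".equals(property)) {
                    id = parser.getIntValue();
                } else {
                    // Skip the value, including any nested object or array
                    parser.skipChildren();
                }
            }
            return id;
        }
    }

    public static void main(String[] args) throws IOException {
        String json = "{\"address\":{\"city\":\"X\",\"tags\":[1,2]},\"id\":7}";
        System.out.println(readId(json)); // prints 7
    }
}
```

Without the skipChildren() call, the END_OBJECT token of the nested address object would end the loop early and the id property would never be reached.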
We will see below how to combine streaming with data binding to reduce the verbosity of our code.
Parsing JSON with JsonParser and ObjectMapper
This example shows how to take advantage of the data binding capabilities of ObjectMapper while streaming the content of a file:
private void parseJson(InputStream is) throws IOException {
    // Create and configure an ObjectMapper instance
    ObjectMapper mapper = new ObjectMapper();
    mapper.registerModule(new JavaTimeModule());
    mapper.disable(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES);
    // Create a JsonParser instance
    try (JsonParser jsonParser = mapper.getFactory().createParser(is)) {
        // Check the first token
        if (jsonParser.nextToken() != JsonToken.START_ARRAY) {
            throw new IllegalStateException("Expected content to be an array");
        }
        // Iterate over the tokens until the end of the array
        while (jsonParser.nextToken() != JsonToken.END_ARRAY) {
            // Read a contact instance using ObjectMapper and do something with it
            Contact contact = mapper.readValue(jsonParser, Contact.class);
            doSomethingWithContact(contact);
        }
    }
}
ObjectMapper can read a value directly from JsonParser, so we can mix streaming with data binding, taking full advantage of the ObjectMapper configuration, such as modules, deserialization features and custom deserializers.
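As an alternative that this post does not rely on, Jackson also provides MappingIterator, obtained from ObjectReader.readValues(...), which wraps the same element-by-element loop. The sketch below is an illustration under simplifying assumptions: SimpleContact is a hypothetical, trimmed-down stand-in for Contact with public fields, so it needs neither Lombok nor the JavaTimeModule.

```java
import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, used only for illustration
class ContactIteration {

    // Hypothetical, simplified stand-in for the Contact class
    public static class SimpleContact {
        public Integer id;
        public String firstName;
    }

    // Binds one array element at a time and collects the first names
    static List<String> firstNames(InputStream is) throws IOException {
        ObjectMapper mapper = new ObjectMapper();
        mapper.disable(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES);
        List<String> names = new ArrayList<>();
        // Closing the iterator also closes the underlying parser
        try (MappingIterator<SimpleContact> it =
                 mapper.readerFor(SimpleContact.class).readValues(is)) {
            while (it.hasNextValue()) {
                names.add(it.nextValue().firstName);
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        String json = "[{\"id\":1,\"firstName\":\"John\"},{\"id\":2,\"firstName\":\"Jane\"}]";
        InputStream is = new ByteArrayInputStream(json.getBytes(StandardCharsets.UTF_8));
        System.out.println(firstNames(is));
    }
}
```

The explicit token loop leaves more control, for example over validating the enclosing structure, while the iterator keeps the calling code shorter.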
JsonGenerator
JsonGenerator allows us to construct JSON content through a sequence of calls, each of which outputs a JSON token. It is the lowest level of write access to JSON content in Jackson.
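As a warm-up, here is a minimal sketch of that call sequence, using a hypothetical class name and made-up values. Each call emits one token, and the generator takes care of the commas and colons between them.

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonGenerator;

import java.io.IOException;
import java.io.StringWriter;

// Hypothetical helper class, used only for illustration
class GeneratorBasics {

    // Writes a small object token by token and returns the resulting JSON
    static String writeSample() throws IOException {
        StringWriter out = new StringWriter();
        JsonFactory factory = new JsonFactory();
        try (JsonGenerator generator = factory.createGenerator(out)) {
            generator.writeStartObject();
            generator.writeNumberField("id", 1);
            generator.writeFieldName("emails");
            generator.writeStartArray();
            generator.writeString("a");
            generator.writeString("b");
            generator.writeEndArray();
            generator.writeEndObject();
        }
        // Closing the generator flushes any buffered output to the writer
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(writeSample()); // prints {"id":1,"emails":["a","b"]}
    }
}
```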
Generating JSON with JsonGenerator
Let’s see how to generate a JSON document using JsonGenerator:
private void generateJson(List<Contact> contacts, OutputStream os) throws IOException {
    // Create a factory which will be used for creating a JsonGenerator instance
    JsonFactory jsonFactory = new JsonFactory();
    // Create a JsonGenerator instance
    try (JsonGenerator jsonGenerator = jsonFactory.createGenerator(os)) {
        // Configure the JsonGenerator to pretty print the output
        jsonGenerator.useDefaultPrettyPrinter();
        // Write the start array token
        jsonGenerator.writeStartArray();
        // Iterate over the contacts and write each contact as a JSON object
        for (Contact contact : contacts) {
            writeContact(jsonGenerator, contact);
        }
        // Write the end array token
        jsonGenerator.writeEndArray();
    }
}

private void writeContact(JsonGenerator jsonGenerator, Contact contact) throws IOException {
    // Write the start object token
    jsonGenerator.writeStartObject();
    // Write each field of the contact instance as a property/value pair
    jsonGenerator.writeNumberField("id", contact.getId());
    jsonGenerator.writeStringField("firstName", contact.getFirstName());
    jsonGenerator.writeStringField("lastName", contact.getLastName());
    jsonGenerator.writeFieldName("emails");
    writeEmails(jsonGenerator, contact.getEmails());
    jsonGenerator.writeStringField("createdDateTime", contact.getCreatedDateTime().format(DateTimeFormatter.ISO_OFFSET_DATE_TIME));
    // Write the end object token
    jsonGenerator.writeEndObject();
}

private void writeEmails(JsonGenerator jsonGenerator, List<String> emails) throws IOException {
    // Write the start array token
    jsonGenerator.writeStartArray();
    // Iterate over the emails and write each email as a string
    for (String email : emails) {
        jsonGenerator.writeString(email);
    }
    // Write the end array token
    jsonGenerator.writeEndArray();
}
Like the code for parsing the JSON document, it is efficient in terms of memory consumption and processing overhead. But it’s verbose and repetitive.
So let’s see how to combine the Jackson Streaming API with data binding for generating JSON documents.
Generating JSON with JsonGenerator and ObjectMapper
To wrap up, let’s see how to generate JSON content combining JsonGenerator and ObjectMapper:
private void generateJson(List<Contact> contacts, OutputStream os) throws IOException {
    // Create and configure an ObjectMapper instance
    ObjectMapper mapper = new ObjectMapper();
    mapper.registerModule(new JavaTimeModule());
    mapper.disable(SerializationFeature.WRITE_DATES_AS_TIMESTAMPS);
    mapper.enable(SerializationFeature.INDENT_OUTPUT);
    // Create a JsonGenerator instance
    try (JsonGenerator jsonGenerator = mapper.getFactory().createGenerator(os)) {
        // Write the start array token
        jsonGenerator.writeStartArray();
        // Iterate over the contacts and write each contact as a JSON object
        for (Contact contact : contacts) {
            // Write a contact instance as JSON using ObjectMapper
            mapper.writeValue(jsonGenerator, contact);
        }
        // Write the end array token
        jsonGenerator.writeEndArray();
    }
}
ObjectMapper can write a value directly to JsonGenerator, allowing us to combine streaming with data binding to significantly reduce the amount of code we need to write. This approach takes advantage of modules, serialization features and custom serializers defined in the ObjectMapper.
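To try the technique in isolation, the following self-contained sketch writes a list to a string. It makes simplifying assumptions: SimpleContact is a hypothetical, trimmed-down stand-in for Contact with public fields, and the mapper is left unconfigured, so no modules or pretty printing are involved.

```java
import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.IOException;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, used only for illustration
class ArrayStreamingWriter {

    // Hypothetical, simplified stand-in for the Contact class
    public static class SimpleContact {
        public Integer id;
        public String firstName;

        public SimpleContact(Integer id, String firstName) {
            this.id = id;
            this.firstName = firstName;
        }
    }

    // Streams the array structure by hand and delegates each element to ObjectMapper
    static String writeAll(List<SimpleContact> contacts) throws IOException {
        ObjectMapper mapper = new ObjectMapper();
        StringWriter out = new StringWriter();
        try (JsonGenerator generator = mapper.getFactory().createGenerator(out)) {
            generator.writeStartArray();
            for (SimpleContact contact : contacts) {
                // The generator tracks the array context and emits the separators
                mapper.writeValue(generator, contact);
            }
            generator.writeEndArray();
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        List<SimpleContact> contacts = Arrays.asList(
            new SimpleContact(1, "John"),
            new SimpleContact(2, "Jane"));
        System.out.println(writeAll(contacts));
    }
}
```

Only one element is serialized at a time, so in a real application the contacts could come from an iterator or a database cursor instead of an in-memory list.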