Skip to content

Conversation

@steven-aerts
Copy link
Contributor

What is the purpose of the change

When avro-c tries to encode an empty byte array it will avro_malloc(0)
which on some architectures will return NULL. Make sure this is not
interpreted as an error or or dereferenced causing a segfault.

Verifying this change

Extended unit tests to validate this case

Documentation

  • Does this pull request introduce a new feature? no

@martin-gmartin-g changed the title Avro 4193Avro-4193: [c] segfault cannot handle empty byte arrayOct 24, 2025
@martin-g
Copy link
Member

Please rebase!

When avro-c tries to encode an empty byte array it will avro_malloc(0) which on some architectures will return NULL. Make sure this is not interpreted as an error or or dereferenced causing a segfault.
@steven-aerts
Copy link
ContributorAuthor

@martin-g sorry for that, rebased on main.

@martin-gmartin-g requested a review from CopilotOctober 27, 2025 09:25
Copy link

CopilotAI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes a segmentation fault that occurs when avro-c attempts to encode empty byte arrays. The issue arises because avro_malloc(0) can return NULL on some architectures, which was incorrectly treated as an allocation error and caused null pointer dereferences.

Key changes:

  • Updated NULL checks to distinguish between allocation failures and valid zero-size allocations
  • Added null pointer guard in the test allocator's deallocation path
  • Extended test coverage to validate empty byte array handling

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

FileDescription
lang/c/src/datum.cModified NULL checks in avro_bytes() and avro_bytes_set() to only treat NULL as an error when size is non-zero
lang/c/src/value-json.cUpdated NULL checks in encode_utf8_bytes() and avro_value_to_json_t() to handle zero-length allocations correctly
lang/c/tests/test_avro_data.cAdded null pointer guard in test allocator and introduced test_empty_bytes() to verify empty byte array handling

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.


staticinttest_empty_bytes(void)
{
charbytes[] ={};
Copy link

CopilotAIOct 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Empty array initializer {} is a GNU extension and not standard C. Consider using char *bytes = NULL; or char bytes[1]; for better portability across compilers.

Suggested change
charbytes[]={};
char*bytes=NULL;

Copilot uses AI. Check for mistakes.
avro_datum_decref(datum);
avro_datum_decref(expected_datum);

avro_schema_decref(writer_schema);
Copy link

CopilotAIOct 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double free: writer_schema is already decremented at line 230, so this second decrement at line 243 will cause a use-after-free error.

Suggested change
avro_schema_decref(writer_schema);

Copilot uses AI. Check for mistakes.
{
char*bytes_copy= (char*) avro_malloc(size);
if (!bytes_copy){
if (!bytes_copy&&size){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would happen now if size is 0?
It looks to me that no matter what is the value of bytes_copy it will never enter the if and it will call memcpy(bytes_copy, bytes, 0)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO the there must be a check for size == 0 as a first step in this function that returns early and avoids any mallocs and memcpys.

}

json_t*result=json_stringn_nocheck((constchar*) encoded, encoded_size);
json_t*result=json_stringn_nocheck((constchar*) encoded ? encoded : "", encoded_size);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
json_t*result=json_stringn_nocheck((constchar*) encoded ? encoded : "", encoded_size);
json_t*result=json_stringn_nocheck((constchar*) encoded ? encoded : "", encoded ? encoded_size : 0);

?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a similar code for FIXED at https://github.com/steven-aerts/avro/blob/89fb9db03478b793be7188c5683c6560c2eff115/lang/c/src/value-json.c#L244
Does it need the same improvements ?

{
char*bytes_copy= (char*) avro_malloc(size);
if (!bytes_copy){
if (!bytes_copy&&size){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO the there must be a check for size == 0 as a first step in this function that returns early and avoids any mallocs and memcpys.

intrval;
char*bytes_copy= (char*) avro_malloc(size);
if (!bytes_copy){
if (!bytes_copy&&size){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

Sign up for freeto join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

@steven-aerts@martin-g