Lockup in queue2?

Can folks confirm this bug?

  1. Load up the most recent build on a BP6
  2. Connect to the terminal
  3. Run the help command (?)

My devices are locking up when sending data to the queue. Worse, the state of the queue suggests this should not be an infinite loop…

queue.c @ line 70: if (((q->wptr + 1) & (q->element_count - 1)) != q->rptr) { /* add data and exit function */ }

Dumping the queue data structure shows wptr == rptr … so this should be entering the if block. However, it does not.
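
For reference, the blocking add path is roughly this shape (a simplified sketch from memory, not the exact source; the real signature and details in queue.c may differ):

static void queue_add_internal(bool block, uint8_t* data, queue_t* q) {
    for (;;) {
        // There is room when advancing wptr would not collide with rptr
        // (element_count is a power of two, so the mask wraps the index).
        if (((q->wptr + 1) & (q->element_count - 1)) != q->rptr) {
            q->data[q->wptr] = *data;                          // add data
            q->wptr = (q->wptr + 1) & (q->element_count - 1);
            return;                                            // and exit
        }
        if (!block) {
            return;  // non-blocking callers give up immediately
        }
        // blocking callers spin here until Core1 advances rptr
    }
}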

GDB dump...

(gdb) bt
#0  queue_add_internal (block=true, data=0x20081e27 "6\034\037\b \335#", q=0x2003d4b8 <tx_fifo>) at /src/bp5/src/queue.c:70
#1  queue2_add_blocking (q=0x2003d4b8 <tx_fifo>, data=0x20081e27 "6\034\037\b \335#") at /src/bp5/src/queue.c:162
#2  0x1000168e in _putchar (character=<optimized out>) at /src/bp5/src/printf-4.0.0/printf.c:126
#3  _out_char (buffer=<optimized out>, idx=<optimized out>, maxlen=<optimized out>, character=<optimized out>) at /src/bp5/src/printf-4.0.0/printf.c:162
#4  _out_char (character=<optimized out>, buffer=<optimized out>, idx=<optimized out>, maxlen=<optimized out>) at /src/bp5/src/printf-4.0.0/printf.c:158
#5  0x100023dc in _vsnprintf (out=<optimized out>, out@entry=0x10001679 <_out_char>, buffer=0x20081ec8 "", buffer@entry=0x20081ea8 "A", maxlen=maxlen@entry=4294967295,
    format=0x1005f866 "s%s\r\n", format@entry=0x1005f9dc "\033[38;2;191;165;48m", va=..., va@entry=...) at /src/bp5/src/printf-4.0.0/printf.c:819
#6  0x1000285c in printf_ (format=0x1005f85c "%s%s%s\t%s%s%s\r\n") at /src/bp5/src/printf-4.0.0/printf.c:876
#7  0x10022fe8 in ui_help_options (help=help@entry=0x10093a94 <global_commands>, count=count@entry=65) at /src/bp5/src/ui/ui_help.c:35
#8  0x10007a18 in help_global () at /src/bp5/src/commands/global/h_help.c:175
#9  help_handler (res=<optimized out>) at /src/bp5/src/commands/global/h_help.c:204
#10 0x10021258 in ui_process_commands () at /src/bp5/src/ui/ui_process.c:146
#11 0x10000f52 in core0_infinite_loop () at /src/bp5/src/pirate.c:600
#12 main () at /src/bp5/src/pirate.c:676

(gdb) print -pretty on -- *q
$3 = {
  core = {
    spin_lock = 0x0
  },
  data = 0x20028800 <tx_buf> "89mo\033[0m\t\033[38;2;191;165;48mChoose how numbers are displayed\033[0m\r\n\033[38;2;150;203;89ml/L\033[0m\t\033[38;2;191;165;48mBitorder (msb/LSB)\033[0m\r\n\033[38;2;150;203;89mcls\033[0m\t\033[38;2;191;165;48mClear and reset the ter"...,
  wptr = 915,
  rptr = 915,
  element_count = 1024
}

I have a theory. It’s a good exercise to go through, and it highlights the value of using libraries that other folks have put time and energy into testing and debugging.

My first question was why the spinlock was zero (unclaimed). I found that answer rapidly. Two weeks ago, @ian disabled the spinlocks in commit b46a788, with the comment:

Disable spinlock on queue, should improve UART bridge top speed.

Next, let’s take a look at the queue structure:

typedef struct {
    lock_core_t core;
    uint8_t* data;
    uint16_t wptr;
    uint16_t rptr;
    uint16_t element_count;
} queue_t;

The final thing to note is that Core0 puts data into this queue (modifying wptr), while Core1 pulls data out of it (modifying rptr).
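
For completeness, the Core1 side looks roughly like this (again a sketch, not the exact source); it is the only code that advances rptr:

static bool queue_remove_internal(queue_t* q, uint8_t* out) {
    if (q->rptr == q->wptr) {
        return false;  // queue is empty
    }
    *out = q->data[q->rptr];
    q->rptr = (q->rptr + 1) & (q->element_count - 1);
    return true;
}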

Q: Can you reason out what the underlying problem is?

Hint 1

The function queue_add_internal(...) never modifies the value of q->rptr.

Hint 2

Note the structure definition has no type qualifiers for any of the fields.

Hint 3

If a function never modifies a memory value, what can a well-written optimizing compiler do when that function reads the same memory location multiple times?

Hint 4

Memory fences, memory barriers, and more (oh my!)

My Theory

With the removal of the spinlocks, there is no longer any construct that tells the compiler that the fields in queue_t can be modified out from under it. As a result, the compiler has optimized the repeated reads of q->rptr down to a single read (likely taken when the function is entered). Thus, if the function is entered while the queue is full, that cached value is never updated, even after Core1 drains the queue … resulting in an infinite loop.
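
Put differently, the optimizer is free to rewrite the wait loop like this (a hand-written illustration of the transformation, not actual compiler output):

// What the source asks for: keep re-checking rptr until Core1 makes room.
static void wait_for_space_as_written(queue_t* q) {
    while (((q->wptr + 1) & (q->element_count - 1)) == q->rptr) {
        /* wait for space */
    }
}

// What the optimizer may legally emit, since nothing visible to it modifies
// q->rptr and the field is not volatile: the load is hoisted out of the loop.
static void wait_for_space_as_optimized(queue_t* q) {
    uint16_t next = (q->wptr + 1) & (q->element_count - 1);
    uint16_t cached_rptr = q->rptr;   // single read at function entry
    while (next == cached_rptr) {
        /* cached_rptr is never re-read from memory, so if the queue was
           full on entry this loops forever, even after Core1 drains it */
    }
}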

Dumping the registers strongly supports this theory (but I’ve not gone deep with a full analysis).

Fix if my theory is accurate...

typedef struct {
    lock_core_t core;
    uint8_t* data;
    volatile uint16_t wptr;
    volatile uint16_t rptr;
    uint16_t element_count;
} queue_t;
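
An alternative sketch, for anyone who wants to go further than volatile: C11 atomics (stdatomic.h) also prevent the caching, and additionally give well-defined ordering between writing the data byte and publishing the new wptr. This is not what the PR does, and queue_try_add is a hypothetical name; it just shows how the same structure could look with atomics:

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    lock_core_t core;
    uint8_t* data;
    _Atomic uint16_t wptr;   // written by Core0, read by Core1
    _Atomic uint16_t rptr;   // written by Core1, read by Core0
    uint16_t element_count;
} queue_t;

// Producer-side sketch: the release store guarantees the data write is
// visible before the consumer can observe the advanced wptr.
static bool queue_try_add(queue_t* q, uint8_t byte) {
    uint16_t w = atomic_load_explicit(&q->wptr, memory_order_relaxed);
    uint16_t next = (w + 1) & (q->element_count - 1);
    if (next == atomic_load_explicit(&q->rptr, memory_order_acquire)) {
        return false;  // queue full
    }
    q->data[w] = byte;
    atomic_store_explicit(&q->wptr, next, memory_order_release);
    return true;
}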

For more, see the first comment on a writeup on a similar bug.

My fix appears to resolve this. Any build from the last two weeks may have hit random queue lockups; this should fix them. PR incoming…

I have not had an issue with Bus Pirate 6; I tested the latest build to be certain. However, I did have an issue on the 5 with an optimized release build when using the queue in binmode. Perhaps it was the same thing. I accepted the pull request.

Yes, you are right … it will likely only affect optimized builds.

Spoilers for the challenge above ...

The compiler is less likely to change repeated reads of underlying memory into a single read unless optimizations are enabled.

Hic Sunt Dracones

Memory models, and their practical effects in multi-processor (and/or ISR) systems, are a foggy swamp full of traps (and ROUS).

When this code was previously using spinlocks, I believe the spinlocks provided the necessary memory barrier / synchronization points, preventing the compiler from performing that optimization. With the spinlocks removed, the compiler saw nothing to stop it.

The same concept applies to any variable that is dynamically changed outside a given function. Either a memory barrier or other synchronization point is needed to ensure the compiler understands that memory values may need to be re-read. This is why variables modified in an ISR also need the volatile qualifier.
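
A minimal example of that last point (a generic sketch, not from this codebase; uart_rx_isr is a hypothetical handler name):

#include <stdbool.h>

// Without volatile, the compiler may hoist the read of data_ready out of
// the loop and spin forever, exactly like the queue bug above.
static volatile bool data_ready = false;

void uart_rx_isr(void) {
    data_ready = true;   // set from the interrupt handler
}

void wait_for_data(void) {
    while (!data_ready) {
        ;  // volatile forces a fresh read from memory on every iteration
    }
    data_ready = false;
}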
