Hardware Issues. Doesn't boot sometimes, LEDs blink randomly

I cracked open a new purple board that arrived for the SPI sniffer. Everything was good, except that (1) I had the COM port swap issue (the higher port is the UI) and (2) there was no terminal with the shipped firmware. No response at all.

Everything is fine after loading the latest firmware.

We tested this batch four times: after PCBA, after LCD assembly, after final assembly (we paid the factory to do these), and then A Bin and a team of temps tested each board before packaging. My guess is they have Windows 7 and I’m on 10, but so what? I compiled and tested that firmware right here on this PC.

Scaling this fast brings out a lot of corner cases that I usually handle one at a time in much smaller batches.

Now that we have a handle on a possible solution, I’ll prepare some test firmwares. This is my highest priority.

It looks like an unfortunate coincidence: some hardware component operating right at the edge of its tolerance (a manufacturing defect or something) and the firmware boot sequence triggering this hardware edge case. I say that because many other devices from the same batch don’t have this problem (including the new purple one I just received).

Interesting case. Are these WS2812-style LEDs? On those, there’s nothing that lets the CPU even notice they’re attached (indeed, I develop s/w and often don’t even plug them in), so it’s not like it’d be spinning on an ack and hanging (there isn’t one; data is write-only).

Relatedly, those can be power hogs. If you’re using white at full intensity, the three lamps in each can pull about 60mA combined. Do you have enough power that you’re not creating a power deficiency when trying to start them? That might help explain the case of working better with the strip disconnected. A scope triggered at a voltage just below the expected rail may see a dip. If that’s followed by a burst of traffic on the 2812 data line, that’s a strong hint that something like a big power supply cap isn’t capacitating. Bright blinkies can DoS the power supply.
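To put rough numbers on that power question, here is a minimal Python sketch of the arithmetic. It assumes current scales linearly with brightness and that each of the three color channels draws an equal share of the roughly 60mA full-white figure. Both are simplifications, and `strip_current_ma` and the LED count used below are placeholders rather than anything from the Bus Pirate firmware or schematic.

```python
# Rough per-LED figure for R+G+B at full intensity, as quoted in the post above.
FULL_WHITE_MA_PER_LED = 60.0


def strip_current_ma(led_count, brightness, channels_lit=3):
    """Estimate total strip current in mA.

    Assumptions (not measured values): current is linear in brightness,
    and each of the three channels draws one third of the full-white figure.
    """
    if not 0.0 <= brightness <= 1.0:
        raise ValueError("brightness must be between 0.0 and 1.0")
    if channels_lit not in (0, 1, 2, 3):
        raise ValueError("channels_lit must be 0, 1, 2 or 3")
    per_led_ma = FULL_WHITE_MA_PER_LED * brightness * channels_lit / 3
    return led_count * per_led_ma


if __name__ == "__main__":
    # led_count=10 is a placeholder; substitute the real number of LEDs.
    for brightness in (0.1, 0.5, 1.0):
        estimate = strip_current_ma(10, brightness)
        print(f"{brightness:.0%} white: about {estimate:.0f} mA")
```

Comparing the result against whatever the USB port and regulator can actually deliver (a USB 2.0 port is nominally limited to 500 mA) gives a first guess at whether a startup sag is plausible before anyone reaches for a scope.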

Does the 2040 expose an internally generated clock or something that you can attach a scope to and check if the CPU is alive? Can you watch an address or data pin to see if it’s fetching opcodes, so that the signal is approximately a multiple of the clock frequency?

Do you perhaps have a free pin (or one that you can wiggle so quickly that a chip like the 2812 won’t notice) that you can wiggle in a way that can be observed on a scope or analyzer, to SOS out telemetry like a stage of boot or a line number, to see if it makes it to, say, line 251 but not line 264? A #define called with `__LINE__` is usually easy in C or assy.
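As a sketch of what that telemetry could look like on the wire, here is a host-side Python model of one possible encoding: a two-level start marker followed by the line number, most significant bit first. The framing, the 12-bit width and the function names are all assumptions made for illustration; on the device this would be a small C macro built around `__LINE__`, and nothing here is taken from the firmware.

```python
START_MARKER = [1, 0]  # hypothetical framing so the analyzer can find the start


def encode_checkpoint(line, bits=12):
    """Return the pin levels to emit for one checkpoint, MSB first."""
    if not 0 <= line < (1 << bits):
        raise ValueError("line number does not fit in the chosen width")
    payload = [(line >> shift) & 1 for shift in range(bits - 1, -1, -1)]
    return START_MARKER + payload


def decode_checkpoint(levels, bits=12):
    """Recover the line number from levels captured on a scope or analyzer."""
    if len(levels) != bits + len(START_MARKER):
        raise ValueError("unexpected capture length")
    if list(levels[:2]) != START_MARKER:
        raise ValueError("start marker not found")
    value = 0
    for level in levels[2:]:
        value = (value << 1) | level
    return value


if __name__ == "__main__":
    captured = encode_checkpoint(251)
    print(captured, "->", decode_checkpoint(captured))
```

The last checkpoint that decodes cleanly before the board goes quiet narrows the hang down to the code between that line and the next checkpoint.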

I could be off in the weeds. (I’m an imposter hardware dude, but I’ve been involved in numerous hardware bring-up issues on embedded systems.) I’m just shouting out things to see if anything rings bells.

You are correct, WS2812 LEDs. The default start up current is <10% max, which with all those LEDs is still just borderline acceptable when white.

There are no spare pins, but the button could be used. I guess the question is then if someone with a board with the issue also has a scope and the time to test. There’s at least one of these boards on the way to me, and I plan to single step debug it to try to reproduce the issue.

I’ll post an archive of test firmwares shortly, finishing it up now.

@Ian if you tell me what exactly you would like to probe on pcb with scope, I can probably find time to do that :slight_smile:

bus_pirate5_rev10-bp_rev-tests.zip (730.3 KB)

Four tests with different systems disabled. It’s done with defines, and I pushed it to a powerup_test branch in git.

I have a hunch it’s the LED power-up current, but then don’t USB ports usually report when they cut off due to overcurrent? How that interacts with the board with broken LEDs is a bit of a mystery, though. Perhaps the good LEDs were randomly at 100% = 120mA, causing issues?

Any test results would be greatly appreciated.

This is an attempt to recreate this firmware which has been reported to work consistently.

bus_pirate-posts.zip (247.8 KB)

post1 is 6c3fbfebd876cc49de220aa1464d1166e43a900a
post2 is 05e7a9f265ba1c19d614f03ad8b6c6ed0b3b1168

OK these are the results:
bus_pirate5_rev10-bp_rev.uf2 - boots every time, all leds flash rainbow on start, then according to settings
bus_pirate5_rev10-psuinit.uf2 - boots every time, all leds flash rainbow on start, then according to settings
bus_pirate5_rev10-rgb.uf2 - boots every time, NO leds flash on start
bus_pirate5_rev10-storage.uf2 - boots every time, all leds flash rainbow on start, then factory led animation
bus_pirate-post1.uf2 - boots every time, all leds flash rainbow on start, then according to settings
bus_pirate-post2.uf2 - boots every time, all leds flash rainbow on start, then according to settings
ci-buspirate5-main-b782332 - one of the recent fw - doesn’t boot.

Woah, ok, that’s unexpected.

bus_pirate5_rev10-dio.zip (181.8 KB)

Here’s the latest build ci-buspirate5-main-dd599b3.zip, except compiled on my PC instead of the build server.

Excellent points, as expected. I guess in this crowd, where everyone seems to be comfortable with SMT rework, I figured scopes (and the related skills to set triggers, etc.) would be common, but you’re right that you need one of the affected people to have those, so it’s surely a minority of a minority. (Thanks, @alexhude !)

With this info, there are some games you can play to test the startup power crowbar theory. On successive “frames”, you can toggle evens/odds (or thirds or fourths…) so that all thirty (?) LEDs aren’t powered at once. You’d still have DAT going wiggly wiggly, so if the theory is that DAT is shorted to “somewhere important”, the (admittedly lower duty cycle) pin might still be dumping a signal into the bad place, but fewer LEDs would be active at once so there would be less current. Humans may perceive a reduced brightness but, with a suitable refresh rate, probably not a flicker.

Assuming you’re lighting 8/255ths of RGB, one such staggered sequence would be:

F0: 8(GRB) --- 8(GRB) --- 8(GRB) --- ...
F1: --- 8(GRB) --- 8(GRB) --- 8(GRB)  ...

Similarly, three LEDs per bulb is more than one LED per bulb. If you turn on just one of {RGB}, the current pulled will be lower. I don’t know if it’s reflected in the WS2812, but different colors of LEDs take different amounts of current at the same voltage. ISTR that blue is the highest and red is the lowest. (Trivia: Espressif certifies designs with power indicators on GPIOs on the ESP32 in otherwise identical packages that are red, but not blue.)

Instead of white, try turning on just red. The actual shape of the signal on DAT will be altered to have fewer ones per pixel, but it’ll still be at the same overall frequency. This is probably a single-byte source change.

Repeated Frames: 8(-R-)  8(-R-)  8(-R-) ...

(Something non-obvious in the markdown is making the colors show. That’s not intentional. Read this on a VT100; not a VT241 :slight_smile: )

Either of these attempts will alter the waveform being dumped onto DAT so depending on where that pin is being (possibly) accidentally routed, we may be changing the reaction of the thing we’re disturbing beyond just changing brightness/color/interleaving.
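For anyone who wants to see the two experiments as data, here is a minimal Python sketch that builds the byte buffers for the interleaved frames and for the red-only frame. It relies on the WS2812 taking its three bytes in G, R, B order; the function names, the LED count and the default level of 8 are placeholders for illustration, not the firmware’s actual buffer layout or PIO code.

```python
def staggered_frame(led_count, phase, groups=2, level=8):
    """Bytes for one frame where only LEDs with index % groups == phase are lit.

    Each LED takes three bytes in G, R, B order, which is the order the
    WS2812 expects on its data line.
    """
    if not 0 <= phase < groups:
        raise ValueError("phase must be in range(groups)")
    frame = bytearray()
    for index in range(led_count):
        value = level if index % groups == phase else 0
        frame += bytes((value, value, value))  # G, R, B
    return bytes(frame)


def red_only_frame(led_count, level=8):
    """Bytes for a frame where every LED shows only red at the given level."""
    return bytes((0, level, 0)) * led_count  # G=0, R=level, B=0


if __name__ == "__main__":
    print("F0 :", staggered_frame(6, phase=0).hex(" "))
    print("F1 :", staggered_frame(6, phase=1).hex(" "))
    print("red:", red_only_frame(6).hex(" "))
```

Alternating F0 and F1 keeps half the LEDs dark in any given frame, while the red-only frame keeps every LED addressed but lights one channel instead of three.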

One thing that Alex (or others) may try under your tutelage is either to remove power (and/or DAT) from the LED strip, reducing current to zero while still emitting wigglies on DAT, or to externally clip a more independent 5V supply onto the LED strip. I’ll leave it to the real EE types to decide how they feel about having potentially two independent 5V sources in the same system, hopefully sharing the same ground reference.

If the injected 5V makes you nervous, an alternate approach that we do All The Time in strip lighting is to power the strip from a different source than the CPU completely. Lift the +5V pin from the WS2812 strip and provide power to the strip ONLY from some other 5V source. Then if the 2812’s are hungry, they’re fed power from a different trough.

A large (by digital scale) cap on the power rails of the 2812’s might be an interesting experiment if we’re testing the power being a DOS attack on the SOC’s power. The key question that I think we have is whether we’re sagging the SOC’s power on startup or whether that DAT pin is “leaking” to someplace unintended, right?

I suppose one other variable might be the PIO code that’s presumably bit-banging the DAT line. I don’t know RP2040 well enough to debug that mentally.

P.S. It may be increasingly obvious why I want to implement a WS2812 reader logic analyzer mode for buspirate once I get one. :slight_smile:

@ian

bus_pirate5_rev10-dio.zip - boots every time, all leds flash rainbow on start, then according to settings
ci-buspirate5-main-dd599b3.zip - doesn’t boot

hmmm, @Ian I was playing with DIO on dd599b3 and noticed something odd. Somehow it fixes boot if you power pins.

  1. enter to boot loader with button
  2. flash dd599b3 (it boots after soft reset)
  3. unplug device, plug again - doesn’t boot
  4. enter boot loader again
  5. flash dd599b3 (it boots after soft reset)
  6. enter DIO (m, 8)
  7. enable pin0 (A 0)
  8. add power (W)
  9. unplug, plug again - it boots :exploding_head:

Thank you so much for testing. This is just plain odd. That is the exact same commit compiled on my PC vs the linux build server. How do we get something that is both a corner case and build environment dependent…

It seems like I may need to push all the tests through the build server to see what’s really happening. I’m going to hop on the server now and see what’s going on.

Heh, after a long life dealing with embedded systems, I am not surprised at all :smiley:
I remember a case when gcc for MicroBlaze on the build server silently broke 64-bit integer constants, and until I reverse-engineered the binary we had no idea what was going on.

Yesterday I set up several variants of the build server. I’m going to connect it to the forum and push through some tests. Then I’ll diff the resulting firmware files to try to get a handle on how different they are.
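One simple way to get that handle is a block-by-block comparison of the two images. The sketch below is a generic byte comparison in Python, not a UF2 parser: it only relies on UF2 files being made of 512-byte blocks, and the function name and the file names in the usage note are placeholders.

```python
UF2_BLOCK_SIZE = 512  # UF2 files are a sequence of 512-byte blocks


def differing_blocks(image_a, image_b, block_size=UF2_BLOCK_SIZE):
    """Return the indices of the blocks whose bytes differ between two images.

    If one image is longer, the extra blocks are reported as differing.
    """
    longest = max(len(image_a), len(image_b))
    changed = []
    for offset in range(0, longest, block_size):
        chunk_a = image_a[offset:offset + block_size]
        chunk_b = image_b[offset:offset + block_size]
        if chunk_a != chunk_b:
            changed.append(offset // block_size)
    return changed


if __name__ == "__main__":
    # Synthetic stand-ins for two builds of the same commit.
    local_build = bytes(2048)
    server_build = bytearray(local_build)
    server_build[600] = 0xFF
    print(differing_blocks(local_build, bytes(server_build)))
```

With real files, the two arguments would come from something like `pathlib.Path("local.uf2").read_bytes()`. A handful of changed blocks points at something small such as an embedded build string, while differences spread across the whole image suggest the compiler generated different code.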

Woah, I was not ready for that :slight_smile: In the middle of another build server, suspect this one will do the trick.

I hope that it turns out to be a gcc-arm version issue. The RP2040 SDK and installer still use 10.3-2021.10, but I have 13.1 installed on the build server.

bus_pirate5_rev10-arm-gcc-10.3.zip (192.1 KB)

Fingers crossed, this may fix it!

Will take a bit to get it all up and running, and I may take this chance to rework my build script to insert the commit hash.
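A build-script step for stamping the commit hash could look roughly like the Python sketch below. It asks git for the abbreviated hash of HEAD with `git rev-parse --short HEAD` and renders a one-line C header. The macro name `BP_FIRMWARE_GIT_HASH` and the function names are invented for illustration; the real build script may do this quite differently.

```python
import subprocess


def current_commit(repo_dir="."):
    """Return the abbreviated hash of HEAD for the repository in repo_dir."""
    result = subprocess.run(
        ["git", "rev-parse", "--short", "HEAD"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def render_version_header(commit_hash):
    """Render a C header line carrying the hash (macro name is hypothetical)."""
    if not commit_hash or any(ch not in "0123456789abcdef" for ch in commit_hash):
        raise ValueError("expected a lowercase hexadecimal commit hash")
    return f'#define BP_FIRMWARE_GIT_HASH "{commit_hash}"\n'


if __name__ == "__main__":
    # A fixed example hash so the sketch runs outside a git checkout;
    # a build script would pass current_commit() instead.
    print(render_version_header("dd599b3"), end="")
```

Having the hash compiled into the image would make it easier to confirm exactly which commit a tested firmware came from when comparing local and server builds.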

This firmware above doesn’t boot for me.

Thank you so much for testing. That’s disappointing.

https://forum.buspirate.com/uploads/short-url/umy6sac72dkd6eUL19VWSj83dgf.zip

This is a server built firmware with the RGB LEDs disabled, just to probe that a bit more.


bus_pirate5_rev10-10.3-debug.zip (192.1 KB)

Server build, but done in debug config.