Unreliable SPI flash operation

nuschpl · October 9, 2024, 6:27pm

Hello
I’m trying to program an XMC XM25QU256B 32 MB SPI flash chip, but I’m unable to write or read it reliably. After writing and reading back, I get content that differs from what I programmed.
What’s interesting is that after reading the wrong content back and then repeating the full cycle (erase, write, read back, copy to PC), I get exactly the same file: the errors introduced are reproducible.

What I’ve checked:

it happens on two different chips (same model)
I’ve also read the source file back from the BP to rule out that it was corrupted before flashing
it happens regardless of programming speed or voltage, which suggests a logical issue
Some of the errors are very specific. Usually the very first few lines in a hex editor are completely corrupted. Also, if you divide my flash into halves, you get this addressing:
Part1: 0x00000000-0x00FFFFFF
Part2: 0x01000000-0x01FFFFFF
What happens is that the tail of Part1, which in the original consists only of 0xFF, contains a mirror of the end of Part2 after flashing. I’m sure about that because there are very specific ASCII strings. So it looks like the MSB 0x01 of the address is somehow lost. I’m wondering if this could be related to 3-byte addressing mode being used instead of 4-byte. I didn’t select it, but I know the chip supports it.
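
To illustrate the suspected aliasing, here is a minimal sketch. It assumes, purely as a hypothesis, that only the low 24 bits of each address reach the chip (3-byte addressing on a 32 MB part). The helper name `addr_3byte` is made up for this example and is not part of any Bus Pirate code.

```python
FLASH_SIZE = 32 * 1024 * 1024  # 32 MB part, as described above

def addr_3byte(addr: int) -> int:
    """Address the chip would see if only the low 3 address bytes
    were sent (hypothetical helper, illustration only)."""
    return addr & 0xFFFFFF

# An address near the end of Part2 (0x01000000-0x01FFFFFF) ...
part2_tail = 0x01FFFF00
# ... lands near the end of Part1 once the top byte 0x01 is lost.
aliased = addr_3byte(part2_tail)
print(hex(aliased))  # 0xffff00
```

Under that assumption, data meant for the end of Part2 would show up at the end of Part1, which matches the mirrored ASCII strings.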
Regards

electronic_eel · October 9, 2024, 7:23pm

How did you connect the flash IC? Is it in a programming socket or did you use grabbers or individual wires? How long are these wires? Does it help to use a different method to wire it up? Does changing connection speed help?

I’m asking because issues like this can be caused by EMI on long wires and similar. The fact that it happens several times at the same position doesn’t rule this out; a particular byte pattern that happens to occur in your data could trigger it.

nuschpl · October 9, 2024, 7:57pm

Currently it’s a grabber with a standalone chip; previously it was a strange setup with the chip in circuit and pogo pins held in place with zip ties across the whole computer. But I really think we shouldn’t go down that path, as the setup will change with each use for every user, and the software should either work or throw an error. I think ideally, when writing, it should erase the whole chip, write, then verify block by block; on an error, erase only that specific block and carry on. If there is still an error, decrease the block size, rewrite only the invalid blocks, and verify again.
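
A minimal sketch of that write/verify/retry idea, run against an in-memory stand-in rather than real hardware. `FakeFlash`, `write_verified`, and the block and chunk sizes are all hypothetical names and values chosen for illustration. Real parts erase in fixed sector sizes, so this sketch only erases whole blocks and shrinks the programming chunk on retry.

```python
class FakeFlash:
    """In-memory stand-in for a NOR flash (illustration only)."""

    def __init__(self, size):
        self.mem = bytearray(b"\xff" * size)

    def erase(self, start, length):
        self.mem[start:start + length] = b"\xff" * length

    def program(self, start, data):
        # NOR programming can only clear bits (1 -> 0).
        for i, b in enumerate(data):
            self.mem[start + i] &= b

    def read(self, start, length):
        return bytes(self.mem[start:start + length])


def write_verified(flash, image, block=4096, min_chunk=256):
    """Erase everything, then program and verify block by block.

    A block that fails verification is erased on its own and
    reprogrammed in smaller chunks. Returns the start offsets of
    blocks that never verified."""
    flash.erase(0, len(image))
    failed = []
    for start in range(0, len(image), block):
        want = image[start:start + block]
        chunk = block
        while chunk >= min_chunk:
            for off in range(0, len(want), chunk):
                flash.program(start + off, want[off:off + chunk])
            if flash.read(start, len(want)) == want:
                break  # block verified, move on
            flash.erase(start, len(want))  # erase only the bad block
            chunk //= 2
        else:
            failed.append(start)  # gave up on this block
    return failed


image = bytes(range(256)) * 32  # 8 KiB test pattern
flash = FakeFlash(len(image))
print(write_verified(flash, image))  # [] when every block verifies
```

The point of returning the failed offsets is that the tool can report an error instead of silently leaving bad data behind.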

Also, as I said, the 32 MB file written twice has errors in exactly the same places despite a different write process (speed, 0.2 V under/overvoltage from 3.3 V, within specs).

Honestly, I’m not sure whether I should add some pull-ups/pull-downs somewhere.
Also, are clock pulses of differing widths something to be expected?

nuschpl · October 9, 2024, 9:30pm

I’ve switched the speed to 1 kHz and found an interesting glitch:

Chip Select goes low for about 1/100 of a clock period (2.5 µs, while the clock stays high for around 250 µs). This confuses the Saleae, but its behaviour is in line with the specs (see “Clock Polarity” in the Saleae Support “SPI Analyzer - User Guide”).
So whatever clock polarity was set in the Bus Pirate SPI mode settings, it will now effectively change to the level the clock had when nCS bounced back.
The consequence, as I understand it, is that the chip won’t settle back into its receiving state until CLK returns to its new idle-high state (per the settings it was idle-low). I’m not sure about the clock phase because there is some confusion between Saleae and Bus Pirate terminology, but most likely the glitch causes the MSB of the data on MOSI to be skipped. So it’s very likely that when the glitch happens while the address is being sent, bit MSB+1 is flipped from the desired value, which flips the flash bank (upper 16 MB, lower 16 MB; Part1/Part2 from the initial post).
Now I’m wondering about the reason. The glitch happens right after the rising edge of the clock that clocks in the least significant bit of the command on MOSI. That is when the chip starts processing the command, or when the BP starts preparing for the next cycle. It could be that current consumption rises, or that there is some invalid logic in the BP code for interrupt or other task handling (not sure about the LEDs). If anyone is interested, a Saleae trace is attached.
trace.sal.REMOVE_FROM_HERE.zip (29.9 KB)

ian · October 9, 2024, 10:10pm

I will look into this. I’m doing FALA verification tomorrow.

What board version is this?

nuschpl · October 10, 2024, 2:47am

The board is BP5rev10

ian · October 10, 2024, 9:58am

I think this is similar to what you observed? CS rises before the clock has finished, and the next byte is invalid.

I first noticed this on the RP2350 Bus Pirate while working on the follow along logic analyzer. I assumed the RP2350 had a different bit to poll for SPI-finished.
Someone else reported/mentioned this in a comment on the Pico SDK on GitHub, so we’re not alone.
I didn’t think it also impacted the RP2040, but it seems to.
It could be a code regression or a bug we just never found; I’m fully willing to own that.
I suspect something either changed slightly or is broken in the 2.0 SDK, so that it reports the SPI transaction as finished when it really is not.
However, that same bug doesn’t seem to affect the internal SPI bus. I don’t know that for a fact, just that “it still works”. And for that matter, here the flash command reads/writes the SFDP tables correctly even though the Saleae can’t decode it.

Fortunately (for me), gracefully tracking when transactions end is also what I need to do for the FALA bring-up, so it fits nicely on today’s to-do list. I’ll fix this bit and then do some read/write testing.

ian · October 10, 2024, 11:33am

I believe this will fix it. A new firmware should arrive any moment.

I can’t confirm without digging, but I suspect there was a change to spi_is_readable with the SDK update. There were some huge changes to the SDK SPI read/write functions that make every effort to use the FIFO. I believe polling spi_is_readable now returns true when the last bit is sampled, rather than when the last bit is complete (clock falls).

Whatever the cause, polling spi_is_busy until the peripheral goes idle gets our CS changes in the correct place.
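
A toy timing model of that hypothesis, not of the real peripheral: time is counted in half clock cycles for one byte with the clock idle low, the “readable” flag is assumed to set on the last rising edge, and “busy” is assumed to clear on the last falling edge. The Python functions below only mimic the idea behind spi_is_readable and spi_is_busy; their behaviour here is an assumption, not taken from the SDK source.

```python
BITS = 8  # one SPI byte

def sck_level(t):
    """SCK level at half-cycle t: rising edges at odd t, idle low."""
    return 1 if 0 < t < 2 * BITS and t % 2 == 1 else 0

def rx_readable(t):
    # Assumption: flag sets as soon as the last bit is sampled
    # (the final rising edge, t = 15).
    return t >= 2 * BITS - 1

def busy(t):
    # Assumption: peripheral stays busy until the final falling
    # edge (t = 16).
    return t < 2 * BITS

def first_time(done):
    """First half-cycle at which the polled condition is true."""
    t = 0
    while not done(t):
        t += 1
    return t

t_readable = first_time(rx_readable)
t_idle = first_time(lambda t: not busy(t))

# Releasing CS when "readable" sets catches the clock still high ...
print(t_readable, sck_level(t_readable))  # 15 1
# ... while waiting for "not busy" releases CS with the clock low.
print(t_idle, sck_level(t_idle))          # 16 0
```

In this model, the only difference between the two polling strategies is half a clock cycle, which is enough to move the CS edge from mid-bit to after the byte is complete.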

nuschpl · October 20, 2024, 9:39pm

Could you point me to the commit for the fix above? Just curious. Also, could this be related: https://forums.raspberrypi.com/viewtopic.php?p=1808377#p1808377 ?

nuschpl · November 3, 2024, 3:48pm

Can you at least confirm whether it happened?

ian · November 4, 2024, 1:37pm

The latest build has all the SPI fixes. So far I have not had any problems with SPI since I changed where we poll for the transmission complete bit.