Drastic storage chip price increase, potential NAND flash supply issues

We’re running a volume production run of Bus Pirate 6.

The Winbond flash that holds the Bus Pirate firmware has tripled in price.

The NAND flash chip that holds the internal storage (USB disk) was already a bit expensive, and has doubled in price. More alarmingly, we bought the last reel anyone has. Micron lists it as in production, so I assume their fabs are full making other storage chips for the AI revolution.

This doesn’t mean much for the price of Bus Pirate 6, as there’s enough margin to absorb the cost increase. Bus Pirate 5 has always had razor-thin margins, so the price will probably have to increase on the next batch.

If the NAND chip remains unavailable we have a couple potential solutions:

  • We use the 1 Gbit chip. If the 2 Gbit part remains available we can swap that in.
  • We can look at alternate parts like the Winbond NAND @henrygab posted about a while ago. I seriously doubt it is a drop in replacement though, and I’m not fully confident in my ability to port the wear leveling library to another NAND.

Looks like there is an Espressif library that interfaces with multiple NAND chips:

  • Winbond - W25N01GVxxxG/T/R, W25N512GVxIG/IT, W25N512GWxxR/T, W25N01JWxxxG/T, W25N01JWxxxG/T, W25N02KVxxIR/U, W25N04KVxxIR/U
  • Gigadevice - GD5F1GQ5UExxG, GD5F1GQ5RExxG, GD5F2GQ5UExxG, GD5F2GQ5RExxG, GD5F2GM7xExxG, GD5F4GQ6UExxG, GD5F4GQ6RExxG, GD5F4GQ6UExxG, GD5F4GQ6RExxG, GD5F4GM8xExxG
  • Alliance - AS5F31G04SND-08LIN, AS5F32G04SND-08LIN, AS5F12G04SND-10LIN, AS5F34G04SND-08LIN, AS5F14G04SND-10LIN, AS5F38G04SND-08LIN, AS5F18G04SND-10LIN
  • Micron - MT29F4G01ABAFDWB, MT29F1G01ABAFDSF-AAT:F, MT29F2G01ABAGDWB-IT:G
  • Zetta - ZD35Q1GC
  • XTX - XT26G08D

This is probably a good direction to head to future proof against any component shortages.

[I][example] SPI NAND Flash Example for Raspberry Pi Pico
[I][example] ==============================================
[I][example] SPI initialized at 8928571 Hz
Read 2 bytes: 2C 24
[D][nand_flash] detect_chip: manufacturer_id: 2c 24

Read 2 bytes: 00 00
[D][nand_flash] detect_chip: manufacturer_id: 0 0

[E][nand_flash] Unknown manufacturer ID: 0x00
[E][nand_flash] Failed to detect nand chip
[E][example] Failed to initialize NAND flash: 264
[E][example] Failed to initialize NAND flash

I’ve got the NAND driver compiling as a standalone example for RP2040.

Very strange bug: reading the ID the first time is correct, but all further reads are 0x00.

Update

[I][example] SPI NAND Flash Example for Raspberry Pi Pico
[I][example] ==============================================
[I][example] SPI initialized at 8928571 Hz
[I][example] NAND Flash ID: 2C 24
[I][example] NAND Flash ID: 2C 24
[I][example] NAND Flash ID: 2C 24
[I][example] NAND Flash ID: 2C 24
[I][example] NAND Flash ID: 2C 24
[D][nand_flash] detect_chip: manufacturer_id: 2c 0

[D][nand_micron] spi_nand_micron_init: device_id: 24

[I][example] NAND flash initialized successfully
[I][example] Flash Info:
[I][example]   Number of blocks: 2048
[I][example]   Block size: 131072 bytes
[I][example]   Sector size: 2048 bytes
[I][example]   Total capacity: 117760 sectors
[I][example]   Total size: 235520 KB
[I][example] Example complete!

Ok, got it!

void read_id(void) {
    // Read and print the NAND flash ID (manufacturer ID + device ID)
    gpio_put(PIN_CS, 0); // Select the chip
    uint8_t cmd = 0x9F; // Read ID command
    spi_write_blocking(SPI_PORT, &cmd, 1);
    uint8_t id[3] = {0}; // id[0] is the dummy byte, id[1] and id[2] are the ID
    spi_read_blocking(SPI_PORT, 0xff, id, 3); // Clock out 0xff (not 0x00) while reading
    gpio_put(PIN_CS, 1); // Deselect the chip
    LOG_I(TAG, "NAND Flash ID: %02X %02X", id[1], id[2]);
}

Did the simplest possible read ID. First works, all subsequent reads fail. So, after scratching my head for a while I changed the read dummy byte from 0x00 to 0xff and it works!

This is explicitly contrary to the datasheet which shows a 0x00 dummy byte.
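
For anyone following along, here is a minimal sketch of decoding those three bytes once they are off the wire. It assumes the layout the read_id() code above relies on (one dummy byte, then manufacturer ID, then device ID). The table and the function name nand_vendor_from_id are made up for illustration and are not the driver's API. Only 0x2C (Micron) comes from the logs above; 0xEF (Winbond) and 0xC8 (GigaDevice) are the JEDEC manufacturer codes as I understand them.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lookup table. 0x2C is the value seen in the log above;
 * 0xEF and 0xC8 are assumed JEDEC codes, not taken from the driver. */
typedef struct {
    uint8_t id;
    const char *name;
} nand_vendor_t;

static const nand_vendor_t k_vendors[] = {
    {0x2C, "Micron"},
    {0xEF, "Winbond"},
    {0xC8, "GigaDevice"},
};

/* rx holds the 3 bytes clocked in after the 0x9F command:
 * rx[0] dummy byte, rx[1] manufacturer ID, rx[2] device ID.
 * Returns the vendor name, or NULL when the ID is unknown (e.g. 0x00).
 * device_id is optional and only written on a match. */
const char *nand_vendor_from_id(const uint8_t rx[3], uint8_t *device_id) {
    for (size_t i = 0; i < sizeof(k_vendors) / sizeof(k_vendors[0]); i++) {
        if (k_vendors[i].id == rx[1]) {
            if (device_id != NULL) {
                *device_id = rx[2];
            }
            return k_vendors[i].name;
        }
    }
    return NULL;
}
```

With the good read (2C 24) this returns "Micron" and device ID 0x24. With the all-zero read it returns NULL, which corresponds to the "Unknown manufacturer ID: 0x00" case in the first log.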

Think I have a handle on how it is implemented now, and how to glue it to our existing FatFS implementation.


nand_flash_pico.zip (47.2 KB)

Here’s the whole project if anyone wants to have a look. I’ll put it in a repo eventually.

Got the new NAND driver integrated into the Bus Pirate firmware; see this branch for the messy work in progress.

Really weird but somewhat expected situation:

  • New driver cannot read from a chip formatted with the old firmware
  • Old driver CAN read from a chip formatted with the new firmware
  • New driver CAN read from a chip formatted with new firmware and written/updated by the old firmware

I swapped the pinned dhara commit used by the ESP NAND driver with our old dhara version. Not much difference, but a few changes. The new driver still can’t detect a disk formatted by the old firmware.

Seems like a corner case, but the worst possible corner :slight_smile:

Will keep investigating. If anyone looks over the new branch and has suggestions please let me know.

ETA: With the new driver we have a ton more NAND options, including 1GB capacity chips. They tend to have 4K page sizes compared to 2K for the current chip, but Bus Pirate 6 has the extra RAM we’d need to do a 1GB version. That’s a lot of data to push through the slow USB MSD connection though.

I mean … if you’re going to break compatibility of the internal nand, may as well ditch FAT. FAT’s only positive is that it’s a dirt-simple structure. When layered over dhara, dumping the internal nand shows it’s not nearly as easy to parse manually.

There are file systems actually defined for flash. Might be worth considering, is all…


I think the main reason to use FAT is that all common OSes can access it over the USB MSD without any kind of special drivers or software.

Kind of related - I made a kind of half-baked attempt to add a flatbuffer interface for the internal storage. Look for that after I get the new driver working properly.

First checkpoint block: 0
Last checkpoint block: 526
Last checkpoint group page: 33696
Journal root page: 33710
Restored tail: 19946, bb_current: 0, bb_last: 16
Journal head page: 33712

Dhara info for old driver. I assume this is working, FatFS is happy and the drive mounts.

First checkpoint block at 0
Last checkpoint block at 526
Last checkpoint group page 33664
Journal root page 33678
Restored tail=19916, bb_current=0, bb_last=16
Journal head page 33665

Here is what the new driver pulls from the NAND. Several different values, some the same. I’m going to guess this is the issue.

  • We are using the exact same dhara code/version from the old implementation, so it should not be an issue in dhara itself.
  • Possibly a difference in a garbage collection value that I’m not understanding
  • Possibly an issue of math/number representation. The old driver had fixed page size/chip size defines. The new driver determines the parameters automatically and stores them in a variable. I have a hunch this is where we’re getting the almost, but not quite, identical values.

Next I’ll debug how the code is getting the values precisely and ensure that the calculations make sense vis a vis the actual data on the chip.
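
One cheap way to test the "fixed defines vs. detected parameters" hunch is to run the same page-to-block math both ways and compare. This is a minimal sketch, not code from either driver: the macro names, the nand_geometry_t struct and the helper functions are all hypothetical, and the 2048/131072 values are taken from the flash info log earlier in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Old-style compile-time geometry (hypothetical macro names). */
#define FIXED_PAGE_SIZE        2048u
#define FIXED_BLOCK_SIZE       131072u
#define FIXED_PAGES_PER_BLOCK  (FIXED_BLOCK_SIZE / FIXED_PAGE_SIZE)

/* New-style geometry filled in at run time (hypothetical struct). */
typedef struct {
    uint32_t page_size;
    uint32_t block_size;
    uint8_t log2_ppb; /* log2(pages per block) */
} nand_geometry_t;

/* Integer log2 of a power of two; asserts if v is not a power of two. */
static uint8_t log2_exact(uint32_t v) {
    assert(v != 0 && (v & (v - 1u)) == 0);
    uint8_t n = 0;
    while (v > 1u) {
        v >>= 1;
        n++;
    }
    return n;
}

/* Build the run-time geometry from detected sizes. */
nand_geometry_t geometry_detect(uint32_t page_size, uint32_t block_size) {
    nand_geometry_t g;
    g.page_size = page_size;
    g.block_size = block_size;
    g.log2_ppb = log2_exact(block_size / page_size);
    return g;
}

/* Page-to-block mapping with the fixed defines (division). */
uint32_t block_of_page_fixed(uint32_t page) {
    return page / FIXED_PAGES_PER_BLOCK;
}

/* Page-to-block mapping with the detected geometry (shift). */
uint32_t block_of_page_runtime(const nand_geometry_t *g, uint32_t page) {
    return page >> g->log2_ppb;
}
```

If both mappings agree for the page numbers in the dumps (33696 and 33664 both land in block 526), the geometry math is probably not where the two drivers diverge.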

Added some debug output to the init process to see what’s going on.

Old driver init process
First checkpoint block: 0
Last checkpoint block: 526
Finding last group in block 526 with 4 groups
log2_ppc=4, log2_ppb=6
is free, page=33680, used_marker=ffffffff,
is free, page=33696, used_marker=ffffffff,
is free, page=33696, used_marker=ffffffff,
is free, page=33712, used_marker=ffffffff,
is free, page=33713, used_marker=ffffffff,
is free, page=33714, used_marker=ffffffff,
is free, page=33715, used_marker=ffffffff,
is free, page=33716, used_marker=ffffffff,
is free, page=33717, used_marker=ffffffff,
is free, page=33718, used_marker=ffffffff,
is free, page=33719, used_marker=ffffffff,
is free, page=33720, used_marker=ffffffff,
is free, page=33721, used_marker=ffffffff,
is free, page=33722, used_marker=ffffffff,
is free, page=33723, used_marker=ffffffff,
is free, page=33724, used_marker=ffffffff,
is free, page=33725, used_marker=ffffffff,
is free, page=33726, used_marker=ffffffff,
is free, page=33727, used_marker=ffffffff,
Last checkpoint group page: 33696
Journal root page: 33710
Restored tail: 19946, bb_current: 0, bb_last: 16
is free, page=33697, used_marker=ffffffff,
is free, page=33698, used_marker=ffffffff,
is free, page=33699, used_marker=ffffffff,
is free, page=33700, used_marker=ffffffff,
is free, page=33701, used_marker=ffffffff,
is free, page=33702, used_marker=ffffffff,
is free, page=33703, used_marker=ffffffff,
is free, page=33704, used_marker=ffffffff,
is free, page=33705, used_marker=ffffffff,
is free, page=33706, used_marker=ffffffff,
is free, page=33707, used_marker=ffffffff,
is free, page=33708, used_marker=ffffffff,
is free, page=33709, used_marker=ffffffff,
is free, page=33710, used_marker=ffffffff,
is free, page=33711, used_marker=ffffffff,
is free, page=33701, used_marker=ffffffff,
is free, page=33702, used_marker=ffffffff,
is free, page=33703, used_marker=ffffffff,
is free, page=33704, used_marker=ffffffff,
is free, page=33705, used_marker=ffffffff,
is free, page=33706, used_marker=ffffffff,
is free, page=33707, used_marker=ffffffff,
is free, page=33708, used_marker=ffffffff,
is free, page=33709, used_marker=ffffffff,
is free, page=33710, used_marker=ffffffff,
is free, page=33711, used_marker=ffffffff,
is free, page=33702, used_marker=ffffffff,
is free, page=33703, used_marker=ffffffff,
is free, page=33704, used_marker=ffffffff,
is free, page=33705, used_marker=ffffffff,
is free, page=33706, used_marker=ffffffff,
is free, page=33707, used_marker=ffffffff,
is free, page=33708, used_marker=ffffffff,
is free, page=33709, used_marker=ffffffff,
is free, page=33710, used_marker=ffffffff,
is free, page=33711, used_marker=ffffffff,
is free, page=33703, used_marker=ffffffff,
is free, page=33704, used_marker=ffffffff,
is free, page=33705, used_marker=ffffffff,
is free, page=33706, used_marker=ffffffff,
is free, page=33707, used_marker=ffffffff,
is free, page=33708, used_marker=ffffffff,
is free, page=33709, used_marker=ffffffff,
is free, page=33710, used_marker=ffffffff,
is free, page=33711, used_marker=ffffffff,
is free, page=33704, used_marker=ffffffff,
is free, page=33705, used_marker=ffffffff,
is free, page=33706, used_marker=ffffffff,
is free, page=33707, used_marker=ffffffff,
is free, page=33708, used_marker=ffffffff,
is free, page=33709, used_marker=ffffffff,
is free, page=33710, used_marker=ffffffff,
is free, page=33711, used_marker=ffffffff,
is free, page=33705, used_marker=ffffffff,
is free, page=33706, used_marker=ffffffff,
is free, page=33707, used_marker=ffffffff,
is free, page=33708, used_marker=ffffffff,
is free, page=33709, used_marker=ffffffff,
is free, page=33710, used_marker=ffffffff,
is free, page=33711, used_marker=ffffffff,
is free, page=33706, used_marker=ffffffff,
is free, page=33707, used_marker=ffffffff,
is free, page=33708, used_marker=ffffffff,
is free, page=33709, used_marker=ffffffff,
is free, page=33710, used_marker=ffffffff,
is free, page=33711, used_marker=ffffffff,
is free, page=33707, used_marker=ffffffff,
is free, page=33708, used_marker=ffffffff,
is free, page=33709, used_marker=ffffffff,
is free, page=33710, used_marker=ffffffff,
is free, page=33711, used_marker=ffffffff,
is free, page=33708, used_marker=ffffffff,
is free, page=33709, used_marker=ffffffff,
is free, page=33710, used_marker=ffffffff,
is free, page=33711, used_marker=ffffffff,
is free, page=33709, used_marker=ffffffff,
is free, page=33710, used_marker=ffffffff,
is free, page=33711, used_marker=ffffffff,
is free, page=33710, used_marker=ffffffff,
is free, page=33711, used_marker=ffffffff,
is free, page=33712, used_marker=ffffffff,
is free, page=33713, used_marker=ffffffff,
is free, page=33714, used_marker=ffffffff,
is free, page=33715, used_marker=ffffffff,
is free, page=33716, used_marker=ffffffff,
is free, page=33717, used_marker=ffffffff,
is free, page=33718, used_marker=ffffffff,
is free, page=33719, used_marker=ffffffff,
is free, page=33720, used_marker=ffffffff,
is free, page=33721, used_marker=ffffffff,
is free, page=33722, used_marker=ffffffff,
is free, page=33723, used_marker=ffffffff,
is free, page=33724, used_marker=ffffffff,
is free, page=33725, used_marker=ffffffff,
is free, page=33726, used_marker=ffffffff,
is free, page=33727, used_marker=ffffffff,
Journal head page: 33712

This is a lot to break down, but the scanning of pages is out of order and often redundant.

New driver init
HiZ> ls
diskio_initialize called with drv=0
Initializing SPI NAND flash on SPI port 4003C000 with CS pin 26
Starting NAND chip detection...
[D][nand_flash] detect_chip: manufacturer_id: 2c

                                                Detected SPI NAND Manufacturer ID: 0x2C
       NAND chip detected and unprotected successfully.
[D][spi_nand] is_bad, block=0, page=0,indicator = ffff
First checkpoint block at 0
[D][spi_nand] is_bad, block=511, page=32704,indicator = ffff
[D][spi_nand] is_bad, block=512, page=32768,indicator = ffff
[D][spi_nand] is_bad, block=767, page=49088,indicator = ffff
[D][spi_nand] is_bad, block=768, page=49152,indicator = ffff
[D][spi_nand] is_bad, block=769, page=49216,indicator = ffff
[D][spi_nand] is_bad, block=770, page=49280,indicator = ffff
[D][spi_nand] is_bad, block=771, page=49344,indicator = ffff
[D][spi_nand] is_bad, block=772, page=49408,indicator = ffff
[D][spi_nand] is_bad, block=773, page=49472,indicator = ffff
[D][spi_nand] is_bad, block=774, page=49536,indicator = ffff
[D][spi_nand] is_bad, block=639, page=40896,indicator = ffff
[D][spi_nand] is_bad, block=640, page=40960,indicator = ffff
[D][spi_nand] is_bad, block=641, page=41024,indicator = ffff
[D][spi_nand] is_bad, block=642, page=41088,indicator = ffff
[D][spi_nand] is_bad, block=643, page=41152,indicator = ffff
[D][spi_nand] is_bad, block=644, page=41216,indicator = ffff
[D][spi_nand] is_bad, block=645, page=41280,indicator = ffff
[D][spi_nand] is_bad, block=646, page=41344,indicator = ffff
[D][spi_nand] is_bad, block=575, page=36800,indicator = ffff
[D][spi_nand] is_bad, block=576, page=36864,indicator = ffff
[D][spi_nand] is_bad, block=577, page=36928,indicator = ffff
[D][spi_nand] is_bad, block=578, page=36992,indicator = ffff
[D][spi_nand] is_bad, block=579, page=37056,indicator = ffff
[D][spi_nand] is_bad, block=580, page=37120,indicator = ffff
[D][spi_nand] is_bad, block=581, page=37184,indicator = ffff
[D][spi_nand] is_bad, block=582, page=37248,indicator = ffff
[D][spi_nand] is_bad, block=543, page=34752,indicator = ffff
[D][spi_nand] is_bad, block=544, page=34816,indicator = ffff
[D][spi_nand] is_bad, block=545, page=34880,indicator = ffff
[D][spi_nand] is_bad, block=546, page=34944,indicator = ffff
[D][spi_nand] is_bad, block=547, page=35008,indicator = ffff
[D][spi_nand] is_bad, block=548, page=35072,indicator = ffff
[D][spi_nand] is_bad, block=549, page=35136,indicator = ffff
[D][spi_nand] is_bad, block=550, page=35200,indicator = ffff
[D][spi_nand] is_bad, block=527, page=33728,indicator = ffff
[D][spi_nand] is_bad, block=528, page=33792,indicator = ffff
[D][spi_nand] is_bad, block=529, page=33856,indicator = ffff
[D][spi_nand] is_bad, block=530, page=33920,indicator = ffff
[D][spi_nand] is_bad, block=531, page=33984,indicator = ffff
[D][spi_nand] is_bad, block=532, page=34048,indicator = ffff
[D][spi_nand] is_bad, block=533, page=34112,indicator = ffff
[D][spi_nand] is_bad, block=534, page=34176,indicator = ffff
[D][spi_nand] is_bad, block=519, page=33216,indicator = ffff
[D][spi_nand] is_bad, block=520, page=33280,indicator = ffff
[D][spi_nand] is_bad, block=523, page=33472,indicator = ffff
[D][spi_nand] is_bad, block=524, page=33536,indicator = ffff
[D][spi_nand] is_bad, block=525, page=33600,indicator = ffff
[D][spi_nand] is_bad, block=526, page=33664,indicator = ffff
[D][spi_nand] is_bad, block=526, page=33664,indicator = ffff
[D][spi_nand] is_bad, block=527, page=33728,indicator = ffff
[D][spi_nand] is_bad, block=528, page=33792,indicator = ffff
[D][spi_nand] is_bad, block=529, page=33856,indicator = ffff
[D][spi_nand] is_bad, block=530, page=33920,indicator = ffff
[D][spi_nand] is_bad, block=531, page=33984,indicator = ffff
[D][spi_nand] is_bad, block=532, page=34048,indicator = ffff
[D][spi_nand] is_bad, block=533, page=34112,indicator = ffff
[D][spi_nand] is_bad, block=534, page=34176,indicator = ffff
Last checkpoint block at 526
Finding last group in block 526 with 4 groups
log2_ppc=4, log2_ppb=6
[D][spi_nand] is free, page=33680, used_marker=ffff,
[D][spi_nand] is free, page=33681, used_marker=ffff,
[D][spi_nand] is free, page=33682, used_marker=ffff,
[D][spi_nand] is free, page=33683, used_marker=ffff,
[D][spi_nand] is free, page=33684, used_marker=ffff,
[D][spi_nand] is free, page=33685, used_marker=ffff,
[D][spi_nand] is free, page=33686, used_marker=ffff,
[D][spi_nand] is free, page=33687, used_marker=ffff,
[D][spi_nand] is free, page=33688, used_marker=ffff,
[D][spi_nand] is free, page=33689, used_marker=ffff,
[D][spi_nand] is free, page=33690, used_marker=ffff,
[D][spi_nand] is free, page=33691, used_marker=ffff,
[D][spi_nand] is free, page=33692, used_marker=ffff,
[D][spi_nand] is free, page=33693, used_marker=ffff,
[D][spi_nand] is free, page=33694, used_marker=ffff,
[D][spi_nand] is free, page=33695, used_marker=ffff,
[D][spi_nand] is free, page=33664, used_marker=ffff,
[D][spi_nand] is free, page=33665, used_marker=ffff,
[D][spi_nand] is free, page=33666, used_marker=ffff,
[D][spi_nand] is free, page=33667, used_marker=ffff,
[D][spi_nand] is free, page=33668, used_marker=ffff,
[D][spi_nand] is free, page=33669, used_marker=ffff,
[D][spi_nand] is free, page=33670, used_marker=ffff,
[D][spi_nand] is free, page=33671, used_marker=ffff,
[D][spi_nand] is free, page=33672, used_marker=ffff,
[D][spi_nand] is free, page=33673, used_marker=ffff,
[D][spi_nand] is free, page=33674, used_marker=ffff,
[D][spi_nand] is free, page=33675, used_marker=ffff,
[D][spi_nand] is free, page=33676, used_marker=ffff,
[D][spi_nand] is free, page=33677, used_marker=ffff,
[D][spi_nand] is free, page=33678, used_marker=ffff,
[D][spi_nand] is free, page=33679, used_marker=ffff,
Last checkpoint group page 33664
Journal root page 33678
Restored tail=19916, bb_current=0, bb_last=16
[D][spi_nand] is free, page=33665, used_marker=ffff,
[D][spi_nand] is free, page=33666, used_marker=ffff,
[D][spi_nand] is free, page=33667, used_marker=ffff,
[D][spi_nand] is free, page=33668, used_marker=ffff,
[D][spi_nand] is free, page=33669, used_marker=ffff,
[D][spi_nand] is free, page=33670, used_marker=ffff,
[D][spi_nand] is free, page=33671, used_marker=ffff,
[D][spi_nand] is free, page=33672, used_marker=ffff,
[D][spi_nand] is free, page=33673, used_marker=ffff,
[D][spi_nand] is free, page=33674, used_marker=ffff,
[D][spi_nand] is free, page=33675, used_marker=ffff,
[D][spi_nand] is free, page=33676, used_marker=ffff,
[D][spi_nand] is free, page=33677, used_marker=ffff,
[D][spi_nand] is free, page=33678, used_marker=ffff,
[D][spi_nand] is free, page=33679, used_marker=ffff,
[D][spi_nand] is free, page=33680, used_marker=ffff,
Journal head page 33665
Error code from dhara_map_resume: 0, 0
spi_nand_flash_init_device returned 0
Error code: 13


The new driver seems to be a lot more sequential in scanning the pages. I note it does not scan 33696 at all.
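
Plugging the page numbers from both logs into the geometry the driver prints (log2_ppb=6, log2_ppc=4) makes the difference easier to see. This is only arithmetic on the logged values; the struct and its field names (group, index) are my own labels for illustration, not dhara's.

```c
#include <stdint.h>

/* Position of a page inside its block, split by checkpoint group. */
typedef struct {
    uint32_t block; /* erase block number */
    uint32_t group; /* checkpoint group within the block */
    uint32_t index; /* page index within the group */
} page_pos_t;

/* log2_ppb: log2(pages per block), log2_ppc: log2(pages per checkpoint group). */
page_pos_t page_decompose(uint32_t page, uint8_t log2_ppb, uint8_t log2_ppc) {
    uint32_t in_block = page & ((1u << log2_ppb) - 1u);
    page_pos_t p;
    p.block = page >> log2_ppb;
    p.group = in_block >> log2_ppc;
    p.index = in_block & ((1u << log2_ppc) - 1u);
    return p;
}
```

With these numbers the old driver's last checkpoint group page (33696) is block 526, group 2, while the new driver's (33664) is block 526, group 0. Both journal root pages (33710 and 33678) sit at index 14 of their group. The pages the new driver probes (33664 to 33695) all fall in groups 0 and 1, so it appears to settle on an earlier checkpoint group than the old driver does.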

int spi_nand_page_is_free(row_address_t row, bool* is_free) {
    // page read will validate block & page address
    size_t page_and_oob_len = SPI_NAND_PAGE_SIZE + SPI_NAND_OOB_SIZE();

    int ret = spi_nand_page_read(row, 0, page_main_and_largest_oob_buffer, page_and_oob_len);
    if (SPI_NAND_RET_OK != ret) {
        return ret;
    }

    *is_free = true; // innocent until proven guilty
    // iterate through page & oob to make sure its 0xff's all the way down

    // TODO: static_assert( sizeof(page_main_and_oob_buffer) % sizeof(uint32_t) == 0, "page_main_and_oob_buffer size
    // must be a multiple of 4" );
    uint32_t comp_word = 0xffffffff;
    for (size_t i = 0; i < page_and_oob_len; i += sizeof(comp_word)) {
        if (0 != memcmp(&comp_word, &page_main_and_largest_oob_buffer[i], sizeof(comp_word))) {
            *is_free = false;
            break;
        }
    }
    printf("is free, page=%d, used_marker=%04x,\r\n", row, comp_word);

    return SPI_NAND_RET_OK;
}

So where is the is_free scan happening? Here’s the old driver. It is in a function provided by the NAND layer, not dhara. From what I can tell, it loads the full page plus the OOB (spare/ECC) area, then scans the whole thing in 4-byte increments to make sure it’s blank.

esp_err_t nand_is_free(spi_nand_flash_device_t *handle, uint32_t page, bool *is_free_status)
{
    esp_err_t ret = ESP_OK;
    uint16_t used_marker;

    GOTO_ON_ERROR(read_page_and_wait(handle, page, NULL), fail);

    uint32_t block = page >> handle->chip.log2_ppb;
    uint16_t column_addr = get_column_address(handle, block, handle->chip.page_size + 2);

    GOTO_ON_ERROR(spi_nand_read(handle, (uint8_t *)handle->read_buffer, column_addr, 2), fail);

    memcpy(&used_marker, handle->read_buffer, sizeof(used_marker));
    NAND_LOGD(TAG, "is free, page=%"PRIu32", used_marker=%04x,", page, used_marker);
    *is_free_status = (used_marker == 0xFFFF);
    return ret;
fail:
    NAND_LOGE(TAG, "Error in nand_is_free %d", ret);
    return ret;
}

The new driver uses a drastically different is_free check (I think?). It reads a 2-byte used marker from the spare area, at column offset page_size + 2, and checks whether it is 0xffff.

I guess the question now is who is right? Will need to read how the dhara journal works.
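
To make the disagreement concrete, here is a small host-side sketch of the two strategies running over the same in-memory page image. It is an illustration under assumptions, not either driver's code: the function names are made up, the 64-byte spare size is an assumed value, and the key assumption is that the old firmware never writes a used marker at page_size + 2.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NAND_PAGE_SIZE 2048u
#define NAND_OOB_SIZE  64u /* assumed spare size, for illustration only */

/* Old-driver style: the page is free only if page + spare is all 0xFF. */
bool is_free_full_scan(const uint8_t *buf, size_t len) {
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != 0xFF) {
            return false;
        }
    }
    return true;
}

/* New-driver style: look only at the 2-byte marker at column page_size + 2. */
bool is_free_marker(const uint8_t *buf, size_t page_size) {
    uint16_t used_marker;
    memcpy(&used_marker, &buf[page_size + 2], sizeof(used_marker));
    return used_marker == 0xFFFF;
}

/* Simulate a page programmed by firmware that never writes the marker:
 * start from the erased state, then put some data at the start of the page. */
void program_without_marker(uint8_t *buf, size_t page_size, size_t oob_size) {
    memset(buf, 0xFF, page_size + oob_size);
    memset(buf, 0xA5, 16);
}
```

On an erased page both checks agree. On a page written by program_without_marker() the full scan says "used" while the marker check still says "free". If the assumption about the old firmware holds, that would fit the one-way compatibility seen earlier: the old driver can read media written by the new one, but not the other way around.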

Seems the new driver has several problems, the first two to address:

  • Non-standard bad block marking/checking
  • Incorrect determination of blank pages
First checkpoint block at 0
Last checkpoint block at 526
Last checkpoint group page 33696
Journal root page 33710
Restored tail=19946, bb_current=0, bb_last=16
Journal head page 33712
FatFS Error code: 3

Yeah! Now we find the same locations as the old driver. Before we got FatFS error 13 (no file system); now we get 3 (not ready).
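
For decoding these numbers while debugging, a small lookup helps. The sketch below lists the FRESULT names in the order FatFs defines them in ff.h, so the array index is the numeric code. The helper name fresult_name and the "FR_UNKNOWN" fallback string are my own and not part of FatFs.

```c
#include <stddef.h>

/* FRESULT names in ff.h order; the array index is the numeric code. */
static const char *const k_fresult_names[] = {
    "FR_OK",               /* 0 */
    "FR_DISK_ERR",         /* 1 */
    "FR_INT_ERR",          /* 2 */
    "FR_NOT_READY",        /* 3 */
    "FR_NO_FILE",          /* 4 */
    "FR_NO_PATH",          /* 5 */
    "FR_INVALID_NAME",     /* 6 */
    "FR_DENIED",           /* 7 */
    "FR_EXIST",            /* 8 */
    "FR_INVALID_OBJECT",   /* 9 */
    "FR_WRITE_PROTECTED",  /* 10 */
    "FR_INVALID_DRIVE",    /* 11 */
    "FR_NOT_ENABLED",      /* 12 */
    "FR_NO_FILESYSTEM",    /* 13 */
    "FR_MKFS_ABORTED",     /* 14 */
    "FR_TIMEOUT",          /* 15 */
    "FR_LOCKED",           /* 16 */
    "FR_NOT_ENOUGH_CORE",  /* 17 */
    "FR_TOO_MANY_OPEN_FILES", /* 18 */
    "FR_INVALID_PARAMETER",   /* 19 */
};

/* Returns the FRESULT name for a numeric code.
 * "FR_UNKNOWN" is a made-up fallback for out-of-range values. */
const char *fresult_name(int code) {
    size_t n = sizeof(k_fresult_names) / sizeof(k_fresult_names[0]);
    if (code < 0 || (size_t)code >= n) {
        return "FR_UNKNOWN";
    }
    return k_fresult_names[code];
}
```

So fresult_name(13) gives FR_NO_FILESYSTEM and fresult_name(3) gives FR_NOT_READY, matching the two errors above.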

Maybe if we fix the other bugs in the new driver we can get it going.

TLDR: seems to be working now. I did initially have some issues with storage not being identified/inability to save. I can’t reproduce it again though. Maybe 100% backwards compatibility isn’t there yet. Much more testing needed, which you can help with using this firmware:

bus_pirate5_rev10-new-nand.zip (394.3 KB)

Be sure to back up the internal storage; it is very likely that opportunities to corrupt data still exist.

I will do full on stress testing and verification in the coming days.

AI slop summary

Summary of Bugs Found in ESP-IDF SPI NAND Flash Driver

Critical Bugs:

1. nand_copy() - Double program_execute corruption (nand_impl.c, lines ~327-383)

  • Issue: When copying between different planes, program_execute is called twice on the destination page, causing data corruption
  • Root cause: Second program_execute call should be inside else block but was placed after the if/else
  • Fix: Move second program_execute into the else branch
  • Impact: CRITICAL - causes data corruption during garbage collection

2. nand_is_bad() - Inverted ONFI logic (nand_impl.c, lines ~111-128)

  • Issue: Checks if bad block marker != 0xFFFF instead of == 0x00 per ONFI specification
  • Root cause: Uses uint16_t and reads 2 bytes instead of uint8_t reading 1 byte at page_size offset
  • Fix: Change to uint8_t, read 1 byte, check == 0x00
  • Impact: HIGH - blocks incorrectly marked as bad, dhara fails to find valid blocks

3. nand_mark_bad() - Wrong ONFI marker size (nand_impl.c, lines ~130-158)

  • Issue: Writes 2-byte (uint16_t) bad block marker instead of 1-byte per ONFI standard
  • Fix: Change to uint8_t bad_block_indicator = 0x00, write 1 byte at page_size offset
  • Impact: MEDIUM - non-standard bad block marking

Functional Issues:

4. nand_is_free() - Unreliable marker check (nand_impl.c, lines ~254-274)

  • Issue: Only checks 2-byte marker at page_size+2 instead of comprehensive page+OOB check
  • Root cause: Marker may not be synchronized with actual page data
  • Fix: Check entire page_size + 4 bytes for 0xFFFFFFFF pattern
  • Impact: MEDIUM - causes dhara to misidentify free pages, affecting checkpoint detection

5. nand_prog() - Unnecessary marker write (nand_impl.c, lines ~203-228)

  • Issue: Writes unnecessary used_marker at page_size+2 offset
  • Fix: Remove the marker write operation
  • Impact: LOW - inefficiency only

Reference Implementation:

All fixes should align with ONFI specification section 3.1 (Bad Block Management):

  • Bad block indicator is single byte (0x00) in first spare area byte
  • Located at page_size offset (column address 2048 for 2K page devices)

It appears to be working. I compared the old driver and the new, and found some issues. Also a few dumb bugs on my part. I’d rather move forward than type out everything I changed, so here is an AI slop summary of the changes I pushed.

Here we go.

  • Load firmware with old NAND driver
  • Format the drive
  • Copy backup of files to the drive
  • Update to firmware with new driver

ls shows files, the NAND is working and FAT is mounted.

The i command shows a contradictory combination: storage not detected, but configuration file loaded.

Now, is this a driver issue or a higher-level firmware bug? Let’s find out.

ETA:

Storage: Not Detected
Storage mount error: 13

Configuration file: Loaded

Error appears to be 13 again. Yet it mounted?

ETA:

Storage:   0.10GB (FAT16 File System)

Configuration file: Loaded

Got it. When the terminal connects we mount the storage a second time to sync any changes to the USB disk on the PC side. This appears to now report an error while the actual file system keeps working.

I’m gonna guess this is because we still have some malloc and alloc and calloc in place.

ETA:

Storage:   0.10GB (FAT16 File System)

Configuration file: Loaded
...

HiZ> ls
       256 1wire.bin
       256 25x02.bin
        34 bp2wire.bp

That was indeed the issue :slight_smile:
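
For the curious, this is the general shape of swapping heap allocation for static storage so that initializing twice is harmless. It is only a sketch of the pattern, not the change that went into the firmware: nand_ctx_t, nand_ctx_init and the buffer sizes are hypothetical, and the 128-byte spare bound is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NAND_MAX_PAGE_SIZE 2048u
#define NAND_MAX_OOB_SIZE  128u /* assumed upper bound on the spare area */

/* Hypothetical driver context; a real driver would hold more state. */
typedef struct {
    uint8_t *work_buffer;
    bool initialized;
} nand_ctx_t;

/* Statically allocated storage: nothing to malloc, nothing to leak. */
static uint8_t s_work_buffer[NAND_MAX_PAGE_SIZE + NAND_MAX_OOB_SIZE];
static nand_ctx_t s_ctx;

/* Safe to call on every mount: it reuses the same storage each time
 * instead of allocating a fresh buffer that could fail or leak. */
nand_ctx_t *nand_ctx_init(void) {
    s_ctx.work_buffer = s_work_buffer;
    memset(s_work_buffer, 0xFF, sizeof(s_work_buffer));
    s_ctx.initialized = true;
    return &s_ctx;
}
```

Calling nand_ctx_init() a second time, as happens when the terminal connects and the storage is mounted again, returns the same context and buffer rather than depending on the heap.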

Files read off the drive (formatted and written with the old driver, read with the new driver) are all identical. Files written with the new driver and read with the new driver are identical. I think we’ve done it :slight_smile:

Latest test firmware:

bus_pirate5_rev10.zip (394.8 KB)

Firmware main branch @ unknown (Jan 27 2026 13:29:46)
RP2040 with 264KB RAM, 128Mbit FLASH
S/N: 282E1F0B134063E4
Storage:   0.10GB (FAT16 File System)
NAND Flash Information:
  Manufacturer: Micron
  Page size: 2048 bytes
  Block size: 131072 bytes
  Pages per block: 64
  Total blocks: 1024
  Total capacity: 131072 KB

This is a bit verbose lol. I’ll add a new disk info command for probing the nand and include this info, and just show the manufacturer in the info screen.
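
The derived lines in that dump follow from three numbers, which is handy for sanity checking a newly detected chip. A minimal sketch, with a hypothetical nand_info_t struct and helper names that are not taken from the firmware:

```c
#include <stdint.h>

/* The three values the other fields are derived from (hypothetical struct). */
typedef struct {
    uint32_t page_size;    /* bytes */
    uint32_t block_size;   /* bytes */
    uint32_t total_blocks;
} nand_info_t;

/* Pages per erase block. */
uint32_t nand_pages_per_block(const nand_info_t *n) {
    return n->block_size / n->page_size;
}

/* Raw capacity in KB; 64-bit intermediate avoids overflow on big chips. */
uint32_t nand_capacity_kb(const nand_info_t *n) {
    return (uint32_t)(((uint64_t)n->total_blocks * n->block_size) / 1024u);
}
```

For the values above (2048-byte pages, 131072-byte blocks, 1024 blocks) this gives 64 pages per block and 131072 KB, i.e. 128 MiB or 1 Gbit of raw NAND.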

ETA:

Decided to just go with the manufacturer name.

TODO:

  • Mount BP5s with various brands of NAND and give it a go
  • Add disk command to get extended NAND and Dhara info
  • Add a NAND dump command to SPI mode
  • Debug why new driver doesn’t see existing file system made with old driver
  • Remove all dynamic memory allocation
  • Glue to FatFS
  • Add mutex around disk access functions (copy from existing implementation)
  • Get into a single repo (dhara + nand driver)
  • Use the existing garbage collection and error blocks settings in the current firmware

Final test for BP5 and BP6:

bus_pirate5-6-NAND-test.zip (784.6 KB)

This is a table of all the supported NAND chips that should be drop in replacements for our current Micron NAND.

The prices are not reliable: many parts aren’t in stock at SZLCSC, so those quotes are stale. For that reason I included the Digikey RMB price @200+ where available.

The page size matters:

  • RP2040’s limited memory (BP5) means we’re stuck with max 2K page size, so the biggest chips are only possible on RP2350 devices (BP6+)
  • Most of our writes are small config files, firmware, EEPROM & flash dumps, so the smaller the sector size, the less wear from moving things around the chip. Even if we can use a 4K page size, it might not be the best option.
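
A rough way to put numbers on the second point: a page is the smallest unit that gets programmed, so a small file costs at least one full page no matter how small it is. The sketch below is a simplified model with made-up helper names; it ignores FAT metadata updates and dhara's own journal overhead, which add more writes on top.

```c
#include <stdint.h>

/* Number of pages needed to hold file_size bytes (rounded up). */
uint32_t pages_needed(uint32_t file_size, uint32_t page_size) {
    return (file_size + page_size - 1u) / page_size;
}

/* Bytes of NAND actually programmed for the file data alone. */
uint32_t bytes_programmed(uint32_t file_size, uint32_t page_size) {
    return pages_needed(file_size, page_size) * page_size;
}
```

For a 256-byte dump file, bytes_programmed() is 2048 with 2K pages and 4096 with 4K pages, so in this model the larger page doubles the data written for the same small file.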

| Manufacturer | Part Number | Size (Gb) | Page Size (bytes) | Planes | Voltage Range | SZLCSC/Digikey RMB | Scalper RMB |
|---|---|---|---|---|---|---|---|
| Winbond | W25N512GVxIG/IT | 0.5 | 2048 | 1 | 2.7-3.6V | | |
| Winbond | W25N01GVxxxG/T/R | 1 | 2048 | 1 | 2.7-3.6V | 43.41/25.8 | |
| Winbond | W25N02KVxxIR/U | 2 | 2048 | 1 | 2.7-3.6V | | |
| Winbond | W25N04KVxxIR/U | 4 | 2048 | 1 | 2.7-3.6V | | |
| GigaDevice | GD5F1GQ5UExxG | 1 | 2048 | 1 | 2.7-3.6V | 19.7/28.1 | |
| GigaDevice | GD5F2GQ5UExxG | 2 | 2048 | 1 | 2.7-3.6V | | |
| GigaDevice | GD5F4GQ6UExxG | 4 | 2048 | 1 | 2.7-3.6V | | |
| Alliance Memory | AS5F31G04SND-08LIN | 1 | 2048 | 1 | 2.7-3.6V | 21.99/51.67 | |
| Alliance Memory | AS5F32G04SND-08LIN | 2 | 2048 | 1 | 2.7-3.6V | | |
| Alliance Memory | AS5F34G04SND-08LIN | 4 | 2048 | 1 | 2.7-3.6V | | |
| Alliance Memory | AS5F38G04SND-08LIN | 8 | 4096 | 1 | 2.7-3.6V | | |
| Micron | MT29F1G01ABAFDSF-AAT:F | 1 | 2048 | 1 | 2.7-3.6V | 11.46/38.7 | |
| Micron | MT29F2G01ABAGDWB-IT:G | 2 | 2048 | 2 | 2.7-3.6V | | |
| Micron | MT29F4G01ABAFDWB | 4 | 4096 | 1 | 2.7-3.6V | | |
| Zetta | ZD35Q1GC | 1 | 2048 | 1 | 2.7-3.6V | 10.69/x | |
| XTX | XT26G08D | 8 | 4096 | 1 | 2.7-3.6V | 93.98/x | |

Of the major brands, it seems like we already have about the cheapest option. Let’s get a market quote for all of the 1GB devices.

Zetta seems like a domestic manufacturer, and that price is very attractive. Not sure I’d really trust my data to it though.

Have you forced garbage collection, to verify the corruption you fixed now does the right thing?

Also, I’ve seen different management of the flash. Look how NAND itself tracks and reports bad blocks. IIRC, first bytes on erased page set to specific value when bad? (Cannot check now)

Overall, great to see movement to a supported library, as it not only improves chip options, but also increases user base (thus finding bugs more rapidly across all products using the library). :tada:

There is this quote in the old nand driver:

// Refer to MT29F2G01ABAGD datasheet, table 11 on page 46:
// Bad blocks can be detected by the value 0x00 in the
// FIRST BYTE of the spare area.
// This is ONFI-compliant, so should be universal nowadays.

From ONFI 3.1:

And I think this part is relevant. The 0x0000 is correct for 16 bit (parallel?) access, but I think all these serial NANDs are strictly 8 bit.
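
Here is a small sketch of why the 1-byte and 2-byte checks can disagree. It models only the comparison logic on a copy of the spare area; the function names are made up, and the 1-byte rule is the one quoted from the old driver's comment above (0x00 in the first spare byte), not something I have verified against every chip in the table.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* spare points at the first bytes of the spare (OOB) area of the
 * first page in a block. */

/* 8-bit rule from the old driver's comment: 0x00 in the first spare byte. */
bool is_bad_first_byte(const uint8_t *spare) {
    return spare[0] == 0x00;
}

/* 2-byte rule: anything other than 0xFFFF in the first two spare bytes. */
bool is_bad_two_byte(const uint8_t *spare) {
    uint16_t indicator;
    memcpy(&indicator, spare, sizeof(indicator));
    return indicator != 0xFFFF;
}
```

The two agree on an untouched block (FF FF) and on a marked block (00 FF). They differ when only the second spare byte has been written for some other purpose (FF 00): the 1-byte rule calls the block good, while the 2-byte rule calls it bad.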

Before pushing it to main, I plan to make a new command for lower-level detailed debugging of the NAND. A way to report bad blocks, used blocks, examine pages, etc.

Do you know how to force garbage collection and what we’re looking for? I haven’t gotten that far yet.

Sorry, I don’t recall the architecture well enough to guide on that. Just noticed the fixes included something in that area, so thought it worth suggesting testing the code path.

Me == :face_with_bags_under_eyes: :sleeping_face:
