BP10 crashed while reading flash

I have a BP8 and a BP10 (first release). I updated both to this firmware:
Firmware main branch (2024-04-03T13:07:16Z)
My BP10 has a fix for the first bug but I never did the second hardware fix. I didn’t feel comfortable performing this fix.

I am testing a flash chip. To be precise, I copied the flash data from one chip to a second.
Here’s one odd thing.
I am able to read and verify the flash contents on my BP8.
But when I try to duplicate the same steps on my BP10, the “flash read” command aborts early in the cycle. The board disconnects from my computer, and when I look for the file system, none is mounted.
In addition, my syslog says:

Apr  3 13:00:04 mycpu kernel: [153678.739868] usb 1-2: reset full-speed USB device number 105 using xhci_hcd
Apr  3 13:00:04 mycpu kernel: [153678.867216] usb 1-2: device descriptor read/64, error -71
Apr  3 13:00:05 mycpu kernel: [153679.107411] usb 1-2: device descriptor read/64, error -71
Apr  3 13:00:05 mycpu kernel: [153679.347231] usb 1-2: reset full-speed USB device number 105 using xhci_hcd
Apr  3 13:00:05 mycpu kernel: [153679.475360] usb 1-2: device descriptor read/64, error -71
Apr  3 13:00:05 mycpu kernel: [153679.710858] usb 1-2: device descriptor read/64, error -71
Apr  3 13:00:06 mycpu kernel: [153679.950819] usb 1-2: reset full-speed USB device number 105 using xhci_hcd
Apr  3 13:00:06 mycpu kernel: [153679.950936] usb 1-2: Device not responding to setup address.
Apr  3 13:00:06 mycpu kernel: [153680.158929] usb 1-2: Device not responding to setup address.
Apr  3 13:00:06 mycpu kernel: [153680.366952] usb 1-2: device not accepting address 105, error -71
Apr  3 13:00:06 mycpu kernel: [153680.495138] usb 1-2: reset full-speed USB device number 105 using xhci_hcd
Apr  3 13:00:06 mycpu kernel: [153680.495410] usb 1-2: Device not responding to setup address.
Apr  3 13:00:06 mycpu kernel: [153680.703213] usb 1-2: Device not responding to setup address.
Apr  3 13:00:06 mycpu kernel: [153680.910911] usb 1-2: device not accepting address 105, error -71
Apr  3 13:00:06 mycpu kernel: [153680.911330] usb 1-2: USB disconnect, device number 105

It does it each time I try. This is the first time I’ve tried to use my BP10 w/SPI. One other thing I noticed was this sequence:

SPI> W
Power supply
Volts (0.80V-5.00V)
x to exit (3.30) > 
Maximum current (0mA-500mA), <enter> for none
x to exit (none) > P
3.30V requested, closest value: 3.30V
0.0mA requested, closest value: 0.0mA

Power supply:Enabled

Error: Current over limit, power supply disabled

So I set it to 100mA. When I continued, it crashed.
I’m not sure if my problem is caused by the hardware bug I didn’t fix, or it’s something different.

Hitting P here caused the current limit error. This is a bug I’m aware of (shouldn’t accept the non-numeric input), but I don’t have an immediate solution. Shortly I’m going to rework the /ui/ stuff the same way I reworked the /pirate/ stuff. This will also address the toolbar ghosting in non-vt100 mode you reported.
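
For illustration, here is a minimal sketch of the kind of check that would reject input like P at the current-limit prompt. The function name parse_current_limit and its parameters are hypothetical and are not taken from the Bus Pirate firmware; it only shows the standard strtof end-pointer technique.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper, not the firmware's actual parser.
 * Parses a current limit in mA. Returns true only if the whole string is a
 * number inside [min_ma, max_ma]; *out_ma is written only on success. */
bool parse_current_limit(const char *input, float min_ma, float max_ma, float *out_ma)
{
    assert(input != NULL && out_ma != NULL);

    char *end = NULL;
    errno = 0;
    float value = strtof(input, &end);

    if (end == input) return false;     /* nothing numeric at all, e.g. "P" */
    if (*end != '\0') return false;     /* trailing junk, e.g. "100x" */
    if (errno == ERANGE) return false;  /* value does not fit in a float */

    /* Written as a positive range test so that NaN is rejected too. */
    if (!(value >= min_ma && value <= max_ma)) return false;

    *out_ma = value;
    return true;
}
```

With the prompt's 0mA-500mA range, "100" would be accepted and "P" rejected. An empty line (<enter> for none) would still need to be handled by the caller before this check.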

The crash during dump is really weird. You have previously reported that. The only thing I can think of is a pointer being off somewhere, but I spent some time in there and cleaned it up, and it seems to persist.

What chip are you dumping? I can get one and try to reproduce it.

I ordered one of each of these W25QXX chips from the WAVGAT Official Store

In other words, I ordered a 32MB, 64MB, and 128MB. This is one of them. Note that when I tested it, it only has 8MB… Others from my order also have mismatched flash sizes IMHO. Or I can’t read spec sheets.
Here are the results from flash info.

W25G.txt (924 Bytes)

Now I have both this chip on the breakout board as shown, and a bare chip that I removed from a router. I picked this particular chip because it was the same size as the chip I am copying. I used my Bruschetta board and flashrom to clone the flash from my router onto the new chip from WAVGAT. Both of these chips have the same flash contents. Please note that my BP10 crashes while reading either one. My BP8 “reads” the chip without crashing. However, I am not able to consistently read this chip with my BP8.
I have two sets of chip clips: one on the bare chip and one on the WAVGAT chip. I’ve been switching these back and forth looking for a repeatable pattern. I can read and verify a chip, and that is consistent.

I need more testing. I can read and verify a single chip, but (a) the two chips don’t give me the same value, and (b) perhaps new firmware revisions affect what I read.

I’m using essentially the same BP commands, except that for the bare chip I enable pull-ups.

Oops. [Smack forehead]

The flash driver we use has 0x40 0x17 as a W25Q64CV, which is 64 Mbit / 8 = 8 MBytes.

The datasheet shows the answer to 0x9f command as 0x4017, so that seems right.
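
As a worked example of that arithmetic, here is a small sketch that turns the last byte of the 0x9f response into a size. It assumes the common convention that this byte is log2 of the capacity in bytes, which matches the 0x17 / 8388608 pairing reported in this thread but is not guaranteed for every vendor. The function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Capacity in bytes from the capacity byte of the JEDEC ID (0x9f) response.
 * Assumption: the byte is log2(size in bytes). Returns 0 for codes that
 * would be meaningless or overflow this sketch's sanity limit. */
uint64_t jedec_capacity_bytes(uint8_t capacity_code)
{
    if (capacity_code == 0 || capacity_code > 40) return 0;
    return (uint64_t)1 << capacity_code;
}

/* Same capacity expressed in Mbit, the unit used in part numbers
 * (W25Q64 = 64 Mbit). */
uint64_t jedec_capacity_mbit(uint8_t capacity_code)
{
    return jedec_capacity_bytes(capacity_code) * 8u / (1024u * 1024u);
}
```

Under that assumption 0x17 gives 8388608 bytes (64 Mbit) and 0x18 gives 16777216 bytes (128 Mbit), which lines up with the two capacities shown in the dumps in this thread.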

The SPI was really messed up for a few days earlier this week, so some variation is expected.

The page size on these chips is 256 bytes, so that shouldn’t be causing a pointer issue. At what point does it crash? Right away, towards the end?

Smack head again. :)

It crashed early in the sequence. I’ll try to be more specific next time.

Using today’s build, this is what the screen reports

4 KB Erase is supported throughout the device (instruction 0x20)
Write granularity is 64 bytes or larger
Flash status register is non-volatile
3-Byte only addressing
Capacity is 8388608 Bytes
Flash device supports 4KB block erase (instruction 0x20)
Flash device supports 32KB block erase (instruction 0x52)
Flash device supports 64KB block erase (instruction 0xD8)
Found a Winbond  flash chip (8388608 bytes)
Flash device reset success
Dumping to file1...
[o o o o o o o o o o ]
[12:18:28.443] Disconnected
[12:18:28.444] Error: Could not read from tty device

The LEDs are still going through a sequence of pulsing red/green/yellow/blue

but it disconnected from the computer. I cannot reconnect. No file system is mounted, and no TTY (i.e. /dev/ttyACM0) is active.

Perfect, thank you. That is definitely a pointer going awry.

Capacity is 16777216 Bytes
Flash device supports 4KB block erase (instruction 0x20)
Flash device supports 32KB block erase (instruction 0x52)
Flash device supports 64KB block erase (instruction 0xD8)
Found a Winbond  flash chip (16777216 bytes)
Flash device reset success
Dumping to test2.bin...
[-------------------C]
Dump OK

SPI>

I don’t have a 64 Mbit chip, but I have a 128 Mbit. No crash.

I wonder if it isn’t something with your NAND chip? You said rev8 is ok, but rev10 does this crash?

SPI> m 5


Use previous settings?
 I2C speed: 100kHz

y/n, x to exit (Y) >

When you exit and re-enter modes are the settings saved and reloaded?

Does ls show the contents of the drive, and configuration files like bpconfig.bp, etc?

I2C> cat bpuart.bp
{
"baudrate": 9600,
"data_bits": 8,
"stop_bits": 2,
"parity": 1
}

I2C>

Are you able to print the contents of any file with cat bpconfig.bp for example?

Yes. I just tried it and I can.
However, I might have executed the “format” command earlier. I was having inconsistencies between the BP file system and the external view of the file system. I know I did it on the BP8.

This morning, it took about 30 seconds before it crashed.

Flash device supports 64KB block erase (instruction 0xD8)
Found a Winbond  flash chip (8388608 bytes)
Flash device reset success
Dumping to file2...
[-Co o o o o o o o o ]

It created the new file, but it has zero bytes in length.

Just tried it on latest source
Firmware main branch (2024-04-05T11:57:35Z)

It crashed about 27 seconds after I started it.

What is the max file size for the NAND file system?


/dev/sda1        94M  8.1M   86M   9% /media/me/5021-0000

ls -l /media/me/5021-0000/
total 8216
-rw-r--r-- 1 me me       0 Dec 31  1979 BP10O
-rw-r--r-- 1 me me     333 Dec 31  2019 BPCONFIG.BP
-rw-r--r-- 1 me me      81 Dec 31  2019 BPSPI.BP
-rw-r--r-- 1 me me      66 Dec 31  2019 BPUART.BP
-rw-r--r-- 1 me me 8388608 Dec 31  2019 FILE1
-rw-r--r-- 1 me me       0 Dec 31  1979 FILE2

FILE1 is a copy of the flash I captured on another system. I do not believe I am running out of room.

I recall getting inconsistent results, i.e. the Bus Pirate said I had a file on the system but my Linux system didn’t. On one of my BPs (and I think it was the BP8 because it had an SD card) I ran fsck on it (fsck reported errors, which I fixed) and reformatted it using the BP format command. But I’m not sure if I did this on my BP10, so I apologize for not keeping more accurate notes.

I do remember that when I ran fsck, it wanted me to change the mount point name from “5021-0000” to something else, but that went away after I updated the BP.

I attached a debugger (see my other post). This is what it reported when it crashed

(gdb) monitor reset init
[rp2040.core0] halted due to debug-request, current mode: Thread 
xPSR: 0xf1000000 pc: 0x000000ea msp: 0x20041f00
[rp2040.core1] halted due to debug-request, current mode: Thread 
xPSR: 0xf1000000 pc: 0x000000ea msp: 0x20041f00
(gdb) continue
Continuing.


Thread 2 "rp2040.core1" received signal SIGTRAP, Trace/breakpoint trap.
[Switching to Thread 2]
isr_hardfault () at /home/me/Git/pico-sdk/src/rp2_common/pico_standard_link/crt0.S:98
98	decl_isr_bkpt isr_hardfault
(gdb) 
Continuing.

Thread 2 "rp2040.core1" received signal SIGTRAP, Trace/breakpoint trap.
isr_svcall () at /home/me/Git/pico-sdk/src/rp2_common/pico_standard_link/crt0.S:99
99	decl_isr_bkpt isr_svcall
(gdb) bt
#0  isr_svcall () at /home/me/Git/pico-sdk/src/rp2_common/pico_standard_link/crt0.S:99
#1  <signal handler called>
#2  0x00ff0000 in ?? ()
#3  0x100292a8 in mkdir_options ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

At this point, the stack linkage looks wrong. mkdir_options isn’t a function but data.
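
To make that reasoning concrete, here is a rough sketch that sorts the addresses from the gdb output into RP2040 memory regions. The ranges come from the RP2040 datasheet address map as I understand it (16 KB boot ROM at 0x00000000, a 16 MB XIP flash window at 0x10000000, 264 KB of SRAM at 0x20000000); the enum and function names are invented for this example, and peripherals and the XIP alias windows are lumped into "other".

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative names, not part of the pico-sdk or the firmware. */
enum rp2040_region {
    REGION_ROM,        /* 0x00000000 - 0x00003fff, boot ROM        */
    REGION_XIP_FLASH,  /* 0x10000000 - 0x10ffffff, code and consts */
    REGION_SRAM,       /* 0x20000000 - 0x20041fff, data and stacks */
    REGION_OTHER       /* everything else in this simplified map   */
};

enum rp2040_region rp2040_classify(uint32_t addr)
{
    if (addr < 0x00004000u) return REGION_ROM;
    if (addr >= 0x10000000u && addr < 0x11000000u) return REGION_XIP_FLASH;
    if (addr >= 0x20000000u && addr < 0x20042000u) return REGION_SRAM;
    return REGION_OTHER;
}
```

By this map, frame #2 (0x00ff0000) falls outside ROM, flash and SRAM, so it cannot be a real return address, and frame #3 (0x100292a8) is in flash, where both code and const data such as mkdir_options live. That fits the picture of a corrupted stack rather than a genuine call chain.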

The max file size should be the max FAT16 file size.

Thank you so much for the debug info.

static const char * const mkdir_usage[] = {
    "mkdir <dir>",
    "Create directory: mkdir dir",
};
static const struct ui_help_options mkdir_options[] = {
    {1, "", T_HELP_DISK_MKDIR}, // section heading
    {0, "<dir>", T_HELP_DISK_MKDIR_DIR},
};
void disk_mkdir_handler(struct command_result *res) {
    // check help
    if (ui_help_show(res->help_flag, mkdir_usage, count_of(mkdir_usage), &mkdir_options[0], count_of(mkdir_options))) return;

Agreed, that is very strange. mkdir_options is only used in three places in the code.

If it’s a hard fault, and ended up there, it makes me feel like a pointer issue. But how does a pointer issue show up as a corner case?

You’re using Linux. Do you have the opportunity to test under Mac or Windows? I wonder if there are some differences in how Linux and Windows attach to the MSD, because this seems to happen when it tries to write. But it doesn’t happen when saving mode configs. Maybe it’s because we open the file and keep it open so long for dumping, and that causes issues? I could try opening/closing the file each time we grab a 256-byte sector, but I suspect that would thrash the NAND management layer.
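
For anyone following along, this is roughly what the open/close-per-sector idea looks like. It is only a sketch using standard C stdio as a stand-in, since the firmware's real file API is not shown in this thread; append_chunk is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Append one chunk to the dump file, opening and closing it on every call
 * so the file is never held open between chunks.
 * stdio stands in here for whatever file layer the firmware really uses. */
bool append_chunk(const char *path, const unsigned char *chunk, size_t len)
{
    assert(path != NULL && chunk != NULL);

    FILE *f = fopen(path, "ab");      /* "ab": create if missing, append */
    if (f == NULL) return false;

    bool ok = (fwrite(chunk, 1, len, f) == len);

    /* fclose flushes buffered data, so its result matters as well. */
    if (fclose(f) != 0) ok = false;
    return ok;
}
```

The cost is one open, flush and close per 256 bytes, i.e. 32768 cycles for an 8388608-byte dump, which is the thrashing concern mentioned above.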

You mentioned fsck. It wasn’t clear: have you formatted it with the internal format command since then? If you’re willing, I’d suggest running the format again to remove any potential variables.

I was trying to read an MX25L1606E chip (16 Mbit).
BP10 crashed 3 times in a row, then on a fourth try it read entire flash.
Windows reports USB device failure, but LEDs on BP still continue animation.

I could not reproduce it afterwards.

Thank you so much for additional confirmation! This seems like good news.

I received a handful of production BP5 this evening, so I hope I can reproduce it here as well. I will also order that chip.

I am pleased to report that with the latest firmware, I am able to clone the 64 Mbit flash chip that has caused me problems in the past. The BP10 didn’t crash. The contents were an exact match (I used sha256sum to get the cryptographic hash). The BP rev8 also worked fine.
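
A hash only says whether two dumps match. When they don't (as with my earlier inconsistent reads), it helps to know where they first diverge. Here is a minimal sketch of a byte-for-byte comparison; first_difference is my own illustrative helper, not a Bus Pirate or flashrom function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Compare two open files in 256-byte chunks.
 * Returns -1 if they are identical, -2 on a read error, otherwise the
 * offset of the first differing byte. If one file is a prefix of the
 * other, the returned offset is the length of the shorter file. */
long first_difference(FILE *a, FILE *b)
{
    assert(a != NULL && b != NULL);

    unsigned char buf_a[256], buf_b[256];
    long offset = 0;

    for (;;) {
        size_t na = fread(buf_a, 1, sizeof buf_a, a);
        size_t nb = fread(buf_b, 1, sizeof buf_b, b);
        if (ferror(a) || ferror(b)) return -2;

        size_t n = na < nb ? na : nb;
        for (size_t i = 0; i < n; i++) {
            if (buf_a[i] != buf_b[i]) return offset + (long)i;
        }
        if (na != nb) return offset + (long)n;  /* one file ended early */
        if (na == 0) return -1;                 /* both ended, all equal */
        offset += (long)n;
    }
}
```

Both files should be opened in binary mode ("rb"). A mismatch offset that is a multiple of 256 would hint at a page-level problem rather than a single flipped bit.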

Thank you for the update. I have a suspicion that there is a problem in the dhara library that manages the NAND flash, and that it is causing these issues.