Bus Pirate 5 resetting itself, causing storage corruption

I have an issue with my BP 5 REV10, as described in this thread Storage Not Detected on BP5

Because I’m new here, I reached my reply limit of 3. Yes, I know that’s there for a reason, but I wanted to share the further troubleshooting and what I’ve been seeing in the kernel log.

It looks like the BP is resetting itself while connected to the serial console (for reference, I’m using minicom on Ubuntu 22.04). After it resets, the serial console reconnects, but it looks like the BP doesn’t always set Write Protect on . If I do any access to the storage via the serial console after a reset when Write Protect is not on, the storage becomes corrupt.

Throughout this entire log snippet, the BP’s “first” serial port was open in minicom:

[ 6565.542540] usb 3-3.4.1.4.1: reset full-speed USB device number 29 using xhci_hcd
[ 6565.683990] cdc_acm 3-3.4.1.4.1:1.0: ttyACM0: USB ACM device
[ 6565.684585] cdc_acm 3-3.4.1.4.1:1.2: ttyACM1: USB ACM device
[ 6577.631370] sd 0:0:0:0: [sda] 16 512-byte logical blocks: (8.19 kB/8.00 KiB)
[ 6577.632011] sd 0:0:0:0: [sda] Write Protect is on
[ 6577.632017] sd 0:0:0:0: [sda] Mode Sense: 03 00 80 00
[ 6577.633738] sda: detected capacity change from 191296 to 16
[ 6577.636856] sd 0:0:0:0: [sda] 16 512-byte logical blocks: (8.19 kB/8.00 KiB)
[ 6577.852054] sd 0:0:0:0: [sda] 47824 2048-byte logical blocks: (97.9 MB/93.4 MiB)
[ 6577.854366] sda: detected capacity change from 16 to 191296

[ 6602.502973] usb 3-3.4.1.4.1: reset full-speed USB device number 29 using xhci_hcd
[ 6602.644459] cdc_acm 3-3.4.1.4.1:1.0: ttyACM0: USB ACM device
[ 6602.645071] cdc_acm 3-3.4.1.4.1:1.2: ttyACM1: USB ACM device
[ 6602.654779] sd 0:0:0:0: [sda] Write Protect is off
[ 6602.654787] sd 0:0:0:0: [sda] Mode Sense: 03 00 00 00
[ 6602.657101] sda: detected capacity change from 191296 to 0
[ 6602.666120] sd 0:0:0:0: [sda] 47824 2048-byte logical blocks: (97.9 MB/93.4 MiB)
[ 6602.668364] sda: detected capacity change from 0 to 191296

[ 6647.381484] usb 3-3.4.1.4.1: reset full-speed USB device number 29 using xhci_hcd
[ 6647.522356] cdc_acm 3-3.4.1.4.1:1.0: ttyACM0: USB ACM device
[ 6647.522692] cdc_acm 3-3.4.1.4.1:1.2: ttyACM1: USB ACM device

At this point, I did an ls in the serial console:

[ 6663.309216] sda: detected capacity change from 191296 to 0
[ 6663.317922] sd 0:0:0:0: [sda] tag#0 access beyond end of device
[ 6663.317937] I/O error, dev sda, sector 252 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[ 6663.317942] Buffer I/O error on dev sda1, logical block 0, lost sync page write
[ 6663.331067] sd 0:0:0:0: [sda] 16 512-byte logical blocks: (8.19 kB/8.00 KiB)
[ 6663.331867] sd 0:0:0:0: [sda] Write Protect is on
[ 6663.331871] sd 0:0:0:0: [sda] Mode Sense: 03 00 80 00
[ 6663.333593] sda: detected capacity change from 0 to 16
[ 6663.340454] sda:

I’ve also noticed that the serial device sometimes becomes locked and unavailable while in this state. For example, if I close minicom, I can’t reopen unless I sudo it. It makes me think that something isn’t getting cleaned up when the xhci_hcd reset is happening, and there are still some processes hanging out there.

Let me know what else I can grab, this seems to be pretty repeatable on my system.

4 Likes

I, too, had problems using the storage system. I documented them in this thread.. I haven’t tested the problem with the recent versions (Sorry - busy).

4 Likes

Thank you so much for the additional info.

I believe you both had this issue under Linux? I will need a linux system to replicate this. Is it something happening reliably on multiple Linux flavors?

My suspicion is that the detach/attach with changing write status is wreaking havoc on something. Like you suggest, maybe we aren’t fully releasing everything and it’s attaching to a partially/misconfigured process. It could be something simple like adding a delay, but that’s probably optimistic.

3 Likes

Thanks for the quick reply!

Yes, a delay was my first thought, too. I only have Linux, so it’s always used on Ubuntu 22.04 for me.

The serial terminal is continuously connected while this is happening, so maybe a short delay before detatch if the BP thinks the serial is closing? Maybe it’s a too aggressive on the timing of the detatch/attatch cycle?

I can try another serial terminal to see if there’s a change; I’ve used minicom so long that I never think of anything else.

3 Likes

I’d be interested to hear what happens with a different terminal, but I doubt the terminal choice is cause the mass storage device to attached in a weird way.

I added this to my priority bug list and will push some simple test fixes ASAP. Please hang in there though, I’m moving slow on firmware the next couple of days.

4 Likes

I did try different terminals - gtkTerm for a gui thing, and just plain screen /dev/ttyACM0. Both gave me the same thing.

One more note - when it gets into the “bad” state, I need to restart UDEV (with the BP unplugged) to get things working again. I realized I hadn’t mentioned that earlier; sorry for that omission.

Thanks again!

3 Likes

try use GitHub - tio/tio: A serial device I/O tool is super convenient (have nice config file and device specyfice configuration) and I’m very interesting is that you will have this some problem. I use it on macOS without problem

3 Likes

Hi Mario,

I installed and tried tio. It worked about as well at minicom; I could see in dmesg where the BP would detach/attach. It also showed in tio with the messages disconnected/connected.

I suspect the issue is deeper down the stack than the serial terminal; most likely at the udev layer. Any serial terminal would sit on top of that, so I would expect they all have the same symptoms.

tio is a nice little serial terminal; reminds me of minicom. I’m just too lazy to learn new keybindings at this point :slight_smile:

3 Likes

Quick questions:

  1. You mentioned doing SPI writes … is that on external SPI flash (e.g., using one of the SPI planks)?
  2. If you disconnect everything connected to the BP except the USB cable (e.g., planks and/or wires), does the issue go away?
  3. If it’s part of your distribution, have you disabled and uninstalled modem manager?
  4. Do you know what (if anything) would be writing to the exposed FAT volume? (see logs below)

The way I interpret the following log snippet, it sounds like something on the Linux host is writing to BP5’s exposed FAT volume. Specifically, this log shows a write to sector 252:

If so, this would confirm my long-held suspicion that the method used to expose the BP5’s internal storage may not be foolproof.

I’m really hoping we can add a third serial port (for at least debug builds) where we can redirect debugger output. This would allow debugger output to be captured, without connecting to the two COM ports that could change the BP5’s internal state. This is a prime use case for the third debug-output-only COM port, because connecting to the COM port is, IIRC, causing the storage device to be surprise-removed from the host.

1 Like
  1. I was reading a SPI flash chip and saving contents to the internal storage. I wasn’t using a plank, just leads and clips.

  2. Today I had the leads disconnected while experimenting with another serial terminal and saw dmesg messages indicating the attach/detach cycles.

  3. I’ll have to check in modem manager

  4. That sequence from the kernel log came out immediately after I did an ls in the serial terminal.

2 Likes

Setting up remote server desktops with Ubuntu and Linux mint for cloud access to hardware for debug sessions now. Can assist remotely if necessary and timing aligns.

OK. So…

  1. You’re connected via the main COM? (or to the binary mode COM port terminal?)
  2. You’re using that COM port to send commands to dump external SPI flash to BP5’s internal flash
  3. Corruption is common with above scenario

BP5 behavior

Although I’ll need to verify, I understand that the current BP5 code exposes the internal flash to the host (e.g., Linux) as a FAT volume … at a level above dhara (e.g., with sector remapping). Both the BP5 firmware and the host at this point can read and write to the FAT volume.

When the COM port is connected to, the BP5 fakes a surprise-removal of its exposed FAT volume. This was intended to prevent the host from writing to the media at the same time as the firmware, and to cause the host device to discard any cached data it had related to the exposed FAT volume. Of course, if the host had pending / cached writes, this surprise remove will cause those cached writes to be lost.

When the COM port is closed, the host’s access to the exposed FAT volume is restored. All cached data on the host should have been invalidated.


Two Owners of a File System?

There really is no safe way for two sources of updates to a single FAT volume. This is one reason the MTP protocol exists … allowing a camera or other firmware to “own” how files are modified / created / deleted, while also a host to do these operations on the same volume safely. In essence, MTP forces the firmware to be the “source of truth” about the storage volume.

For the record… I’m not a fan of MTP, even though I understand its purpose. The BusPirate might be a good use case.

I’ve looked, and found existing open-source MTP implementations written in C++, virally-licensed (GPLv3), or with significant bugs … none would be a good fit, imho.


MTP on BP5?

I don’t know how to start getting an MTP responder working here. There are also difficult problems, such as ensuring that the file’s ID remains fixed for the entire MTP session. It’s not as simple as just incrementing the ID for each file … the MTP protocol expects a file can be opened and re-opened repeatedly with the same ID (if the file contents remained the same). Thus, it’s typical to have a global index … but that either takes RAM or an on-storage database.


The next question: Is there a utility that will allow you to track who has the BP5’s exposed FAT volume open on the host side (and thus what’s writing to the media from the host side)?

2 Likes
  1. I am connected to the main COM (/dev/ttyACM0).

  2. I first noticed when issuing flash read -f flash.bin, but have been able to reproduce it with a mkdir from HiZ

  3. Not sure what you’re asking.

To see if anything has the filesystem open? I could write a script that does lsof every second and greps that for either sda1 or BusPirate.

I know you’re thinking that with the kernel log lines, but I don’t believe anything on the host side is writing to the fs.

2 Likes

That’s fair. Still, as this is not reproducible in all contexts, being sure that nothing is mounting and writing to the BP5’s exposed FAT volume would go a long way to excluding the dual-writer/single-FS as a cause.

In fact, I’d be thrilled if it was excluded: reducing complexity will increase the ability to track down the final cause.

Separately, if you can prevent automount, that would also serve the same purpose of ensuring no host FS writes. However, it’s been too long since I used a real Linux box to be sure how that would be disabled nowadays…

3 Likes

I will do some testing in the morning, both with trying to verify nothing has the filesystem open, and to see if I can stop the automount. I haven’t turned it off before, so I’ll have to research it.

Either way, I’ll report in the morning

Thanks for the feedback! :slight_smile:

1 Like

This morning I tested with a small shell script that checks for open files on the flash system:
chk.txt (166 Bytes)
(changed name to .txt to upload). Stupid simple, but works to list anything that has the flash filesystem open, including just a terminal.

I’ll repeat later with automount turned off (I’ll use systemctl stop udisk2 to do that.

I then did the same tests and was able to corrupt the filesystem. The script never reported anything connected.
Here’s the kernel log during that time:
kernel.txt (7.6 KB)

1 Like

This afternoon I had a chance to test with automount disabled.

tl;dr version - I was not able to corrupt the storage, and I saw no detach/attach cycles of the BP5 in the kernel log while automount was disabled. However, once automount was reenabled, the storage was easily corrupted.

Here’s the sequence:

  1. Disconnect BP
  2. Reset udev - sudo systemctl reset udev
  3. Disable automount - sudo systemctl disable udisk2
  4. Plug in BP5 - this is the kernel log:
[33588.522220] usb 3-3.4.1.4.4: new full-speed USB device number 100 using xhci_hcd
[33588.666665] usb 3-3.4.1.4.4: New USB device found, idVendor=1209, idProduct=7332, bcdDevice= 1.01
[33588.666687] usb 3-3.4.1.4.4: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[33588.666695] usb 3-3.4.1.4.4: Product: Bus Pirate 5
[33588.666700] usb 3-3.4.1.4.4: Manufacturer: Bus Pirate
[33588.666705] usb 3-3.4.1.4.4: SerialNumber: 264235D3012961E4
[33588.695021] cdc_acm 3-3.4.1.4.4:1.0: ttyACM0: USB ACM device
[33588.696634] cdc_acm 3-3.4.1.4.4:1.2: ttyACM1: USB ACM device
[33588.697886] usb-storage 3-3.4.1.4.4:1.4: USB Mass Storage device detected
[33588.698590] scsi host1: usb-storage 3-3.4.1.4.4:1.4
[33589.738434] scsi 1:0:0:0: Direct-Access     BP5      Storage          1.0  PQ: 0 ANSI: 2
[33589.739567] sd 1:0:0:0: Attached scsi generic sg1 type 0
[33589.750043] sd 1:0:0:0: [sdb] 47824 2048-byte logical blocks: (97.9 MB/93.4 MiB)
[33589.752415] sd 1:0:0:0: [sdb] Write Protect is off
[33589.752425] sd 1:0:0:0: [sdb] Mode Sense: 03 00 00 00
[33589.753245] sd 1:0:0:0: [sdb] No Caching mode page found
[33589.753256] sd 1:0:0:0: [sdb] Assuming drive cache: write through
[33589.777785]  sdb: sdb1
[33589.778186] sd 1:0:0:0: [sdb] Attached SCSI removable disk
  1. start minicom - kernel log:
[33674.810070] sd 1:0:0:0: [sdb] 47824 2048-byte logical blocks: (97.9 MB/93.4 MiB)
[33674.811305] sd 1:0:0:0: [sdb] Write Protect is on
[33674.811318] sd 1:0:0:0: [sdb] Mode Sense: 03 00 80 00
[33674.824011]  sdb: sdb1
  1. Verify storage not mounted on host box:
┌──(matty💊s76)-[~/bin]
└─$ mount | grep sdb             
                                                                                                                      
┌──(matty💊s76)-[~/bin]
└─$ ll /media/matty 
total 0
  1. Verify flash storage is empty (in minicom):
verify flash storage is empty:
HiZ> ls
0 dirs, 0 files
  1. Set up SPI mode, creating config file. Verify file exists:
SPI> ls
        82 bpspi.bp
0 dirs, 1 files
  1. Output of kernel log when file was created on BP:
[33734.653593] sd 1:0:0:0: [sdb] 47824 2048-byte logical blocks: (97.9 MB/93.4 MiB)
[33734.669752]  sdb: sdb1
  1. Do the SPI read flash read -f test.bin in minicom. Process completes with no issues.
  2. Verify file created on BP storage:
SPI> ls
        82 bpspi.bp
   8388608 test.bin
0 dirs, 2 files
  1. Also in kernel log after creation of test.bin on BP5 storage:
[33955.074484] sd 1:0:0:0: [sdb] 47824 2048-byte logical blocks: (97.9 MB/93.4 MiB)
[33955.092709]  sdb: sdb1

I did more testing such as creating directories, more SPI flash reads, plug/unplug BP5 USB cable, issue # reset in minicom, kill/restart minicom. At no time did I see the detach/attach messages in the kernel log and the BP storage was not corrupted.

I did not mount the storage on the host system until after all the testing and after closing minicom. To do that, I restarted the udisk2 service and unplugged/plugged the USB cable. Verified that all of the files and directories I created from minicom on the BP storage were accessible and that the volume was writable by creating a directory. Copied the files off of storage to the local host and verified they were what I expected.

I restarted minicom and did an ls - the storage was corrupt.

Welcome to minicom 2.8

OPTIONS: I18n 
Port /dev/ttyACM0, 14:42:16

Press CTRL-A Z for help on special keys
                                                                         
VT100 compatible color mode? (Y/n)> nHiZ>                                   
HiZ> ls                                                                     
4294967295 dha
   <DIR>   ��������.���
   <DIR>   ��������.���
     32846 e�
   <DIR>   �j
   <DIR>   ��������.���
   <DIR>   ��������.���
     32817 ����e�
5 dirs, 3 files

HiZ> 

Please let me know what you would like me to check next. At least I can continue on with my project knowing things will work fine if I disable automount for now. :slight_smile:

Thanks again for the quick responses and feedback!

2 Likes

Your steps and logging of what’s happening is excellent … thank you! Unfortunately, this could not exclude my worst fears of a difficult-to-track bug. :sob:

Linux still mounting the media

Did I understand correctly that your check script only logged open handles under /media/matty? (e.g., why not /dev/sdb1?)

From the kernel log:

[32659.577168] FAT-fs (sdb1): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.

This strongly suggests that the FAT file system was mounted, and the Linux system processed enough of the volume to know that the file system was FAT, and that the dirty bit was set.


Odd capacity ...

Later, the following interesting sequence occurs:

Sectors bps WP timestamp
47824 2048 yes 32688.939668
0 32691.329673
47824 2048 no 32691.339663
0 2048 yes 32692.054562
16 512 ??? 32692.061164
47824 2048 no 32692.251649

The troubling aspect of the above is the reporting of 16x 512 byte sectors. I don’t recall anyplace where the BP5 should report the NAND as being 8k.


There are also a number of commands failing with Invalid Command Operation Code. However, the kernel log didn’t log what the actual CDB (command descriptor block … at least that’s what we called it) was.

I can think of two private binaries that, if created, might verify my suspicion that the Linux host is still mounting (and writing to) the media.

  1. First private binary that ALWAYS makes the exposed FAT volume read-only for the host. With this private binary, you could perform the same steps as above, only the host would never be able to write to the media.
  • If first private binary does NOT prevent the corruption, then the issue is 100% on the device side, which in a way is good.
  • If first private binary prevents corruption, then it’s the difficult situation to track down. :sob: Continue to needing the 2nd…
  1. Second private binary could log all storage CDBs incoming (16 bytes each), timestamp, and the Sense/ASC/ASCQ reported (<16 bytes each). Sort of a micro bus analyzer. Maybe also log writes that were initiated by the firmware to see how they interleave (if at all). Maybe also log first 256 bytes of data when writing to the boot sector (it’s a fixed location). Need to think about the format to be used here.
    a. If this private binary shows no write commands coming from the host, then the issue is 100% on the device side, which in its own way is good.
    b. Otherwise, some deep decoding of that log to understand who’s writing a corrupt root directory sector, and then walking backwards to why it would be written, is needed.

If other folks have better ideas, let me know. Otherwise, I will create a branch for you to sync/build from.

1 Like

That’s right; when I tested the script, any mount or operations I manually did would trigger the lsof | grep combination. An earlier version of the script also did a mount | grep "sdb1", but that ended up being redundant as it didn’t report anything the lsof didn’t.

Yes - I noticed that too from earlier testing; the 8K capacity shows up quite a few times in the earlier logs, usually around the time of corruption.

I’d be more than willing to test anything you can come up with. With regards to schedule, I will be away from the 21st of this month until the 2nd of October and won’t be able to test anything during that time but will be able and willing before I leave or after I return.

For now, I can keep going on my current project using the BP; I’ll just disable/enable automount as needed.

I can’t thank the BP community enough for the help and quick response times! Maybe when I finish this current project, I can spend some time digging into the BP project itself and seeing if I can contribute anything. I’ve done some RP2040 code and have played with the PIOs, it’s a fun chip and one of the reasons I decided to upgrade from my older BP :slight_smile:

4 Likes
Found a way...
Cannot provide pre-built binary...

Cannot get build to work locally right now, so I’m abusing GitHub’s PR process (which builds and creates artifacts).

See PR #90. Unfortunately, at least the Windows builds are failing there.

So, with both my build environment and GitHub actions failing to build, I cannot provide a pre-built binary.


If you can build...

If you can build, and want to pull the branch locally, here are the steps to do so:

  1. cd ~/src/bp5 # or where you cloned the depot
  2. git remote add hg https://github.com/henrygab/BusPirate5-firmware
  3. git checkout -b fs_readonly hg/fs_readonly

Then build as normal. You can also then see the changes very easily.



1 Like