Questions, corrections, clarifications, and discussion are welcome.
This first post will contain background information on the technology in use. The second will describe the current design. The third post will explain some problems with the current design.
Summary of NAND stack
The BP5 Rev10 (and BP5XL, BP6) have a NAND chip soldered onto the board.
This NAND chip does not have any wear-leveling algorithms, nor any remapping of pages that have errors. Therefore, a layer named Dhara is added above the raw NAND to provide some of these features. Presuming Dhara works, the result is that the NAND appears slightly smaller in capacity, and sometimes a single operation will transparently cause multiple writes to the NAND.
Summary of FAT File System
On top of the Dhara layer is the FAT file system. The FAT file system has a few major structures:
BPB, aka Boot Sector. The BPB identifies the format as FAT12, FAT16, FAT32, etc. It defines how large each FAT is, and for FAT32, defines the starting cluster of the root directory.
FAT, aka file allocation table. Each entry in the FAT is a link to the next “cluster” of a file. Some values indicate that there are no more clusters associated with a given file.
Thus, if a file starts at cluster “X”, by following the clusters linked to in the FAT, all the sectors for that file are found. Put yet another way, it’s a singly-linked list of clusters for a file.
Root directory. The root directory (true for all FAT directories) is nothing more than a file, whose contents are interpreted to contain file-system-defined structures. A directory has a cluster chain in the FAT, just like any other file.
For FAT12 and FAT16, the root directory is at a fixed location and size. It’s a special case … and why FAT12/FAT16 have a limit on the number of files/directories in the root.
For FAT32, the root directory can be moved to a different starting cluster, and has a cluster chain like any other directory.
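To make the "singly-linked list of clusters" idea concrete, here is a minimal sketch. It assumes a simplified in-memory FAT (a Python list indexed by cluster number) with a single end-of-chain marker; real FAT12/16/32 use a range of end-of-chain values, and the names `FREE`, `EOC`, and `cluster_chain` are hypothetical, not taken from the BP5 firmware.

```python
# Simplified model (an assumption for illustration): fat[c] holds the next
# cluster after c, FREE marks an unallocated cluster, EOC ends a chain.
FREE = 0
EOC = 0x0FFFFFFF

def cluster_chain(fat, start):
    """Return the list of clusters for a file that starts at `start`."""
    chain = []
    seen = set()
    cluster = start
    while cluster != EOC:
        if cluster == FREE:
            raise ValueError("chain runs into a free cluster (corrupt FAT)")
        if cluster in seen:
            raise ValueError("chain loops back on itself (corrupt FAT)")
        seen.add(cluster)
        chain.append(cluster)
        cluster = fat[cluster]
    return chain

# Example: a file starting at cluster 2 that occupies clusters 2, 5, and 9.
example_fat = [EOC, EOC, 5, FREE, FREE, 9, FREE, FREE, FREE, EOC]
example_chain = cluster_chain(example_fat, 2)
```

A directory is walked the same way, since a directory is just a file whose contents are interpreted as directory entries.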
Thus, to allocate space for a file, or to extend a file past its currently allocated list of clusters, the file system implementation would have to:
Find a free cluster. This may require traversing the file system, since (until exFAT) there was no on-media tracking of free clusters.
Modify the FAT so the currently-final entry in the cluster chain points to that free cluster.
Modify the directory entry for the file to indicate the larger size.
As you can imagine, this is not efficient if reading and writing to the media for each step. Thus, most hosts / complex implementations will cache information from the media in RAM. The FAT is a likely candidate for this cache, as it needs to be modified for every file. The data for each directory are also likely to be cached. Often, a host will also convert the directory entries and FAT into an in-memory bitmap of which sectors on the media are used, to speed up the process of finding free sectors.
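The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the FAT is a plain Python list, the directory entry is a dict, and `find_free_cluster` / `extend_file` are hypothetical helper names. A real implementation would read and write sectors (or a cache of them) rather than in-memory objects.

```python
FREE = 0            # assumed marker for an unallocated cluster
EOC = 0x0FFFFFFF    # assumed single end-of-chain marker

def find_free_cluster(fat):
    """Step 1: scan for a free cluster (clusters 0 and 1 are reserved)."""
    for cluster in range(2, len(fat)):
        if fat[cluster] == FREE:
            return cluster
    raise OSError("no free clusters")

def extend_file(fat, dir_entry, new_size):
    """Append one cluster to a file and record its new size."""
    # Walk to the currently-final cluster of the file.
    last = dir_entry["first_cluster"]
    while fat[last] != EOC:
        last = fat[last]
    new = find_free_cluster(fat)
    # Step 2: link the old final cluster to the new one.
    fat[new] = EOC
    fat[last] = new
    # Step 3: update the directory entry with the larger size.
    dir_entry["size"] = new_size
    return new

fat = [EOC, EOC, EOC, FREE, FREE, FREE]
entry = {"first_cluster": 2, "size": 512}
added = extend_file(fat, entry, 600)
```

Note that steps 2 and 3 touch different sectors on the media, so there is a window in which the FAT and the directory entry disagree. That window matters later in this thread.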
Storage device behavioral contract (as inferred by host)
If media is mounted (even if read-only), only the host will initiate changes to the media contents.
If a sector of the media has content A, then unless the host sends a command that modifies the media, a future read of that sector while the media remains mounted will continue to have content A. (Contents of mounted media won’t change between reads … allowing caching of the values.)
Media that is ejected and then reinserted may have had its values changed.
The BP5 exposes the storage volume of the NAND only when it finds a valid FAT file system on the NAND.
When the BP5 firmware does not discover a valid FAT file system, a fake read-only 8k (16-sector) FAT volume is exposed with a hard-coded single text file.
By default, the BP5 firmware exposes the storage volume as R/W (readable and writable) to both the firmware and the host.
When a connection is made via the terminal, the media is surprise-removed, marked as read-only (for the host only … firmware can still write to the media), and then re-exposed to the host.
The host will receive at least one error indicating that the media had been ejected as a result of this transition from { Host: R/W, firmware: R/W } → { Host: R/O, firmware: R/W }.
When the terminal connection is closed, the host similarly receives a second error indicating that the media has been ejected, as the media transitions from { Host: R/O, firmware: R/W } to { Host: R/W, firmware: R/W }.
Thus, for a large percentage of the time, the host sees the media as writable at the same time that the firmware has the media mounted (and, at times, while the firmware can also write to it).
Unfortunately, having a FAT volume that is writable by one initiator (e.g., the firmware or the host), while simultaneously allowing access by another initiator, is inherently susceptible to data corruption. This post will attempt to explain some of the conditions under which this could occur.
At present, the following are the ONLY states that should be allowed for a FAT volume with two potential initiators:
| Host | Firmware | Notes |
|------|----------|-------|
| None | None | Useful for intermediate states |
| R/O | R/O | Both firmware and host can only read |
| R/W | None | Only host can read or write the media |
| None | R/W | Only firmware can read or write the media |
… TODO: Fill in some of the details showing multi-initiator corruption when either initiator has write access.
Rough draft scenarios
… generally, cached data updated by one initiator might not be immediately visible to the other initiator …
Example:
Host wants to extend file \alpha\foo.bin, which currently uses only cluster F
Firmware wants to extend file \beta\bar.bin, which currently uses only cluster B
Host reads FAT / directory structures, finds that cluster X is free, and will use that cluster to extend \alpha\foo.bin
Firmware reads FAT / directory structures, also finds that cluster X is free, and will use that cluster to extend \beta\bar.bin
Firmware starts to extend file \beta\bar.bin by updating the FAT entry for cluster B to link to cluster X, and ensuring the entry for cluster X indicates end-of-chain. Thus, the file has a chain of B -> X -> EOF. This is written to the FAT.
Host starts to extend file \alpha\foo.bin by updating the FAT entry for cluster F to link to cluster X, and ensuring the entry for cluster X indicates end-of-chain. Thus, the file has a chain of F -> X -> EOF. This is written to the FAT … the FAT chain is now cross-linked (corrupt).
Host and Firmware update the directory entries for their respective files to indicate the larger file size.
Either / both of them write data to the second cluster of the corresponding file. The data is stored at cluster X in both cases, with one overwriting the other. More corruption.
A similar situation is possible when the host caches the media’s information, and then the firmware does ANY update to the file system structures. The host doesn’t see those updates. The host could read invalid data (using the cached entries), write to the wrong sectors, etc.
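The example above can be reproduced with a toy simulation. This is a sketch under loud assumptions: each initiator keeps a snapshot of the FAT taken at mount time and writes back only the entries it changed, and the cluster numbers (B = 2, F = 3, X = 4) and the `Initiator` class are made up for illustration. Neither the host OS nor the BP5 firmware is claimed to work exactly like this.

```python
FREE = 0
EOC = 0x0FFFFFFF

class Initiator:
    """One party (host or firmware) with its own cached copy of the FAT."""
    def __init__(self, name, media_fat):
        self.name = name
        self.media = media_fat          # shared on-media FAT
        self.cache = list(media_fat)    # private snapshot taken at mount time

    def extend(self, last_cluster):
        # Pick the first cluster that looks free in the (possibly stale) cache.
        new = next(c for c in range(2, len(self.cache)) if self.cache[c] == FREE)
        for cluster, value in ((new, EOC), (last_cluster, new)):
            self.cache[cluster] = value
            self.media[cluster] = value  # write back only the changed entries
        return new

def cross_linked(fat):
    """Return clusters that are the 'next' link of more than one entry."""
    targets = [n for n in fat[2:] if n not in (FREE, EOC)]
    return sorted({t for t in targets if targets.count(t) > 1})

media = [EOC, EOC, EOC, EOC, FREE, FREE]   # cluster 2 = B, cluster 3 = F
host = Initiator("host", media)
firmware = Initiator("firmware", media)

x_firmware = firmware.extend(2)   # \beta\bar.bin: B -> X
x_host = host.extend(3)           # \alpha\foo.bin: F -> X (stale cache)
damage = cross_linked(media)
```

After both calls, `media[2]` and `media[3]` point at the same cluster, which is exactly the cross-link described in the steps above.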
Let me summarize the problems I have seen on Linux.
When I connect to the BP interface with tio and simultaneously copy a file onto the BP, I should get an error on the command line, because while I am connected to the BP the file system should be read-only. No error is reported, but the system logs report problems with the file system. At times, I have tried this and Linux goes into a state where it will no longer mount the file system, even if I unplug and replug the BP. I have to reboot Linux to mount the BP file system again.
I created a syslog of the events with more details
I tried to find references to Microsoft’s “Surprise removal sequence” WRT Linux systems, and they seem to refer to hot-pluggable NVMe drives, which are mentioned in the context of servers with virtual machines. So far I have not seen any discussion on how to handle the case when the attached device’s file system decides to switch to read-only.
Searching for similar low-level messages such as “Mode Sense: 03 00 80 00” turns up mentions of file system corruption.
It sounds like Linux is not noticing that the media has changed. It may be that the BP is not fully emulating a media change sequence, and thus some OSes are “missing” the transition when it occurs.
I’m reaching back into my history for this, so errors are likely:
Status / SK / ASC / ASCQ
Historically, the Windows storage stack was based on decoding SCSI status codes `Status / SK / ASC / ASCQ` (Status, Sense Key, Additional Sense Code, Additional Sense Code Qualifier). ATA devices would have their responses translated to the SCSI status, to work with the existing storage stack. USB devices wrapped SCSI commands, and similarly used the same error codes.
Status was typically either 0x00 (success) or 0x02 (check condition). Historically, the SK/ASC/ASCQ was only requested when status was non-zero. Nowadays, it’s often automatically provided at the same time.
Sense Key
SK meanings generally:
0x00 means no error
0x01 means a “soft” error, such as data that was automatically corrected by ECC
0x02 covers the “not ready” errors, such as no media present, device spinning up the media, etc.
0x03 covers medium errors, such as damaged sectors, failed writes, etc.
0x04 covers hardware errors. Rare to see, may be treated similarly to 0x03
0x05 covers illegal requests … the command sent to the device had wrong bits set, an LBA out of range, or it couldn’t be processed due to current state (e.g., eject when device locked)
0x06 is UNIT ATTENTION … when the device goes from not-ready (e.g., no media) to ready (e.g., media ready for access), or when mode parameters change, the device is required to report this so the host can interrogate the device to see what’s changed / flush caches, etc.
So, what would I expect a device to do, when transitioning from no media present, to having media?
| SK | ASC | ASCQ | Notes |
|----|-----|------|-------|
| 02 | 3A | 00 | Medium not present |
| 06 | 29 | xx | Device was reset |
| 06 | 28 | 00 | Not ready to ready (medium changed) |
| 00 | 00 | 00 | No error |
??? Could one or more of the above be required on Linux, but be missing from the BP responses ???
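To illustrate the sequence in the table, here is a small model of a removable-media device that reports UNIT ATTENTION once after media insertion. It is a sketch, not the tinyusb or BP5 implementation: the `RemovableMedia` class and its method names are hypothetical, the check-condition / REQUEST SENSE exchange is collapsed into one call, and the "device was reset" row is left out because its ASCQ is not given above. The `fixed_sense` helper packs the triple into 18-byte fixed-format sense data as I remember the layout (response code 0x70, sense key in byte 2, ASC in byte 12, ASCQ in byte 13).

```python
from collections import deque

NO_SENSE = 0x00
NOT_READY = 0x02
UNIT_ATTENTION = 0x06

class RemovableMedia:
    """Toy device model: returns (SK, ASC, ASCQ) for each readiness poll."""
    def __init__(self):
        self.present = False
        self.pending = deque()   # one-shot conditions not yet reported

    def insert(self):
        self.present = True
        # Not ready to ready (medium may have changed): report exactly once.
        self.pending.append((UNIT_ATTENTION, 0x28, 0x00))

    def remove(self):
        self.present = False
        self.pending.clear()

    def test_unit_ready(self):
        if not self.present:
            return (NOT_READY, 0x3A, 0x00)    # medium not present
        if self.pending:
            return self.pending.popleft()     # host should drop its caches
        return (NO_SENSE, 0x00, 0x00)         # no error

def fixed_sense(sk, asc, ascq):
    """Pack a sense triple into 18-byte fixed-format sense data."""
    data = bytearray(18)
    data[0] = 0x70        # current error, fixed format
    data[2] = sk & 0x0F   # sense key
    data[7] = 10          # additional sense length (bytes 8..17)
    data[12] = asc
    data[13] = ascq
    return bytes(data)

dev = RemovableMedia()
seen = [dev.test_unit_ready()]
dev.insert()
seen.append(dev.test_unit_ready())
seen.append(dev.test_unit_ready())
```

The point of the model is the middle step: if a device jumps straight from "medium not present" to "no error", the host has no signal telling it to discard what it cached.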
Thank you Henry! Super useful. The default implementation of the USB drive comes from the tinyusb library. It only has “No Error” and “Medium not present” as sense responses. I will add the 2 remaining states to our implementation and see if Linux behaves reasonably then.
PS If you find the document that describes this I would be happy to keep it as a reference.
Based on your syslog, it is obvious that after you connected to your BP with tio, Linux has noticed that the BP is read-only.
“Trying to write to read-only block-device sda1”
I’m not sure why the command line does not give you an error.
Linux didn’t know the file system was mounted read-only. When I do “mount -l” to list the current mounts, it’s clearly RW.
/dev/sda1 on /media/grymoire/BUS_PIRATE5 type vfat (rw,nosuid,nodev,relatime,uid=1000,gid=1000,fmask=0022
,dmask=0022,codepage=437,iocharset=iso8859-1,shortname=mixed,showexec,utf8,flush,errors=remount-ro,
uhelper=udisks2) [BUS_PIRATE5]
Note that the mount option says to remount the file system read only in the case of an error. But from what I understand, this only applies if an error occurs during the mount process.
I suspect that at the file system level it’s trying to write a file, yet at the driver level it fails because the device no longer accepts write commands, and the system reports errors like:
critical target error, dev sda, sector 544 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
That is - I suspect the OS wants to write a file and the hardware is unable to do so.
This discussion brings back a fuzzy memory from like 2010 or so, when I was using a USB mounted, simulated flash drive to download firmware updates to a set top box powered by an LPC1768 (Arm Cortex M3). I was using Windows and we expected 99% of consumers would be as well. But just for fun, one day I tried it using a Linux system and it failed.
I dug into it briefly, and there was a subtle mismatch between how Windows wrote to the USB mounted drive and how Linux did so. I think I even figured out how to fix it, but the client told me to pass on that task (we were backlogged with more serious problems).
As I recall, it had something to do with the sequence of writing the data? It’s real fuzzy 10+ years on. If this sounds potentially useful to investigate the issues here, let me know and I’ll dig up my notes and post a more detailed explanation.
I was just looking at a BusPirate where I did a “format” on it, but when I re-connected it, the file system did not mount. I had to reboot my Linux system to allow mounting.
The syslog said that the file system was dirty (not unmounted properly) and it wouldn’t re-mount it.
I think I read somewhere that the boot block (where the file system status is stored) should be writable even if the file system is read-only. Is it possible Linux is trying to mark the drive and fails?
In the earlier posts, I posited that only a few states were fully safe.
Technically, a couple more exist (single-initiator, read-only).
Here’s a full table of the states, and quick notes about why they are safe or not safe.
PLEASE ASK FOR DETAILS IF YOU BELIEVE A STATE MARKED UNSAFE IS ACTUALLY SAFE.
I will then provide more specific timelines of which initiator does what actions. This, however, takes significant work to write (and triple-check), so I would prefer to do it only if necessary.
| Host | Firmware | Safe? | Problems |
|------|----------|-------|----------|
| None | None | Safe | Useful for intermediate states |
| R/O | None | Safe | |
| None | R/O | Safe | |
| R/O | R/O | Safe | Both firmware and host can read, cache the data, but neither one changes the data |
| R/W | None | Safe | Single-initiator |
| None | R/W | Safe | Single-initiator |
| R/W | R/W | Not safe | e.g., host reads/caches the FAT; firmware allocates space; host later allocates that same space for a different file; writes to one file now forever corrupt the other file |
| R/W | R/O | Not safe | e.g., host caches some writes for performance, and/or has updated the FAT without updating the directory entry (or vice versa … the changes are not transactional). Result is that firmware reads and/or writes corrupt data. |
| R/O | R/W | Not safe | e.g., host caches most data; firmware later makes changes; host does not see changes made by firmware; host reads and/or writes corrupted data |
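The whole table reduces to one rule: a state is unsafe only when one initiator can write while the other has any access at all. A minimal sketch of that rule follows, using the table's own strings for access levels; the `is_safe` name is hypothetical.

```python
def is_safe(host, firmware):
    """Return True if the (host, firmware) access pair is safe.

    Each argument is one of "None", "R/O", or "R/W".
    """
    access = (host, firmware)
    if "R/W" not in access:
        return True          # nobody writes, so caches can never go stale
    return "None" in access  # a writer is only safe when it is alone

levels = ("None", "R/O", "R/W")
unsafe = [(h, f) for h in levels for f in levels if not is_safe(h, f)]
```

If the firmware ever enforces allowed states, a check like this at every mode transition would be one place to do it.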
I’m not clear exactly what happens. I suspect this is part of the sequence:
1. Bus Pirate plugged in
2. Linux mounts /media/$USER/BUS_PIRATE5. Marks file system as dirty by writing to boot block
3. User connects to BP using serial link
4. Bus Pirate changes state. (I’m not sure what it does exactly)
5. User tries to copy file onto /media/$USER/BUS_PIRATE5. This does not complete.
6. Bus Pirate is unplugged.
7. Linux unmounts file system.
8. Bus Pirate is plugged in or reboots via serial interface.
9. Linux looks at boot block - perhaps sees the dirty bit. Refuses to mount file system.

However, if it’s simply the dirty bit, then step 5 isn’t necessary to cause Linux to refuse to mount the file system. So ???
Perhaps a simpler way to cause the problem is to connect to the BP and type # or $. I’m looking into this. I also see that the udisks daemon is involved. I am monitoring its status as I try to dig into the problem. I’ve never worked with udisksd before so I am learning new things here.
I’m getting deep in the weeds here. I’ve been trying to help diagnose the problem. I’ve had the case that if I connect to the BP, and type “#”, I have to reboot Linux to reconnect.
I don’t know if anyone else has this problem. I have a simple work-around I’m going to submit - a Linux script to connect to the BP. It simply remounts the file system read-only before connecting. I’ll share it in a separate posting.
This thread isn’t about the media change detection. That’s a separate problem (see PR #106 by @phdussud). This thread is about whether the current decision to allow two initiators to the same FAT-formatted media (with one or two of them allowed to write to that media) should be revisited.
It’s my strong opinion that, even if the media change detection is fixed, the current choice will cause corrupt data that is extremely difficult to debug / track.
To refresh, the current choice was:
Host: R/W, Firmware: R/W → When the terminal is not open
Host: R/O, Firmware: R/W → When the terminal is open
Both the above choices are unsafe. The first is less safe, the second still risks the host at least reading corrupt data.
The current choice is based on the fact that no firmware Write-IO happens while the terminal is not connected. At least that’s what I believe happens. Am I wrong?
You may be right. At the same time, there is nothing which prevents the firmware from writing. If such a restriction is being relied upon to keep data coherent, then it should be enforced. Otherwise, folks will break that restriction, have no idea they are doing so, and … DangerousPrototypes will have a new connotation. (If this is the chosen path, then the media should just flip-flop between the host or the firmware having full, exclusive access at any one time.)
For example, while I have not tested it, I believe it’s possible to configure the button to run a script, and that no terminal connection is required. I can easily foresee this being a common setup (just press the button … and BAM! … a log, trace, or dump gets written to the NAND).
In fact, I intend to set up my BP6 in this manner once things stabilize, arming a logic analyzer (and trigger) with a button press, and having the results saved to file.
For the medium-term, I advocate for BP5 to practice “safe storage”. This means avoiding the above unsafe multi-initiator states, and (at least at first) requiring some manual interaction.
As a strawman, a first step could be:
Define four allowed modes for the storage volume:
a. Exposed only to the firmware (R/W)
b. Exposed only to the host (R/W)
c. Exposed to both firmware and host (R/O for both)
d. Exposed to neither firmware nor host (No Access or Unformatted)
Default to one of those modes at boot, likely: firmware (R/W)
Support the PREVENT_ALLOW_MEDIUM_REMOVAL command, allowing the media to be locked by the host (preventing firmware from switching modes when it may be written by host).
Add a terminal command to switch the current mode of the storage volume.
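As a rough illustration of the strawman, the four modes and the host lock could be modeled like this. Everything here is hypothetical (the `Mode` and `StorageVolume` names, the return-value convention, and the policy that a locked volume refuses all switches); it only shows the shape of the idea, not a proposed firmware API.

```python
from enum import Enum

class Mode(Enum):
    FIRMWARE_RW = "a"   # exposed only to the firmware (R/W)
    HOST_RW = "b"       # exposed only to the host (R/W)
    BOTH_RO = "c"       # exposed to both, read-only for both
    NO_ACCESS = "d"     # exposed to neither (no access or unformatted)

class StorageVolume:
    def __init__(self):
        self.mode = Mode.FIRMWARE_RW   # assumed boot default
        self.host_locked = False

    def prevent_allow_medium_removal(self, prevent):
        """Model of the host's PREVENT ALLOW MEDIUM REMOVAL request."""
        self.host_locked = bool(prevent)

    def switch_mode(self, new_mode):
        """Terminal command handler: refuse while the host holds the lock."""
        if self.host_locked:
            return False
        # A real device would run the media removal / insertion sequence
        # here so the host discards its cached view of the volume.
        self.mode = new_mode
        return True

vol = StorageVolume()
ok_first = vol.switch_mode(Mode.HOST_RW)
vol.prevent_allow_medium_removal(True)
ok_locked = vol.switch_mode(Mode.FIRMWARE_RW)
vol.prevent_allow_medium_removal(False)
ok_unlocked = vol.switch_mode(Mode.FIRMWARE_RW)
```

Because every mode in the enum is one of the safe states, no sequence of `switch_mode` calls can land in a multi-initiator writable state.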
The longer term solution is to implement MTP, and remove the mode where the host has R/W access to the storage volume. Then, any writes by the firmware would need to go through the media removal sequence to have the host invalidate its cached view of the FAT FS (and some MTP state). MTP is non-trivial, but was designed to resolve this exact type of issue.