@henrygab I would like to come back to the discussion you started on this thread but did not finish, explaining the scenarios in which this situation:
firmware RW and host RO, can lead to corrupted state on the host,
and this situation:
firmware RO and host RW, can lead to corrupted state on the firmware.
I am super interested in the answer.
Here is what I think I know. Please correct me if I am wrong:
Dhara is the common access point to the storage; the NAND itself is abstracted underneath the Dhara API.
Dhara itself has a cache with delayed write, so durability may be compromised if the Dhara cache isn't flushed before a power-down event. Our hardware does not have the capability of mitigating this threat.
Dhara is now protected by a mutex that ensures concurrent reads and writes are serialized, to protect against torn reads and writes.
I am asserting that Dhara has the following properties:
- Repeatable reads of the same sector if there are no intervening writes.
- A sector read after a sector write will always return what was written.
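Not the actual firmware code, but a minimal sketch of the serialization just described, assuming the dhara_map_read()/dhara_map_write() entry points from the upstream dhara library and a pthread-style mutex for illustration (the firmware would use its RTOS's own primitive):

```c
#include <pthread.h>
#include "dhara/map.h"   /* dhara_map_read / dhara_map_write */

/* One lock guards every access to the map, so a read can never
 * observe a sector that is only partially written (no torn I/O). */
static pthread_mutex_t dhara_lock = PTHREAD_MUTEX_INITIALIZER;
static struct dhara_map map;     /* initialized elsewhere */

int storage_read_sector(dhara_sector_t s, uint8_t *buf)
{
    dhara_error_t err;
    pthread_mutex_lock(&dhara_lock);
    int rc = dhara_map_read(&map, s, buf, &err);
    pthread_mutex_unlock(&dhara_lock);
    return rc;
}

int storage_write_sector(dhara_sector_t s, const uint8_t *buf)
{
    dhara_error_t err;
    pthread_mutex_lock(&dhara_lock);
    int rc = dhara_map_write(&map, s, buf, &err);
    pthread_mutex_unlock(&dhara_lock);
    return rc;
}
```

With a single lock around every map operation, the two asserted properties follow directly: reads are repeatable absent writes, and a read after a write sees that write.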
On the firmware side, the next layer up is FatFs.
FatFs as configured has only one cache of 1 sector. This cache, when dirty, is written back to Dhara in the following situations:
- Before it needs to cache a different sector.
- At the end of file close() when the file was opened in write-access mode.
- At the end of a directory creation or deletion.
- At the end of SetVolumeLabel.
- At the end of a file rename (not supported in the firmware).
It should be noted that the Dhara cache is flushed when the FatFs cache is flushed, except in the first situation.
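To illustrate why: in a typical FatFs-to-Dhara glue layer (a hypothetical sketch, not the actual firmware source, 512-byte sectors assumed), FatFs only issues disk_ioctl(CTRL_SYNC) from f_sync()/f_close() and similar operations. A plain cache eviction only calls disk_write(), so the data lands in Dhara's cache without a dhara_map_sync():

```c
#include "ff.h"       /* FatFs */
#include "diskio.h"   /* FatFs media access interface */
#include "dhara/map.h"

extern struct dhara_map map;   /* hypothetical global map instance */

/* Called by FatFs when it evicts its one dirty sector (situation 1):
 * the data reaches Dhara's cache but is not yet durable on NAND. */
DRESULT disk_write(BYTE pdrv, const BYTE *buff, LBA_t sector, UINT count)
{
    (void)pdrv;
    dhara_error_t err;
    for (UINT i = 0; i < count; i++) {
        if (dhara_map_write(&map, sector + i, buff + i * 512, &err) < 0)
            return RES_ERROR;
    }
    return RES_OK;
}

/* Called by FatFs from f_sync()/f_close() etc.: only here is the
 * Dhara cache flushed, making the earlier writes durable. */
DRESULT disk_ioctl(BYTE pdrv, BYTE cmd, void *buff)
{
    (void)pdrv; (void)buff;
    dhara_error_t err;
    switch (cmd) {
    case CTRL_SYNC:
        return (dhara_map_sync(&map, &err) < 0) ? RES_ERROR : RES_OK;
    default:
        return RES_PARERR;
    }
}
```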
In all but the first situation, the firmware attempts to "reset" the view of the host by ejecting and re-inserting the media. This is done with a sequence of 3 SCSI media sense codes.
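For reference, a hedged sketch of how such an eject/re-insert sequence might be produced with TinyUSB's MSC callbacks (the firmware's actual state machine and exact sense sequence may differ):

```c
#include "tusb.h"   /* TinyUSB */

/* Hypothetical media-change state machine: report NOT PRESENT,
 * then MEDIUM MAY HAVE CHANGED, then ready again. */
typedef enum { MEDIA_PRESENT, MEDIA_EJECTED, MEDIA_CHANGED } media_state_t;
static media_state_t media_state = MEDIA_PRESENT;

bool tud_msc_test_unit_ready_cb(uint8_t lun)
{
    switch (media_state) {
    case MEDIA_EJECTED:
        /* 02h/3Ah/00h: NOT READY, MEDIUM NOT PRESENT */
        tud_msc_set_sense(lun, SCSI_SENSE_NOT_READY, 0x3A, 0x00);
        media_state = MEDIA_CHANGED;   /* advance on the next poll */
        return false;
    case MEDIA_CHANGED:
        /* 06h/28h/00h: UNIT ATTENTION, MEDIUM MAY HAVE CHANGED */
        tud_msc_set_sense(lun, SCSI_SENSE_UNIT_ATTENTION, 0x28, 0x00);
        media_state = MEDIA_PRESENT;
        return false;
    default:
        return true;                   /* medium present and ready */
    }
}
```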
I have tried to think of a scenario that can lead to corruption, but I can't.
OK. Let me see if I can try. This would be easier with a whiteboard and live interaction.
If the notes say \foo → 20...23, 40, this means that the file foo in the root directory has a first allocated cluster of 20, and the full cluster chain is 20→21→22→23→40→end-of-chain.
Example with firmware RW, host RO
The host presumes that sector X, once read, will continue to return the same data for future reads, unless/until the host itself causes the data to change. Read cache is enabled even for hot-plug devices.
| time | action | fw view | host view | Notes |
|------|--------|---------|-----------|-------|
| 0 | mount | consistent | consistent | Two files: \foo, with one allocated cluster @ cluster 20; \log, with various allocated clusters |
| 100 | host read | | | Host reads and caches FAT and root directory contents |
| 200 | fw deletes file \foo | consistent | corrupt | Host is not notified of the deleted file |
| 300 | fw extends file \log | consistent | corrupt | FW allocates clusters 20...25 |
| 350 | fw creates file \foo | consistent | corrupt | FW allocates cluster 90 |
| 400 | host reads file \foo | consistent | corrupt | Host uses cached copy of FAT and root directory, discovers the file uses cluster 20, and reads cluster 20 (data from the middle of the log file) |
| 500 | fw reads file \foo | consistent | corrupt | FW determines the file is at cluster 90 and reads the correct data |
The read cache on the host (which is perfectly legal and valid to have) results in a mismatch between what the host sees and what the firmware sees.
Example with firmware RO, host RW
| time | action | fw view | host view | Notes |
|------|--------|---------|-----------|-------|
| 0 | mount | consistent | consistent | \foo → 20...25 and \log → 40...44 |
| 100 | fw opens \foo | consistent | consistent | |
| 200 | fw reads up to cluster 24 | consistent | consistent | |
| 300 | host truncates \foo | corrupt | consistent | \foo → 20...23 |
| 400 | host adds sectors to \log | corrupt | consistent | \log → 40...44, 24...27 |
| 450 | optionally, host expands \foo | corrupt | consistent | \foo → 20...23, 28...31 |
| 500 | fw reads more data from \foo | corrupt | consistent | FW now gets data from the middle of \log, not from \foo |
The firmware has an open handle to a file. There is no way to indicate this to the host. The host properly does what it wants with the media. We do NOT want the firmware to have to deal with file handles that suddenly become invalid after they are opened (complexity like that would geometrically explode the test cases). When the firmware looks at the FAT to discover the next cluster, it looks at the FAT entry for cluster 24 (where it has read to). There is a valid entry there, pointing to cluster 25, so it takes that as the next cluster of the file ... except that the host has changed which file is using cluster 24, and thus the next cluster in the FAT also belongs to some other file.
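To see why the firmware cannot detect this, here is a minimal sketch of a FAT32 next-cluster lookup (hypothetical read_sector() helper and fat_start_lba variable; 512-byte sectors and a little-endian target assumed):

```c
#include <stdint.h>
#include <stdbool.h>
#include <string.h>

#define FAT32_EOC 0x0FFFFFF8u   /* >= this value means end-of-chain */

/* Hypothetical helper provided by the storage layer. */
extern bool read_sector(uint32_t lba, uint8_t *buf512);

extern uint32_t fat_start_lba;  /* first sector of the FAT region */

/* Follow the FAT to the next cluster of a file. Note there is no way
 * to check *which file* a FAT entry belongs to: if the host reassigned
 * cluster `cur` to another file, this happily walks the wrong chain. */
static uint32_t next_cluster(uint32_t cur)
{
    uint8_t sec[512];
    uint32_t byte_off = cur * 4;                 /* 4 bytes per FAT32 entry */
    if (!read_sector(fat_start_lba + byte_off / 512, sec))
        return FAT32_EOC;                        /* treat I/O error as EOC */

    uint32_t entry;
    memcpy(&entry, &sec[byte_off % 512], 4);     /* entries are little-endian */
    return entry & 0x0FFFFFFFu;                  /* top 4 bits are reserved */
}
```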
This same problem would occur with full SATA storage that manages its own bad-block remapping ... in other words, dhara does not appear to be relevant to this discussion, which is viewing the logically exposed space (above dhara).
@henrygab Thank you for taking the time to think of these scenarios.
About the firmware RW, host RO:
- 200: As soon as the file is deleted, the host is notified of the removal of the medium, followed by a medium change and insertion of the new medium. This makes the host aware of the deletion.
- 300: if \log is closed, the same thing as at 200 happens.
- 350: same as 300.
- 400: since the host view has been reset at 350, it should be in sync with the FW view.
This is a great discussion, thank you everyone for contributing.
It has gone beyond my ability to keep up at the moment, but if there is anything I can do please let me know.
@phdussud's two pull requests related to storage issues are in the latest auto-build firmware. Can anyone confirm whether this has addressed any of the issues under Linux?
About the firmware RO, host RW:
Right, good point. I understand that concurrent access to files will lead to trouble.
Either too little will be read by the firmware when a file is in the process of growing (because the host is writing to it), or garbage will be read if the host truncates a file while the firmware is still reading it.
Both of these situations are undesirable.
Note that at the moment, I don't think the firmware reads anything while the host can write, but as you appropriately noted earlier, this isn't enforced and could change with the next contributor check-in.
Example #2 with firmware RW, host RO

| time | action | fw view | host view | Notes |
|------|--------|---------|-----------|-------|
| 100 | host read | | | Host reads and caches FAT and root directory contents |
| 200 | host open \q.txt | | | logically only, no data read yet |
| 300 | fw opens \q.txt | | | logically only, no data read yet |
| 400 | host reads @ 20, 21 | | | |
| 500 | fw writes @ 21...28 | | | (will soon continue to write...) |
| 700 | host reads @ 22 | | corrupt | sector 21 = old, 22 = new (never saw new sector 21 data) |
| 800 | ... | | | ...time passes... |
| 900 | fw closes \q.txt | | | ... too late ... |
Example #3 with firmware RW, host RO
| time | action | fw view | host view | Notes |
|------|--------|---------|-----------|-------|
| 0 | mount | consistent | consistent | \q.txt → 20...29 |
| 100 | host read | | | Host reads and caches FAT and root directory contents |
| 200 | host open \q.txt | | | logically only, no data read yet |
| 300 | fw opens \q.txt | | | logically only, no data read yet |
| 400 | host reads @ 20 | | | |
| 500 | host reads @ 21 | | | |
| 600 | fw writes @ 21...28 | | | (will soon continue to write...) |
| 700 | host reads @ 21 | | corrupt | undefined behavior |
Because the device reported different data for the read of cluster 21 at step 500 vs. step 700, the device is violating the basic contract of read-only media: it must return identical contents for reads of the same sector.
The immutability of data from other sources is a basic premise for locally mounted storage. Break that behavioral contract, and you get undefined behavior and hard-to-track bugs on the host.
How do I know about the host's UB?
There were lots of really odd bugchecks that only occurred on machines with certain models of hard drives. In other words, LOTS of undefined behavior. I helped build a filter driver to detect when drives reported success for a write and then later reported different data for that sector. We were then able to bring this data to the drive manufacturers, helping them fix some edge-case bugs in their firmware. Consumer drives became measurably more reliable as a result.
Even where dhara has no race conditions, the underlying cause is still having multiple initiators, where one or more can modify the underlying volume.
I agree Dhara isn't interesting for this analysis. The root cause is concurrent access to the same file by 2 initiators with at least one writing to the file.
Concurrent access to a file is not the only scenario (e.g., writing additional data that extends the file size is not atomic: updating the FAT, writing the new data, updating the directory entry to reflect the new size, etc.). I give simple examples, but this does not mean those are the only examples.
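To make the non-atomicity concrete, here is a hedged sketch of the separate on-disk updates an append involves (all helper names hypothetical; real FatFs code differs in detail):

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical low-level helpers. */
extern uint32_t fat_alloc_cluster(void);                     /* pick a free cluster */
extern void fat_set_entry(uint32_t cluster, uint32_t value); /* write one FAT entry */
extern void write_cluster(uint32_t cluster, const void *data, size_t len);
extern void dir_set_size(const char *path, uint32_t new_size);

/* Appending one cluster of data to a file is at least three separate
 * sector writes; there is no transaction tying them together. */
void fat_append_cluster(const char *path, uint32_t last_cluster,
                        const void *data, size_t len, uint32_t new_size)
{
    uint32_t c = fat_alloc_cluster();

    fat_set_entry(last_cluster, c);      /* 1. link new cluster into the chain */
    fat_set_entry(c, 0x0FFFFFFF);        /*    ...and mark it end-of-chain */
    write_cluster(c, data, len);         /* 2. write the file data itself */
    dir_set_size(path, new_size);        /* 3. update the directory entry size */
    /* A power cut or a concurrent reader between any two steps above
     * observes an inconsistent volume. */
}
```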
FYI, we're not the first to try...
I used to "own" part of the storage stack at Microsoft, including cdrom.sys and disk.sys. I've seen tons of attempts to "solve" this over the years ... including devices "sniffing" all writes and attempting to "interpret" what a given write to a FAT volume was intended to do. None of those attempted solutions survived.
MTP was created for precisely this reason. It allows the firmware to "own" the storage device, while allowing the host to both read and write files on the device. MTP is still complex ... yet it took off like wildfire in consumer electronics. That by itself should be telling.
All my examples have stayed on the simplistic side. I did not delve into the non-atomic updates that FAT requires.
There is one solution that (theoretically) could work to enable FW R/W, host R/O. However, there are no open-source implementations, and it's rare to see such volumes in the "wild" ... the use of Transactional FAT (or Transactional exFAT). There's enough information in the public patent documents to figure out how it works, but ... it'd still be a major undertaking to test all the edge cases ... and I'm not taking on something that large currently.
One special case exists ...
There is one implementation out there, and it's a special case.
Adafruit's CircuitPython exposes the storage volume (which is also R/W for the Python scripts running) to the host as R/W. However, as soon as a write occurs, they stop execution, and after a couple of seconds of not receiving additional writes, they reboot the device.
This mechanism was considered for BP5 but rejected, as rebooting whenever the host writes was seen as problematic (and rightly so for the BP5's purposes).
Where there are two writing initiators, there are thousands of interactions which are unsafe. I think we agree on this?
I'm going to ask the dumb-guy question. Please forgive me, Henry. But I'm struggling to comprehend the issue.
Yes, if both the host and BP were RW - that would be perfect.
If the host is RO and the BP is RW, and I read a file on the host while writing the same file on the BP, I would expect that reading the file would be unstable (for lack of a better word). I've used distributed file systems, and this is acceptable. I can live with this caveat until a perfect solution is implemented (using MTP?)
Normally the same person is doing both operations, so we are aware the file is unstable. Yes, if I am doing a `tail -f logfile` this will be screwy, and that's a problem. But what files will the BP create?
- SPI flash dump - I'd wait until the end before I'd assume it's a good dump.
- Config file - no big problem. I can only type in one terminal at a time.
- A sniff/trace/log file - this seems to be the only possible issue.
What other real-world scenarios do we have to deal with?
@ian @grymoire
I think you are asking the right question. What are the scenarios that users want to enable? What would be the best user model for accomplishing them?
I think it is time to think from the top down (user → system), because while we have a good understanding of the low-level situation and possible solutions, we don't have a proposal for a decent user interface.
@henrygab
I agree that it is a minefield. I don't know how to count the number of mines.
Incidentally, I found a way to make one initiator work in a fairly similar way to how the current FW RW, host RO works.
Let's call it firmware RO/RW, host RO/Off.
Instead of letting the FW file-write operations happen while the medium is still host-mounted and then ejecting/re-inserting the medium, we can eject the medium at the beginning of the file function that performs writes and insert the medium back at the end. It is truly a one-initiator scheme with a transparent transition from RO→RW on the firmware side and RO→ejected on the host side. For short file operations, it would be the same experience as now.
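A sketch of what that could look like, using hypothetical media_eject()/media_insert() helpers that drive the SCSI media-change sequence (error handling simplified):

```c
#include "ff.h"   /* FatFs */

/* Hypothetical helpers that drive the SCSI media-change sequence. */
extern void media_eject(void);    /* host sees MEDIUM NOT PRESENT  */
extern void media_insert(void);   /* host re-enumerates the medium */

/* Perform a firmware write with the host excluded for its duration:
 * the volume never has two initiators with write access at once. */
FRESULT fw_write_file(const char *path, const void *data, UINT len)
{
    FIL f;
    UINT written;

    media_eject();                             /* host: RO -> ejected   */

    FRESULT res = f_open(&f, path, FA_WRITE | FA_OPEN_APPEND);
    if (res == FR_OK) {
        res = f_write(&f, data, len, &written);
        FRESULT res2 = f_close(&f);            /* flushes FatFs + Dhara */
        if (res == FR_OK)
            res = res2;
    }

    media_insert();                            /* host: ejected -> RO   */
    return res;
}
```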
I don't have a clue how to make the other scenario (host in control) palatable to the user. I don't think I have a good idea of the use cases that need to be supported.
Yes, ejection of the host media when the BP5 firmware...
1. opens a file handle with write permission; -OR-
2. performs its first write operation.

The ejection of the media must persist until...
3. the last handle with write permission is closed; -OR-
4. the last file handle is closed.

Options (1, 3) are only usable when the permissions associated with the open file are known at open time / tracked for the lifetime of the handle.
The options can be intermixed. A solution with options (2 & 3) would be the least negative. At a minimum, a solution with options (2 & 4) seems possible with the current firmware.
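A minimal sketch of the bookkeeping for the (2 & 4) combination, reusing the hypothetical media_eject()/media_insert() helpers (names and hooks are illustrative only):

```c
#include <stdbool.h>

extern void media_eject(void);
extern void media_insert(void);

int         open_handles  = 0;     /* all open file handles        */
static bool media_ejected = false; /* host currently sees no media */

void on_file_open(void)  { open_handles++; }

/* Option 2: eject at the first write, not at open. */
void on_file_write(void)
{
    if (!media_ejected) {
        media_eject();
        media_ejected = true;
    }
}

/* Option 4: re-insert only when the last handle is closed. */
void on_file_close(void)
{
    if (--open_handles == 0 && media_ejected) {
        media_insert();
        media_ejected = false;
    }
}
```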
This state is where the host has full control of the volume, but the firmware cannot access the volume. This would be an explicit mode switch (e.g., a command via the terminal). Ian has suggested this could also be triggered by a button during power-up (in case the terminal is not easily accessible, I think).
The command to transition to host-only access should:
- FAIL if the firmware holds any open handles, and print a corresponding message as to why it was rejected.
A manual command to transition back to firmware-only mode could also exist (or simply require a reboot of the BP5 device).
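For illustration only, such a command could look roughly like this, reusing the open-handle counter from the previous sketch (all names hypothetical):

```c
#include <stdio.h>
#include <stdbool.h>

extern int  open_handles;           /* from the handle-tracking sketch */
extern void media_insert_rw(void);  /* hypothetical: expose volume R/W to host */

static bool host_only_mode = false;

/* Terminal command: switch the volume to host-only access. */
bool cmd_host_only(void)
{
    if (open_handles > 0) {
        printf("Cannot enter host-only mode: %d file handle(s) still open\n",
               open_handles);
        return false;               /* FAIL, with the reason printed */
    }
    host_only_mode = true;          /* firmware stops touching the volume */
    media_insert_rw();
    return true;
}
```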
Can you confirm agreement with the command-based (or power-on w/button pressed) transition to Host-only mode?
I would be glad to do so (I have started on ff.c, probably 90% done), but I won't be near a computer for the next 4 weeks or so.
There are several places in ff.c: mkdir, setlabel, f_open, f_close, f_unlink, f_rename.
If you wish, I can commit what I have done so far to my personal GitHub repo for reference.
If I was using the BP and wanted to write a file onto it from the host, currently I would type '#'. The BP resets, my serial connection disconnects, and the file system remounts RW on the host. I copy the file over and reconnect. I'm personally fine with this behavior.
Perhaps the only downside is when I have a specific setup on the BP that would be difficult to recreate. If that were the case, I'd create a file on the host that contains the commands I need to recreate the current state. Normally it's just a few commands for me. But I could see that it might be complex in some cases. Others should speak up.
Crazy idea time - perhaps the BP could have the shell equivalent of a history mechanism that logs commands to a file. Or perhaps there could be a command to "save state" when exiting, and "restore state" when restarting.
Finally, there could be a way - on the host - to capture the characters on the BP terminal and store them in a file, then save them, copy them over, and store them in a start-up file. I could write a shell script that did that - I think.
The storage is mounted read/write. I could manipulate the filesystem as expected (read, write, append, delete files, make and delete directories).
- Open the serial I/O of the BP in minicom.
- Storage is auto-remounted read-only on the host.
- Filesystem changes previously made from the host are reflected in the serial terminal.
- One caveat - if I make the serial connection too soon after manipulating the storage from the host, those changes may be lost (the cached writes didn't finish before the BP yanked the storage from the host). This isn't a showstopper for me, just need a little patience:
```
[280971.108455] sd 1:0:0:0: [sdb] tag#0 access beyond end of device
[280971.108468] I/O error, dev sdb, sector 260 op 0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 0
[280971.108474] Buffer I/O error on dev sdb1, logical block 2, lost async page write
[280971.108497] sd 1:0:0:0: [sdb] tag#0 access beyond end of device
[280971.108500] I/O error, dev sdb, sector 480 op 0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 0
[280971.108503] Buffer I/O error on dev sdb1, logical block 57, lost async page write
[280971.108528] sd 1:0:0:0: [sdb] tag#0 access beyond end of device
[280971.108530] I/O error, dev sdb, sector 16928 op 0x1:(WRITE) flags 0x100000 phys_seg 4 prio class 0
[280971.108533] Buffer I/O error on dev sdb1, logical block 4169, lost async page write
[280971.108536] Buffer I/O error on dev sdb1, logical block 4170, lost async page write
[280971.108538] Buffer I/O error on dev sdb1, logical block 4171, lost async page write
[280971.108541] Buffer I/O error on dev sdb1, logical block 4172, lost async page write
```
- Close the serial I/O terminal in minicom.
- Storage is still read-only on the host. Even if I manually unmount/remount from the cmdline, it is still read-only. Seems like the only way to get it back to read/write on the host is to reset the BP. Again, not a showstopper.

All in all, I would be happy with this functionality. It is predictable and logical. I don't write to the storage from the host very often; copying an updated binary to write to flash is the only thing right now.
Personally, I don't run into this very often. If I do have something like that, I've probably already documented it and can just copy/paste the commands back in. Usually after a time or two of that, I begin to remember it anyway.
Thanks again for all the work on this; I know there are no simple solutions for this. @henrygab @grymoire