I think I got the base OTP read and write functionality working. This includes all the different ways to read and write to/from OTP:
BYTE3X: Single byte 3x redundant in one row
RBIT3: 24-bits 3x redundant across three rows
RBIT8: 24-bits 8x redundant across eight rows
RAW: 24-bits in one row, no redundancy
ECC: 16-bits in one row, with ECC correction of 1-bit and detection of 2-bit errors
Those first three types (BYTE3X, RBIT3, RBIT8) are interesting, because they each vote on the result on a bit-by-bit basis. BYTE3X and RBIT3 require two of the three copies to have the bit set, while RBIT8 requires three of the eight copies to have the bit set. Lots of edge cases, especially when writing an update...
Only remedy: Keep the code dirt-simple, giving up performance for readability.
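To make the voting rules above concrete, here is a minimal sketch of the decode side only. The function names are made up for illustration and are not the library's API. It assumes BYTE3X keeps its three copies in bits 0-7, 8-15 and 16-23 of a single row, and it deliberately favors readability over speed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bit-by-bit vote across `count` raw 24-bit rows.
 * A result bit is set when at least `threshold` copies have it set. */
uint32_t otp_vote_rows(const uint32_t *rows, size_t count, unsigned threshold)
{
    uint32_t result = 0;
    for (unsigned bit = 0; bit < 24; ++bit) {
        unsigned votes = 0;
        for (size_t i = 0; i < count; ++i) {
            votes += (rows[i] >> bit) & 1u;
        }
        if (votes >= threshold) {
            result |= (uint32_t)1 << bit;
        }
    }
    return result;
}

/* RBIT3: three rows, a bit needs two of three votes. */
uint32_t otp_decode_rbit3(const uint32_t rows[3])
{
    return otp_vote_rows(rows, 3, 2);
}

/* RBIT8: eight rows, a bit needs three of eight votes. */
uint32_t otp_decode_rbit8(const uint32_t rows[8])
{
    return otp_vote_rows(rows, 8, 3);
}

/* BYTE3X: three copies of one byte in a single row (layout assumed:
 * bits 0-7, 8-15, 16-23), a bit needs two of three votes. */
uint8_t otp_decode_byte3x(uint32_t row)
{
    uint8_t a = (uint8_t)(row & 0xFFu);
    uint8_t b = (uint8_t)((row >> 8) & 0xFFu);
    uint8_t c = (uint8_t)((row >> 16) & 0xFFu);
    return (uint8_t)((a & b) | (a & c) | (b & c));
}
```

Writing is where the edge cases live: OTP bits can only go from 0 to 1, so an update has to check that the vote still comes out right after the new bits are merged into whatever is already programmed.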
BONUS
This is the first commit where I have working virtualization of the OTP. This allows me to stop destroying quite so many Pico2 boards. The exposed API allows the caller to save portions of the virtual OTP state to a caller-supplied buffer, and to restore portions of the virtual OTP state from a caller-supplied buffer. This allows saving / restoring virtual OTP state to flash, for example. I can't express how useful this should be while I finalize the OTP directory API...
Note: This is the "simple" version, and takes 16k of RAM. For development, this is "good enough". Not really intended for use outside development...
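For anyone wondering where the 16k comes from: one 32-bit word per 24-bit row, times 4096 rows, is 16 KiB. Below is a rough sketch of what such a "simple" virtual OTP with save / restore could look like. All of the `votp_*` names and signatures are invented for this example and are not the actual API in the PR.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VOTP_ROW_COUNT 4096u /* RP2350 OTP: 4096 rows of 24 bits */

/* One 32-bit word per 24-bit row: 4096 * 4 bytes = 16 KiB of RAM. */
static uint32_t votp_rows[VOTP_ROW_COUNT];

static bool votp_range_ok(uint16_t first_row, size_t row_count)
{
    return first_row < VOTP_ROW_COUNT &&
           row_count <= (size_t)VOTP_ROW_COUNT - first_row;
}

/* Raw read of one virtual row. */
bool votp_read_raw(uint16_t row, uint32_t *out)
{
    if (!votp_range_ok(row, 1) || out == NULL) return false;
    *out = votp_rows[row];
    return true;
}

/* Raw write: like real OTP, bits can only change from 0 to 1,
 * so the new value is OR-ed into the existing row. */
bool votp_write_raw(uint16_t row, uint32_t value)
{
    if (!votp_range_ok(row, 1)) return false;
    votp_rows[row] |= (value & 0xFFFFFFu);
    return true;
}

/* Copy a span of virtual rows into a caller-supplied buffer. */
bool votp_save(uint16_t first_row, size_t row_count,
               uint32_t *buffer, size_t buffer_rows)
{
    if (buffer == NULL || row_count > buffer_rows) return false;
    if (!votp_range_ok(first_row, row_count)) return false;
    memcpy(buffer, &votp_rows[first_row], row_count * sizeof(uint32_t));
    return true;
}

/* Overwrite a span of virtual rows from a caller-supplied buffer.
 * Unlike a write, a restore may clear bits; that is the point of it. */
bool votp_restore(uint16_t first_row, size_t row_count,
                  const uint32_t *buffer, size_t buffer_rows)
{
    if (buffer == NULL || row_count > buffer_rows) return false;
    if (!votp_range_ok(first_row, row_count)) return false;
    memcpy(&votp_rows[first_row], buffer, row_count * sizeof(uint32_t));
    return true;
}
```

The caller decides where the saved buffer goes (flash, a host file, a test fixture), which keeps the virtual OTP itself free of storage dependencies.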
Other bits
All this investigation, and I still find edge cases that I need to verify the bootrom's behavior for. (e.g., if a single BRBP bit is flipped, and the least significant bit is flipped, will the bootrom decode the data correctly? My code does, but I don't want to report a successful write unless the bootrom would be able to decode it...)
Follow along / comment
Feel free to follow along or provide comments on the code, using this draft PR.
After your post I read a bit in the OTP-section of the datasheet and it sounds like the ECC is done fully in hardware and you can select if you want ECC or not by setting bit 14 of the address you read from.
Yes, there is a memory-mapped address. I could not find details on how this worked. (e.g., did it trap and run something in bootrom to read the data? is it in hardware? mixed? other?)
Where did you discover the details that indicated it's a hardware-implemented ECC decode?
Based on the behavior I have documented, using the ECC alias is dangerous ... it doesn't return errors when the data is not actually ECC encoded. The same result when reading using the bootrom ... no errors when reading as ECC, but the data isn't encoded as ECC.
This is one of the driving factors to the creation of the saferotp library...
Hmmm... Maybe I'll make a PR to the bootrom, so it reads raw even for ECC requests, and then verifies the decoded data (when re-encoded for ECC) matches the raw data...
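Roughly what that read-raw-then-re-encode check could look like, as a sketch only. The encoder is passed in as a callback because this example does not reproduce the RP2350 ECC; `toy_encode` below is a stand-in (data plus one even-parity bit) so the snippet compiles and can be exercised, and it is NOT the real algorithm. BRBP handling is deliberately left out, so a real version would also have to accept the polarity-inverted form of a row. All names here are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoder signature: 16 data bits in, 24-bit row image out. */
typedef uint32_t (*otp_ecc_encode_fn)(uint16_t data);

/* Count set bits in the low 24 bits. */
static unsigned popcount24(uint32_t v)
{
    unsigned n = 0;
    for (v &= 0xFFFFFFu; v != 0; v &= v - 1) {
        ++n;
    }
    return n;
}

/* Re-encode the decoded value and compare against the raw row.
 * Accept only if at most `max_bit_errors` bits differ (0 = exact match,
 * 1 = tolerate one correctable bit). BRBP is NOT handled here. */
bool otp_ecc_row_plausible(uint32_t raw, uint16_t decoded,
                           otp_ecc_encode_fn encode, unsigned max_bit_errors)
{
    uint32_t expected = encode(decoded) & 0xFFFFFFu;
    return popcount24((raw & 0xFFFFFFu) ^ expected) <= max_bit_errors;
}

/* Stand-in encoder for demonstration only: data in bits 0-15 plus a single
 * even-parity bit in bit 16. This is not the RP2350 ECC. */
uint32_t toy_encode(uint16_t data)
{
    uint32_t parity = popcount24(data) & 1u;
    return (uint32_t)data | (parity << 16);
}
```

With a check like this, a row that was never ECC encoded is rejected because its raw bits are too far from any re-encoded value, instead of being silently "corrected" into something that merely looks valid.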
Read the RP2350 datasheet, section 13.1 and especially 13.1.1. There they explain that the pure ECC read just tries to correct errors in a best-effort way but does not create any fault or similar if there was an uncorrectable error. You have to use the combination of ECC and guarded read to get a bus fault in case the ECC has found an uncorrectable error. This is all done purely in hardware.
They write that the bootrom uses guarded ECC reads to read the boot configuration data. Did you see the described behavior with non-ECC encoded data reads in the boot configuration data or just in "regular" reads?
There also is chapter 13.6.2 which describes how the ECC algorithm works and that it can fix single bit errors and detect multi-bit errors.
Oh, I'm very familiar with the ECC algorithm ... I've implemented it twice, and exhaustively searched the encoded space a few ways.
Aha! I missed that one line in 13.1.1 on guarded reads: Uncorrectable ECC errors return a bus fault if detected.
[edit] Note: This isn't explicit about what occurs on non-guarded reads. I wish they were explicit that non-guarded reads will ignore uncorrectable ECC errors, and return the BRBP-adjusted value.
I don't see any current code for handling exceptions, of which a bus fault is just one type. While guarded reads are useful for booting, where crashing is preferred, they seem inappropriate for general IoT-safe code, where an error return code seems more appropriate.
Should we consider preventing access to the guarded regions using the memory protection unit? (and maybe also the non-guarded ECC area?)
I will investigate and report findings on using the guarded reads for OTP rows that have ECC errors...
If you want to handle a bus fault you need to install a special handler for it. And then figure out what the reason was - there are a ton of different possibilities and an ECC error won't stand out on its own. It will be hard to distinguish an ECC error from an access denied due to some protection bits being set, for example.
I guess they don't have a bus fault handler, so you'll just get a hanging processor.
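To illustrate how little a handler has to go on, here is a small sketch of the classification step. It only looks at the BusFault status bits in the Cortex-M CFSR (PRECISERR and BFARVALID) and at the faulting address from BFAR. The function name is invented, and the address window is a parameter because the actual OTP alias range has to come from the datasheet. The handler would pass in the register values it captured.

```c
#include <stdbool.h>
#include <stdint.h>

/* BusFault status bits inside the Cortex-M CFSR (BFSR is CFSR bits 15:8). */
#define CFSR_PRECISERR (1u << 9)  /* precise data bus error */
#define CFSR_BFARVALID (1u << 15) /* BFAR holds the faulting address */

/* Hypothetical helper: true if the captured fault is a precise bus fault
 * whose address lies inside [base, base + size). The window is supplied by
 * the caller, e.g. the guarded OTP alias range taken from the datasheet. */
bool fault_is_precise_in_range(uint32_t cfsr, uint32_t bfar,
                               uint32_t base, uint32_t size)
{
    if ((cfsr & CFSR_PRECISERR) == 0 || (cfsr & CFSR_BFARVALID) == 0) {
        return false;
    }
    return bfar >= base && (bfar - base) < size;
}
```

Even when this returns true, all the handler knows is "a precise bus fault on a guarded OTP address". Nothing in these bits says whether the cause was an uncorrectable ECC error or a permission denial.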
Yeah, it depends very much on what you want to use the OTP region for. If it is more for providing signed serial number and production lot information like for the BP then getting a softer failure info would be preferable.
They should just have added two bits to the OTP registers: one for corrected single bit error and the other for uncorrectable multi-bit error. You'd clear those bits, do your read and then check them again. If one is set you know that you had some kind of error.
But I guess Synopsys would have to add such a feature, I don't think RasPi gets enough insight into the inner workings of the OTP. They probably just get a finished RTL-blob that they can place somewhere on the die.
Why? The BP doesn't really need or use the concept of code with different kinds of access privileges. The users are always expected to run their own firmware. And if it fails for some reason, they can upload a new one.
If they mess with OTP in a way that it prevents booting then it is not much different than someone injecting a too high voltage or otherwise destroying the unit. I'd say the commandline interface should print some scary warnings and ask for confirmation two times, but otherwise not prevent the user from writing to OTP as they desire.
Phase 0.5 provides a safer API for reading and writing data from OTP. It supports data encoded as ECC, RBIT3, RBIT8, or BYTE3X. It also includes OTP virtualization (as a compile-time option).
Phase 0.7
Phase 0.7 will include self-tests for all the major data types. These self-tests will use virtualized OTP values that are specifically designed to explicitly test all the code paths, including where sectors fail to read reliably. (Otherwise, how to discover regressions?)
Phase 1.1
Phase 1.1 is where OTP Directory Entry APIs will be added. Currently, an unstable API exists and "should" work, but is not tested. API is unstable because it needs tweaking based on intended usage, to make it really easy to do otherwise complex stuff.
Release method
The library will be shared via the SimpleHacks GitHub organization, and released under the MIT license.
This will be a compile-time library with CMake support. It is currently working well enough to compile for all supported BP platforms (BP5 Rev8 / Rev10, BP5XL, BP6).