OTP whitelabel options for RP2350 boards

henrygab · April 6, 2025, 5:15am

Good news.

I think I got the base OTP read and write functionality working. This includes all the different ways to read and write to/from OTP:

BYTE3X: Single byte 3x redundant in one row
RBIT3: 24-bits 3x redundant across three rows
RBIT8: 24-bits 8x redundant across eight rows
RAW: 24-bits in one row, no redundancy
ECC: 16-bits in one row, with ECC correction of 1-bit and detection of 2-bit errors

Those first three types (BYTE3X, RBIT3, RBIT8) are interesting, because they each vote on the result on a bit-by-bit basis. BYTE3X and RBIT3 require two of the three copies to have the bit set, while RBIT8 requires three of the eight copies to have the bit set. Lots of edge cases, especially when writing an update…
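
As an illustration of the bit-by-bit voting described above, here is a minimal sketch in C. The function names are made up for this example and are not the saferotp API; the thresholds (2 of 3 for BYTE3X and RBIT3, 3 of 8 for RBIT8) are the ones stated in the post. It only covers decoding already-read raw rows, not the write-side edge cases.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: vote bit-by-bit across `count` raw 24-bit rows.
 * A result bit is 1 when at least `threshold` copies have that bit set. */
static uint32_t otp_vote_rows(const uint32_t *rows, size_t count, unsigned threshold)
{
    uint32_t result = 0;
    for (unsigned bit = 0; bit < 24; ++bit) {
        unsigned votes = 0;
        for (size_t i = 0; i < count; ++i) {
            votes += (unsigned)((rows[i] >> bit) & 1u);
        }
        if (votes >= threshold) {
            result |= (uint32_t)1 << bit;
        }
    }
    return result;
}

/* RBIT3: three rows, a bit needs two votes. */
static uint32_t otp_decode_rbit3(const uint32_t rows[3])
{
    return otp_vote_rows(rows, 3, 2);
}

/* RBIT8: eight rows, a bit needs three votes. */
static uint32_t otp_decode_rbit8(const uint32_t rows[8])
{
    return otp_vote_rows(rows, 8, 3);
}

/* BYTE3X: three copies of one byte packed into a single 24-bit row,
 * a bit needs two of the three copies. */
static uint8_t otp_decode_byte3x(uint32_t row)
{
    uint8_t a = (uint8_t)(row & 0xFFu);
    uint8_t b = (uint8_t)((row >> 8) & 0xFFu);
    uint8_t c = (uint8_t)((row >> 16) & 0xFFu);
    return (uint8_t)((a & b) | (a & c) | (b & c));
}
```

Looping over bits is slower than a bitwise majority trick, but it matches the "dirt-simple" goal and makes the threshold easy to audit.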

Only remedy: Keep the code dirt-simple, giving up performance for readability.

BONUS

This is the first commit where I have working virtualization of the OTP. This allows me to stop destroying quite so many Pico2 boards. The exposed API lets the caller save portions of the virtual OTP state to a caller-supplied buffer, and restore portions of it from a caller-supplied buffer. That makes it possible to save and restore virtual OTP state to flash, for example. I can’t express how useful this should be while I finalize the OTP directory API…

Note: This is the “simple” version, and takes 16k of RAM. For development, this is “good enough”. Not really intended for use outside development…
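
To make the save / restore idea concrete, here is a rough sketch of what a "simple" virtual OTP could look like. Everything here is an assumption for illustration: the `votp_*` names are invented, and the guess that the 16k comes from storing each of the RP2350's 4096 rows in a 32-bit word is mine, not something confirmed by the library.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VOTP_ROW_COUNT 4096u        /* assumed: 4096 OTP rows */
#define VOTP_ROW_MASK  0x00FFFFFFu  /* each row holds 24 bits */

/* 4096 rows x 4 bytes = 16 KiB of RAM. */
static uint32_t votp_rows[VOTP_ROW_COUNT];

static bool votp_range_ok(size_t first_row, size_t row_count)
{
    return first_row <= VOTP_ROW_COUNT && row_count <= VOTP_ROW_COUNT - first_row;
}

/* Copy a span of virtual rows into a caller-supplied buffer. */
static bool votp_save(size_t first_row, size_t row_count, uint32_t *buf, size_t buf_rows)
{
    if (buf == NULL || buf_rows < row_count || !votp_range_ok(first_row, row_count)) {
        return false;
    }
    memcpy(buf, &votp_rows[first_row], row_count * sizeof votp_rows[0]);
    return true;
}

/* Overwrite a span of virtual rows from a caller-supplied buffer. */
static bool votp_restore(size_t first_row, size_t row_count, const uint32_t *buf)
{
    if (buf == NULL || !votp_range_ok(first_row, row_count)) {
        return false;
    }
    for (size_t i = 0; i < row_count; ++i) {
        votp_rows[first_row + i] = buf[i] & VOTP_ROW_MASK;
    }
    return true;
}

/* Mimic OTP programming: bits can only change from 0 to 1. */
static bool votp_write_raw(size_t row, uint32_t value)
{
    if (row >= VOTP_ROW_COUNT) {
        return false;
    }
    votp_rows[row] |= value & VOTP_ROW_MASK;
    return true;
}

static bool votp_read_raw(size_t row, uint32_t *out)
{
    if (out == NULL || row >= VOTP_ROW_COUNT) {
        return false;
    }
    *out = votp_rows[row];
    return true;
}
```

Unlike real OTP, `votp_restore` can clear bits, which is what makes a virtual copy reusable across test runs.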

Other bits

All this investigation, and I still find edge cases that I need to verify the bootrom’s behavior for. (e.g., if a single BRBP bit is flipped, and the least significant bit is flipped, will the bootrom decode the data correctly? My code does, but I don’t want to report a successful write unless the bootrom would be able to decode it…)

Follow along / comment

Feel free to follow along or provide comments on the code, using this draft PR.

SAFEROTP library is under src/deps/saferotp.

electronic_eel · April 6, 2025, 9:10am

The bootrom code involved looks quite complicated to me:

https://github.com/raspberrypi/pico-bootrom-rp2350/blob/master/src/main/arm/varm_otp.c

Maybe you can find a way to hook parts of this code into your virtualized OTP?

henrygab · April 6, 2025, 6:27pm

Thanks for the pointer! The code has a lot of presumed understanding. For an average mortal trying to trace an ECC read, they will eventually get to:

github.com/raspberrypi/pico-bootrom-rp2350 · src/main/arm/varm_otp.c · fd6104450

          if (is_ecc) {
              *(uint16_t *)buf = inline_s_otp_read_ecc(row);
          } else {
              *(uint32_t *)buf = current_val;
          }

But, this resolves to the following(!):

github.com/raspberrypi/pico-bootrom-rp2350 · src/main/native/bootrom_otp.h · fd6104450

          // Read 16-bit ECC-protected value from OTP
          bootrom_otp_inline uint16_t inline_s_otp_read_ecc(uint row) {
              return otp_data[row & OTP_ROW_MASK];
          }

Which does not explain how ECC data would be corrected by the bootrom, and nothing in main bootrom assembly gives any hints there either.

Thus, with my level of understanding, I’m forced to test the edge cases against the actual shipped hardware.

Of course, I’d welcome pointers to how they decode an ECC-encoded OTP row (24 bits of data) into their resulting 16-bit value.

electronic_eel · April 6, 2025, 6:52pm

Thanks for dissecting the bootrom code.

After your post I read a bit in the OTP-section of the datasheet and it sounds like the ECC is done fully in hardware and you can select if you want ECC or not by setting bit 14 of the address you read from.
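
A small sketch of how that address selection could look in C. The base address (0x40130000 for the ECC alias) and the row strides (2 bytes per row in the ECC alias, 4 bytes per row in the raw alias) are written from memory of the pico-sdk address map and should be checked against the datasheet; the function name is made up. It only computes addresses and does not touch hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, to be verified against the RP2350 datasheet. */
#define OTP_ECC_ALIAS_BASE 0x40130000u  /* ECC-corrected, 16 bits per row */
#define OTP_RAW_SELECT_BIT (1u << 14)   /* set: raw alias, 24 bits per row */
#define OTP_ROW_COUNT      4096u

/* Hypothetical helper: memory-mapped address of one OTP row in either alias. */
static uint32_t otp_alias_address(uint16_t row, bool raw)
{
    uint32_t r = (uint32_t)row & (OTP_ROW_COUNT - 1u);
    if (raw) {
        /* Raw rows are read as 32-bit words. */
        return (OTP_ECC_ALIAS_BASE | OTP_RAW_SELECT_BIT) + 4u * r;
    }
    /* ECC rows are read as 16-bit halfwords. */
    return OTP_ECC_ALIAS_BASE + 2u * r;
}
```

On real hardware the result would be cast to a `volatile uint16_t *` (ECC) or `volatile uint32_t *` (raw) before reading.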

henrygab · April 7, 2025, 5:13am

Yes, there is a memory-mapped address. I could not find details on how this worked. (e.g., did it trap and run something in bootrom to read the data? is it in hardware? mixed? other?)

Where did you discover the details that indicated it’s a hardware-implemented ECC decode?

Based on the behavior I have documented, using the ECC alias is dangerous: it doesn’t return errors when the data is not actually ECC encoded. Reading through the bootrom gives the same result: no errors when reading as ECC, even though the data isn’t encoded as ECC.

This is one of the driving factors to the creation of the saferotp library…

henrygab · April 7, 2025, 4:57pm

Hmmm… Maybe I’ll make a PR to the bootrom, so it reads raw even for ECC requests, and then verifies the decoded data (when re-encoded for ECC) matches the raw data…
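
A sketch of that re-encode-and-compare check, under loud assumptions: the encoder is passed in as a function pointer, and `demo_encode` below is a placeholder (16 data bits plus one even-parity bit) that is NOT the RP2350 ECC code. It also ignores BRBP handling entirely. All names are invented for this example.

```c
#include <stdbool.h>
#include <stdint.h>

#define OTP_ROW_MASK 0x00FFFFFFu

/* Any function that maps 16 data bits to a 24-bit encoded row. */
typedef uint32_t (*otp_ecc_encode_fn)(uint16_t data);

/* Count the set bits in the low 24 bits of v. */
static unsigned popcount24(uint32_t v)
{
    unsigned n = 0;
    v &= OTP_ROW_MASK;
    while (v != 0u) {
        v &= v - 1u;  /* clear the lowest set bit */
        ++n;
    }
    return n;
}

/* True when re-encoding `decoded` reproduces `raw_row` with at most
 * `max_bit_errors` differing bits (0 = exact match required). */
static bool otp_ecc_row_matches(uint32_t raw_row, uint16_t decoded,
                                otp_ecc_encode_fn encode, unsigned max_bit_errors)
{
    uint32_t expected = encode(decoded) & OTP_ROW_MASK;
    return popcount24(raw_row ^ expected) <= max_bit_errors;
}

/* PLACEHOLDER encoder for demonstration only, not the RP2350 ECC:
 * data in bits 15:0, even parity over the data in bit 16. */
static uint32_t demo_encode(uint16_t data)
{
    uint32_t parity = 0;
    for (unsigned b = 0; b < 16; ++b) {
        parity ^= ((uint32_t)data >> b) & 1u;
    }
    return (uint32_t)data | (parity << 16);
}
```

With `max_bit_errors` set to 1, a row with a single corrected bit error would still pass; a row that was never ECC encoded would usually differ in more bits and be rejected.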

electronic_eel · April 7, 2025, 5:58pm

Read the RP2350 datasheet, section 13.1 and especially 13.1.1. There they explain that the pure ECC read just tries to correct errors in a best-effort way but does not create any fault or similar if there was an uncorrectable error. You have to use the combination of ECC and guarded read to get a bus fault in case the ECC has found an uncorrectable error. This is all done purely in hardware.

They write that the bootrom uses guarded ECC reads to read the boot configuration data. Did you see the described behavior with non-ECC encoded data reads in the boot configuration data or just in “regular” reads?

There also is chapter 13.6.2 which describes how the ECC algorithm works and that it can fix single bit errors and detect multi-bit errors.

henrygab · April 7, 2025, 6:15pm

Oh, I’m very familiar with the ECC algorithm … I’ve implemented it twice, and exhaustively searched the encoded space a few ways.

Aha! I missed that one line in 13.1.1 on guarded reads:
Uncorrectable ECC errors return a bus fault if detected.

[edit] Note: This isn’t explicit about what occurs on non-guarded reads. I wish they were explicit that non-guarded reads will ignore uncorrectable ECC errors, and return the BRBP-adjusted value.

I don’t see any current code for handling exceptions, of which a bus fault is just one type. Guarded reads are useful for booting, where crashing is preferred, but they seem inappropriate for general IoT-safe code, where an error return code is more appropriate.

Should we consider preventing access to the guarded regions using the memory protection unit? (and maybe also the non-guarded ECC area?)

I will investigate and report findings on using the guarded reads for OTP rows that have ECC errors…

electronic_eel · April 7, 2025, 6:53pm

If you want to handle a bus fault you need to install a special handler for it. And then figure out what the reason was - there are a ton of different possibilities and an ECC error won’t stand out on its own. It will be hard to distinguish an ECC error from an access denied due to some protection bits being set, for example.

I guess they don’t have a bus fault handler, so you’ll just get a hanging processor.

Yeah, it depends very much on what you want to use the OTP region for. If it is more for providing signed serial number and production lot information like for the BP then getting a softer failure info would be preferable.

They should just have added two bits to the OTP registers: one for corrected single bit error and the other for uncorrectable multi-bit error. You’d clear those bits, do your read and then check them again. If one is set you know that you had some kind of error.

But I guess Synopsys would have to add such a feature, I don’t think RasPi gets enough insight into the inner workings of the OTP. They probably just get a finished RTL-blob that they can place somewhere on the die.

Why restrict access with the MPU? The BP doesn’t really need or use the concept of code with different kinds of access privileges. The users are always expected to run their own firmware. And if it fails for some reason, they can upload a new one.

If they mess with OTP in a way that it prevents booting then it is not much different than someone injecting a too high voltage or otherwise destroying the unit. I’d say the commandline interface should print some scary warnings and ask for confirmation two times, but otherwise not prevent the user from writing to OTP as they desire.

ian · April 7, 2025, 8:12pm

/*** Moar characters ***/

henrygab · April 28, 2025, 3:34pm

OK, phase 0.5 is now close to complete.

Phase Definitions

Phase 0.5

Phase 0.5 provides a safer API for reading and writing data from OTP. It supports data encoded as ECC, RBIT3, RBIT8, or BYTE3X. It also includes OTP virtualization (as a compile-time option).

Phase 0.7

Phase 0.7 will include self-tests for all the major data types. These self-tests will use virtualized OTP values that are specifically designed to explicitly test all the code paths, including where sectors fail to read reliably. (Otherwise, how to discover regressions?)

Phase 1.1

Phase 1.1 is where OTP Directory Entry APIs will be added. Currently, an unstable API exists and “should” work, but is not tested. API is unstable because it needs tweaking based on intended usage, to make it really easy to do otherwise complex stuff.

Release method

Library will be shared via the SimpleHacks GitHub organization, and released under the MIT license.

This will be a compile-time library with CMake support. It is currently working well enough to compile for all supported BP platforms (BP5 Rev8 / Rev10, BP5XL, BP6).

ian · April 28, 2025, 5:57pm

Great idea to generalize it! A lot more projects can benefit.