Revert if needed, the work on version stuff is in flux after your suggestion of unified firmware. Whatever is easiest.
I'm still following my current plan.
current POR
- Cleanup enough to build all platforms again
- Byte-count instead of row count in `BP_OTPDIR` entries!
- Reserve top 8 bits of `BP_OTPDIR` to identify classes of entries.
  - E.g., `0x8000'0000` → entry can be present multiple times
  - E.g., `0x4000'0000` → entry type is critical to further parsing (e.g., versioning increase for later records)
  - E.g., `0x2000'0000` → Embedded entry (no data, or trivial data <=16 bits stored directly within the directory entry).
- Get otpdir helper functions, including `BP_OTPDIR` related, working
  - Iterator functions based on FindFirst() / FindNext() mechanic, as agreed.
  - Helper function: Find single/first `BP_OTPDIR` entry of given type
  - Helper function: Write string + `BP_OTPDIR` entry.
    - For defined string type identifiers
    - Writes the string to requested start OTP row, ensuring null-terminated string as agreed
    - If successfully written, also creates the `BP_OTPDIR` entry.
  - Helper function: Find writable area, including variable criteria:
    - OTP page aligned start
    - OTP page aligned end
    - Allow soft-locked pages (for debugging purposes)
    - Option to disallow pages with single-bits already flipped … those are normally OK for ECC writes, because BRBP will hide that single-bit error.
- Define entry types for:
  - each USB whitelabel string
  - `BP_OTPDIRENTRY` version == 1
  - … etc. …
- Modify the USB WhiteLabel code to do the following:
  - Take as input, a structure with ALL possible input (strings, etc.)
  - If any of those entries already exist in `BP_OTPDIR`, validate they are identical (or fail with error)
  - Validate ALL strings as meeting unique USB whitelabel requirement BEFORE doing any OTP operations … string length offsets, strings will be in range, etc. … essentially a map of where all the data will end up.
  - Use `BP_OTPDIR` helper function to find page-start-aligned zone of sufficient size for whitelabel data (note: currently identical to the hardcoded 0xC0, but could be useful if an OTP area is unusable on a board).
    - Test by corrupting OTP rows 0x0C0…0x0FF first
  - Use `BP_OTPDIR` helper functions to write each of string entries + corresponding `BP_OTPDIR` entries
  - Continue as previously done…
    - Write the 16-row whitelabel structure
    - Write the whitelabel address
    - Write the whitelabel
- Migrate the certificate stuff to use the above common OTP helper functions
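A minimal sketch of how the class flags and the FindFirst() / FindNext() mechanic from the plan could fit together. The three flag values come from the list above; every name here (`otpdir_entry_t`, `otpdir_iter_t`, `otpdir_find_first`, `otpdir_find_next`) and the entry layout are hypothetical, and the directory is a plain in-memory array rather than real OTP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Class flags in the top 8 bits of an entry type (values from the plan above). */
#define OTPDIR_FLAG_MULTIPLE UINT32_C(0x80000000) /* entry may be present multiple times */
#define OTPDIR_FLAG_CRITICAL UINT32_C(0x40000000) /* parser must understand this type */
#define OTPDIR_FLAG_EMBEDDED UINT32_C(0x20000000) /* <=16 bits of data stored in the entry */
#define OTPDIR_CLASS_MASK    UINT32_C(0xFF000000)

/* Hypothetical in-memory view of one directory entry. */
typedef struct {
    uint32_t type;       /* class flags in the top 8 bits, type id below */
    uint16_t start_row;  /* first OTP row of the data */
    uint16_t byte_count; /* valid bytes (not rows) */
} otpdir_entry_t;

/* Iterator state shared by find_first / find_next. */
typedef struct {
    const otpdir_entry_t *entries;
    size_t count;
    size_t next;          /* index where the next search resumes */
    uint32_t wanted_type;
} otpdir_iter_t;

/* Returns the next entry of the wanted type, or NULL when exhausted. */
static const otpdir_entry_t *otpdir_find_next(otpdir_iter_t *it) {
    assert(it != NULL);
    while (it->next < it->count) {
        const otpdir_entry_t *e = &it->entries[it->next++];
        if (e->type == it->wanted_type) {
            return e;
        }
    }
    return NULL;
}

/* Initializes the iterator and returns the first matching entry (or NULL). */
static const otpdir_entry_t *otpdir_find_first(otpdir_iter_t *it,
                                               const otpdir_entry_t *entries,
                                               size_t count,
                                               uint32_t wanted_type) {
    assert(it != NULL);
    it->entries = entries;
    it->count = count;
    it->next = 0;
    it->wanted_type = wanted_type;
    return otpdir_find_next(it);
}
```

"Find single/first entry of given type" then reduces to one `otpdir_find_first()` call, and a caller can test `type & OTPDIR_FLAG_MULTIPLE` to decide whether it is worth calling `otpdir_find_next()` at all.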
That said … Did I read correctly that you want to drop the manufacturing data string from USB Whitelabel, and instead make it easier to find the CPU serial number by injecting that instead?
If so, that's a fairly easy thing for me to add to my list, and would actually simplify the USB Whitelabel process, maybe into 100% automated form.
Yes! That is the intent. Save as ASCII string. That way if the board won't load firmware we can get the serial and check our database. There is a photo somewhere but I know the threads have gotten crazy. I will post another example tomorrow.
White labeling can be done in one pass.
Something on my mind: triggering white labeling during init may be less than ideal. I believe it should be triggered by cert upload after self test during manufacturing. My main use case is I sent a firmware to test the certification boards to ensure we send you a blank OTP board, and I needed to double check it wouldn't write the white labeling.
I can do the binmode no problem, just wanted to put the structural change out there.
Ok, wherever you want it … it's easy to change when it happens. There's still lots of foundation, framing, plumbing, and electrical to lay out before we decide the room colors.
I've been reviewing the changes, and … LOVE THEM!
The code intention is so much clearer now. Great example:
#if BP_REV == 8 || BP_REV == 9
vs.
#ifdef BP_HW_STORAGE_TFCARD
Looking at the second, I immediately understand when it's used, and can intuit reasons for it. Great positive impact!
And I also like that your case switches between revisions now have a default that results in compilation errors (e.g., `#if A / #elif B / #else #error` rather than `#if A / #else /*presume B*/`). That's good practice, and helps highlight areas that aren't ready for a next revision.
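A minimal sketch of that pattern, for illustration only. `BP_HW_STORAGE_TFCARD` is the define mentioned above; `BP_HW_STORAGE_NAND` and `storage_name()` are hypothetical names, and the fallback define at the top exists only so the sketch compiles standalone (a real build would get exactly one feature macro from the board header and would not have that fallback).

```c
/* Standalone-build fallback: assumption for this sketch only. */
#if !defined(BP_HW_STORAGE_TFCARD) && !defined(BP_HW_STORAGE_NAND)
#define BP_HW_STORAGE_NAND 1
#endif

/* Feature-based switch: every known option is named explicitly, and an
 * unknown configuration stops the build instead of silently presuming
 * "the other one". */
#if defined(BP_HW_STORAGE_TFCARD)
static const char *storage_name(void) { return "TF card"; }
#elif defined(BP_HW_STORAGE_NAND)
static const char *storage_name(void) { return "NAND flash"; }
#else
#error "No storage backend defined for this board revision"
#endif
```

When a new board revision arrives with neither macro defined, the `#error` branch points directly at each place that still needs a decision.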
This is how the Board-ID appears in the info_uf2 file.
```c
/*pico_unique_board_id_t id;
pico_get_unique_board_id(&id);
//convert the unique ID to ANSI string
snprintf(buf, sizeof(buf), "%02X:%02X:%02X:%02X:%02X:%02X:%02X:%02X",
         id.id[0], id.id[1], id.id[2], id.id[3], id.id[4], id.id[5], id.id[6], id.id[7]);
printf("Manufacturing ID string: %s\r\n", buf);
if(!bp_otp_apply_manufacturing_string(buf)){
    printf("Failed to apply manufacturing string\r\n");
}else{
    printf("Manufacturing string applied, OTP locked\r\n");
}*/
```
Here is some commented out code in cmd_otp.c that I used to burn the ID to otp.
Not impressed
When reading the memory-mapped OTP alias that claims to use ECC reads … at least through OpenOCD … there are zero errors reported for corrupted data.
This could be me not configuring something in OpenOCD correctly.
But it's not worth digging in, as I already wrote fully functional ECC correction routines.
When reading ECC-encoded OTP rows, I am strongly recommending against relying on (at least) the memory-mapped alias (starting at `0x40130000`). I am tentatively also recommending against using the bootrom to do the ECC correction.
In short:
- Writing ECC OTP → bootrom
- Writing RAW OTP → bootrom
- Reading RAW OTP → bootrom or RAW memory alias
- Reading ECC OTP → RAW read + software-based ECC correction
Pushed a stable set of changes
I sacrificed another RP2350 PICO board to verify the automated, 1-step whitelabel process works. I also ordered another batch of PICOs, so I can worry less about not having one when I need it. Thus, I pushed my branch (`henrygab/otp_upstream_dev`) to your whitelabel branch (`dangerousprototypes/otp_whitelabel`).
Note: I also updated the USB serial number creation, to have it more closely match the format you used above and as now used in the USB whitelabel data … and also updated the RTT output to match. There's likely other places the serial number is converted to a string, but those were the initial spots I found.
Why do I call this out specifically?
Every board will show up as a "new" instance after an upgrade. Some settings associated with specific COM ports might need to be recreated. `udev` rules that are based on specific serial numbers might need to be updated also.
More changes incoming
Now to implement some of the OTPDIR functionality, move everything to the heavily-tested OTP functions.
Merged main into the current otp_whitelabel.
- Addresses the remaining places without feature based define switches
- Clears a nasty conflict with the old cert submodule version that made switching to otp_main a big pain
- Basically getting closer to merging into main
I also pushed a comment addressing why pin pull-downs are disabled in pio programs for RP2350 (E9), and changed the define to be descriptive. Who knows, maybe there will be a stepping without the bug.
The button is correct, also an E9 related issue design change. Comment added.
In the pull request you mention that otp is no longer soft locked. Should I look into keeping it all soft locked, and unlocking only the pages we intend to write at the time we write them?
I will start a manufacturing binmode for whitelabel and cert burning (and also automated self-test).
No… soft-locking is identical to locking the registers, except that it gets cleared after reset. This is useful if the bootloader needs access to some of the OTP, but you want to further restrict it after initial use (e.g., read-only for unique device secret, then no-access when jumping to main firmware). In our case, I soft-locked the registers to prevent accidental writes, when the whitelabel was going to occur automatically early in boot. If we want to allow writing the OTP … then cannot softlock. (btw, it's a single function call … and the function is left in.)
Thanks for documenting the PIO program questions I added!
`src/boards/memmap_default_rp2350.ld` lists PSRAM … should that line be removed and only exist in `.../memmap_psram_rp2350.ld`?
The OTPDIR stuff isn't ready. I have lots of pending changes (that now need to be merged with this), as I am in the midst of coding the OTPDIR stuff. Hoping to get some of this ready this weekend…
That was probably a mistake, but I'm not sure it has a functional impact at this time. I will have a look to see what happened.
I think we should treat the current `otp_whitelabel` branches as scratch branches. Then, after the OTPDIR stuff is at a workable state, I will generate a fresh branch from `main`, with changes that are more targeted and easier to review. (and easier to follow in git history.)
Current state of affairs is now reflected by following PR (which is both to/from my own fork, and should never be merged … it's only for those curious…)
Found a nasty bug that'd corrupt the OTP. Still lots more testing to do.
The OTPDIR functionality has a framework which will likely work well, but it's entirely untested code. On the plus side, it has a ton of RTT debug output. Externally-facing API is next up (and summarized in the above PR).
Looks solid.
bool bp_otpdir_add_ecc_entry(BP_OTPDIR_ENTRY_TYPE entryType, uint16_t start_row, uint16_t row_count, size_t valid_byte_count)
row count is half byte count, if odd + one?
As posted, yes.
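The rounding confirmed above can be written as a one-liner. This is only a sketch: `ecc_rows_for_bytes` is a hypothetical helper name, and it relies on each ECC-encoded OTP row carrying 16 bits (two bytes) of data.

```c
#include <stddef.h>
#include <stdint.h>

/* Rows needed to hold valid_byte_count bytes at 2 data bytes per ECC row;
 * an odd byte count rounds up to the next whole row. */
static inline uint16_t ecc_rows_for_bytes(size_t valid_byte_count) {
    return (uint16_t)((valid_byte_count + 1u) / 2u);
}
```

For example, a 5-byte string (including its null terminator) occupies 3 rows, with the last row's upper byte unused.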
That one's in flux, and at least going to be renamed slightly, to indicate it's adding an entry for already-written OTP data. Of course, a helper that takes as input the type & the ASCII string (finding a spot for the string, writing it, and adding it to the directory automagically) is in the plan. After all, have to use it a few times just in whitelabel, so may as well make it available as a helper function.
Whatever the API ends up as, the goal is that code using the API will flow smoothly, even if the API internals have to get more complex. If there appears to be any undefined situation (like you noted above), then I either didn't document it well enough, or the API may need to change (e.g., split into multiple APIs, or change parameter types / counts, or …)
After a brief hiatus, I am able to dig into this again. I apologize for the delay. Hope to have my dev environment verified tonight, and then to make progress tomorrow.
No need to apologize, it's all for fun
I've decided development really needs to not be constrained by the one-time nature of writing to the actual OTP fuses.
So, I'm writing a shim for virtualized OTP.
I plan to allow that virtualized OTP to be persisted in the NAND.
Normally, would want the OTP to be available as early as possible during boot.
Will have to review the order of what gets initialized when. While this is fine for initial development, don't want the OTP values to massively change after a component has read something from the OTP directory. Still, for initial development, it's worth not cycling through expensive (and rare) bus pirate 6 units…
Comments on method to be used are pushed to my `upstream_otp_dev` branch, as linked above.
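To illustrate the idea (this is not the code in that branch), here is a minimal sketch of a RAM-backed virtual OTP that keeps the one-way nature of fuses: bits can be set but never cleared. All names (`votp_rows`, `votp_read_raw`, `votp_write_raw`) are hypothetical, persistence to NAND is left out, and rejecting a write that would need a bit cleared is a design choice of this sketch rather than documented hardware behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VOTP_ROW_COUNT 4096u       /* RP2350 OTP: 4096 rows */
#define VOTP_ROW_MASK  0x00FFFFFFu /* 24 raw bits per row */

/* RAM-backed stand-in for the fuse array; starts out all zero (blank). */
static uint32_t votp_rows[VOTP_ROW_COUNT];

/* Reads the raw 24-bit value of one row. */
bool votp_read_raw(uint16_t row, uint32_t *out) {
    assert(out != NULL);
    if (row >= VOTP_ROW_COUNT) {
        return false;
    }
    *out = votp_rows[row];
    return true;
}

/* Writes a raw 24-bit value. Fuses only go 0 -> 1, so a request that would
 * require clearing an already-set bit is rejected and the row is unchanged. */
bool votp_write_raw(uint16_t row, uint32_t value) {
    if (row >= VOTP_ROW_COUNT || (value & ~VOTP_ROW_MASK) != 0u) {
        return false;
    }
    if ((votp_rows[row] & ~value) != 0u) {
        return false; /* would need a 1 -> 0 transition */
    }
    votp_rows[row] = value;
    return true;
}
```

Code under development can then be pointed at these two functions instead of the real fuse access, and a failed experiment costs a RAM reset rather than a board.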
Yikes!
What I originally thought
I had originally discovered RAW 24-bit OTP values that, if blindly going through the ECC correction algorithm, would NOT be detected as having errors. Here's just one example:

| Row | Value | XOR with valid encoding | Bitflip count |
|---|---|---|---|
| 16-bit Value | 0xb98b | n/a | n/a |
| 24-bit ECC | 0x25b98b | 0x000000 | 0 |
| Bad Raw 0 | 0x12b98b | 0x370000 | 5 |
| Bad Raw 1 | 0x18b98b | 0x3d0000 | 5 |
| Bad Raw 2 | 0x1bb98b | 0x3e0000 | 5 |
| Bad Raw 3 | 0x1db98b | 0x380000 | 3 |
| Bad Raw 4 | 0x1eb98b | 0x3b0000 | 5 |
| Bad Raw 5 | 0x33b98b | 0x160000 | 3 |
| Bad Raw 6 | 0x39b98b | 0x1c0000 | 3 |
| Bad Raw 7 | 0x3ab98b | 0x1f0000 | 5 |
| Bad Raw 8 | 0x3cb98b | 0x190000 | 3 |
| Bad Raw 9 | 0x3fb98b | 0x1a0000 | 3 |
"So what?", you might reasonably ask. Those bit patterns happen to have no corrections on the low 16-bits of data. Why not just return the low 16-bits of data?
The issue is that there is likely another raw value with a lower Hamming distance that represents a valid ECC encoding. In other words, the "most likely" data (fewest bit flips) to be correctly ECC encoded data may not be `0xb98b`. Thus, returning `0xb98b` as the ECC-corrected data is … simply incorrect. This should report an error when attempting to read the value as ECC-corrected.
As a result, I previously thought these rare few values would cause the bootrom to return invalidly-corrected data.
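For readers who want to reproduce the table, here is a minimal sketch of the encoding and of the 5-bit Hamming syndrome. The parity masks are an assumption (five Hamming parity bits over the 16 data bits, plus one overall parity bit over bits 0..20); they do reproduce the `0xb98b` → `0x25b98b` row above. The function names are hypothetical.

```c
#include <stdint.h>

/* Assumed parity masks for ECC bits 16..21: five Hamming parity bits over
 * the data bits, then one overall parity bit over bits 0..20. */
static const uint32_t ecc_parity_masks[6] = {
    0x00AD5Bu, 0x00366Du, 0x00C78Eu, 0x0007F0u, 0x00F800u, 0x1FFFFFu
};

/* Returns 1 if an odd number of bits are set in v. */
static uint32_t parity32(uint32_t v) {
    v ^= v >> 16; v ^= v >> 8; v ^= v >> 4; v ^= v >> 2; v ^= v >> 1;
    return v & 1u;
}

/* 16-bit data -> 22-bit ECC encoding (bits 23:22 stay clear). */
uint32_t otp_ecc_encode(uint16_t data) {
    uint32_t raw = data;
    for (unsigned i = 0; i < 6; ++i) {
        raw |= parity32(raw & ecc_parity_masks[i]) << (16 + i);
    }
    return raw;
}

/* 5-bit Hamming syndrome of a raw value: 0 means no error indicated,
 * 1..21 names the position of a single flipped bit, and 22..31 names a
 * position that does not exist in a 21-bit Hamming word. */
unsigned otp_ecc_syndrome(uint32_t raw) {
    unsigned syndrome = 0;
    for (unsigned i = 0; i < 5; ++i) {
        uint32_t expected = parity32(raw & ecc_parity_masks[i]);
        uint32_t stored = (raw >> (16 + i)) & 1u;
        syndrome |= (unsigned)(expected ^ stored) << i;
    }
    return syndrome;
}
```

Under these assumptions, "Bad Raw 0" (`0x12b98b`) yields syndrome 23 and "Bad Raw 5" (`0x33b98b`) yields 22. Neither names a real bit position, so a corrector that only flips the bit the syndrome points at has nothing to flip and hands back the low 16 bits untouched, which is the behavior described above.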
The OTP behavior of the RP2350 is worse than I imagined!
Let's presume that you wrote those ten values to OTP rows 200 … 209 (using RAW mode).
The bootrom, when reading data in "ECC corrected" mode, reads two OTP rows at a time into a single `uint32_t`. Since none of those ten rows are valid ECC-encoded data, one might expect the bootrom to report an error when reading them in "ECC corrected" mode.
But no … the bootrom will just ignore most errors, and simply give back the low 16-bits of each row. It does NOT have to be one of the `phantom` values, either!
What about the memory-mapped section at `OTP_DATA_BASE`:
It clearly indicates that it returns "ECC-corrected" data.
Like the bootrom, the memory-mapped area, if it fails to detect a correctable 1-bit error, simply returns the least significant 16 bits. It does NOT have to be one of the `phantom` values.
The only slight positive is that, when the row was written using Bit Recovery By Polarity, at least the value returned reflects the inverted bits (vs. the raw … thus closer to what was written). While BRBP is applied, many errors are simply ignored.
Test method details
First, find values that this could occur with.
- Allocate a bitmap with 2^24 elements.
- For all 16-bit original values:
  - create the 24-bit encoded value (`raw`).
    - verify its bit is not set yet, and set the bit in the bitmap
  - create the 24-bit inverted value (`raw_brbp`)
    - verify its bit is not set yet, and set the bit in the bitmap
- For all 16-bit original values:
  - create the 24-bit encoded value
  - create the 24-bit inverted value
  - for each of the 24 single-bits, `mask`:
    - verify clear, and then set, `raw ^ mask` in the bitmap
    - verify clear, and then set, `raw_brbp ^ mask` …
- For all 24-bit values, where the bit is clear in the bitmap…
  - Attempt to correct the data using ECC correction
  - If this does not detect any errors … it must be an odd-count multi-bit error, because all even-bit errors are caught by the parity bit, and all valid encodings + 1-bit errors are set in the bitmap.
  - Flag this as a `phantom` raw encoding
Statistics on `phantom` encodings:
- 1,310,720 (0x140000) phantom decodes detected, ~= 8% of all possible values.
- These are equally distributed with 3, 5, 19, or 21 bitflips
  - Note that 19 bitflips is just the 5 bitflip value, with BRBP applied
  - Similarly, 21 bitflips is just the 3 bitflip value, with BRBP applied
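The bitmap from the test method amounts to one rule: a raw value is acceptable only if it is within one bit flip of a valid encoding or of that encoding's BRBP inversion. A minimal sketch of a decoder built on that rule follows. It is not the library mentioned below; the parity masks are an assumption that reproduces the `0xb98b` → `0x25b98b` example, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed parity masks for ECC bits 16..21 (see the 0xb98b example). */
static const uint32_t k_ecc_masks[6] = {
    0x00AD5Bu, 0x00366Du, 0x00C78Eu, 0x0007F0u, 0x00F800u, 0x1FFFFFu
};

/* Returns 1 if an odd number of bits are set in v. */
static uint32_t bit_parity(uint32_t v) {
    v ^= v >> 16; v ^= v >> 8; v ^= v >> 4; v ^= v >> 2; v ^= v >> 1;
    return v & 1u;
}

/* 16-bit data -> ECC encoding with bits 23:22 clear. */
static uint32_t ecc_encode24(uint16_t data) {
    uint32_t raw = data;
    for (unsigned i = 0; i < 6; ++i) {
        raw |= bit_parity(raw & k_ecc_masks[i]) << (16 + i);
    }
    return raw;
}

/* True if raw is exactly a valid encoding, or a valid encoding with all
 * 24 bits inverted (BRBP). On success, *data receives the decoded value. */
static bool is_exact_encoding(uint32_t raw, uint16_t *data) {
    if (raw == ecc_encode24((uint16_t)raw)) {
        *data = (uint16_t)raw;
        return true;
    }
    uint32_t inverted = raw ^ 0xFFFFFFu;
    if (inverted == ecc_encode24((uint16_t)inverted)) {
        *data = (uint16_t)inverted;
        return true;
    }
    return false;
}

/* Accepts raw only if it is at most one bit flip away from a valid
 * (possibly BRBP-inverted) encoding; everything else is an error. */
bool strict_ecc_decode(uint32_t raw, uint16_t *data) {
    assert(data != NULL);
    raw &= 0xFFFFFFu;
    if (is_exact_encoding(raw, data)) {
        return true;
    }
    for (unsigned bit = 0; bit < 24; ++bit) {
        if (is_exact_encoding(raw ^ (UINT32_C(1) << bit), data)) {
            return true;
        }
    }
    return false;
}
```

This brute-force form tries 25 candidates per row, which is slow compared with a syndrome lookup but makes the acceptance rule explicit: the ten "Bad Raw" values from the earlier table are all rejected, while `0x25b98b`, any single-bit corruption of it, and its full inversion all decode to `0xb98b`.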
TLDR; Don't trust the bootrom when reading ECC encoded data from OTP rows. It will hide data errors.
I've got a "saferotp" library in the works, that detects 100% of the invalid ECC-encoded rows. That library also fully handles the `RBIT3`, `RBIT8`, and `BYTE3X` encodings used elsewhere, including all the bit-by-bit voting mechanisms.
But …
I don't understand how all the ECC calculations work, but that does seem really bad.