I2c mode locks up BP on build ci-buspirate5-main-03d4066.zip

HEDoRAH · April 24, 2024, 7:04pm

Flashed newest build and was trying to tinker with an SGP30 VOC sensor, and choosing mode 5 (i2c) asks for freq, and then locks up no matter what you enter.
I tested previous build and it works fine. Since nothing was changed in the i2c code, maybe its the build server?
I am kind of a noob so cant speculate too much, but i did try a couple of the previous builds and i2c mode works as expected.
I have either the best or worst luck, depending how you look at it. this was literally the first thing i tried after getting my replacement from the mailbox a few minutes ago LOL

PS: when i reflashed the newest build to re-test, now it locks up before asking for freq. am i doing something dumb?

also to note. this happens before any wires are hooked to the BP. i did have wires hooked up the first time around, but realized i didnt know which IO BP would want for SCL/SDA so unhooked them completely and same thing happens

phdussud · April 24, 2024, 11:28pm

@ian
That’s because the stack guard has been turned on. I am not sure entirely what’s going on, but turning the stack guard on also requires specifying the value of PICO_STACK_SIZE. Like so in CMakeLists.txt:
target_compile_definitions(${revision} PUBLIC PICO_USE_STACK_GUARDS=1)
target_compile_definitions(${revision} PUBLIC PICO_STACK_SIZE=4096)

phdussud · April 24, 2024, 11:44pm

Sorry. The default stack size is 2KiB which is too small to accommodate for the FAT filesystem code which allocates more than this on the stack during the execution of f_open. Setting the stack size to 4Kib makes things better.

wyattearp · April 25, 2024, 2:11am

Confirmed this happens for everything related to writing to the fat system. Items such as UART config or general config (c) also cause the same defect to rise.

ian · April 25, 2024, 9:48am

I’m sorry everyone, this is the second time in a week I pushed stuff to the wrong branch. It’s reverted.

phdussud · April 25, 2024, 3:33pm

I think it was a great insight. I bet the core0 stack overflow is the cause of all of the unexplained crashes so far.
I think you should post a debug binary with stack guard and 4Kib of stack space for people to try and validate the new stack size. People who use a debugger will be able to send stack traces.

ian · April 25, 2024, 5:28pm

Thank you for confirmation. I suspect there’s something going on there, but not ready to push to main.

Maybe I should start a “testing” firmware build thread with a different branch for testing.

phdussud · April 25, 2024, 5:36pm

That would be excellent. I debugged the crashes I got with stack guard and any execution of f_open is guaranteed to trigger the stack guard exception. The code pushes on the stack a large structure that is larger than a sector size (2048). The problem is how to pick a value for the stack size. 4096 seems like a good value but likely to be conservatively large.

ian · April 25, 2024, 5:54pm

That’s a sector of the NAND flash, plus a 512 byte buffer I’d bet. I’ll look to see why that’s on the stack, it’s probably my fault - FatFS is really tuned to avoid that kind of thing.

ian · April 25, 2024, 6:02pm

I’m guessing I need a pool of already allocated file structures that can allow temporary ownership by various functions. I have not looked internally lately, but my recollection was FatFS has a dedicated sector buffer shared by all files. It sounds like that may not be the case, or maybe I don’t have that option enabled (minimal implementation is an option).

phdussud · April 25, 2024, 6:24pm

Ah. It is the file argument.
Code like this, fil is the culprit:

void disk_hex_handler(struct command_result *res){
//check help
if(ui_help_show(res->help_flag,hex_usage,count_of(hex_usage), &hex_options[0],count_of(hex_options) )) return;
FIL fil;        /* File object needed for each open file */
FRESULT fr;     /* FatFs return code */

phdussud · April 25, 2024, 6:29pm

Right. This is because FF_FS_TINY is turned off

#define FF_FS_TINY 0

/* This option switches tiny buffer configuration. (0:Normal or 1:Tiny)

/ At the tiny configuration, size of file object (FIL) is shrinked FF_MAX_SS bytes.

/ Instead of private sector buffer eliminated from the file object, common sector

/ buffer in the filesystem object (FATFS) is used for the file data transfer. */

lersi · April 29, 2024, 4:51pm

That’s a great idea, I would like to be one of the tester.
In addition, a long term solution could be adding some unit tests to certain modules, so some code quality could be guarantied automatically.
I have experience with Google’s gmock, we can easly integrate it with our existing Cmake build system.

ian · April 29, 2024, 7:00pm

I will get this going tomorrow. Also the new contrib branch.

ian · April 30, 2024, 12:52pm

File system and stack overflow updates are pushed to the test branch. It solved a bunch of issues for me.

The enabled the tiny FatFS implementation for now, and the speed doesn’t seem to be an issue.

Now that this seems to be a bit more under control, should I investigate enabling the STACK_GUARDS? It seems like the reason the hard fault happened on core 1, though it was caused by core 0, is the core0 stack overflowing up into core1 stack and causing the crash. Maybe this is a good thing to enable.

Someone linked a nice library to catch and display the stack trace on faults, but it’s not licensed in a way we can use it. I imagine I can catch the fault interrupt and do something, but I’m not sure what - maybe change LED color? The USB pipeline would be broken after the fault, I assume, so displaying anything in the terminal would be tricky.

phdussud · April 30, 2024, 5:56pm

I think stack guard should be enabled on test builds, and I suggest that test builds are configured as debug instead of release. This would enable people to include a stack trace using gdb.
Personally, I have made the following addition to CMakeList.txt

    # useful for debug builds
    target_compile_definitions(${revision} PUBLIC PICO_USE_STACK_GUARDS=$<CONFIG:DEBUG>)

In other words when I configure cmake for debug builds it automatically enable stack guard

ian · May 1, 2024, 9:25am

Thank you. I’ve added the stack guard for debug and set a 4K stack size (though maybe that is unnecessarily large now).

The test branch should be compiled with stack guard and debug now. I also added the check for write flag.

phdussud · May 1, 2024, 3:49pm

@ian Awesome.
Another topic. If I understand correctly, people who want to submit a PR should do so on the test branch. I also understand that the contrib branch may not be necessary. If my understanding is correct, could you post somewhere an authoritative guidance about this? Thanks a lot!

ian · May 2, 2024, 8:26am

The main goal is to avoid pull requests on main because when I accept it, we get an autobuild before testing. Some sort of staging area away from main seems like the answer.

Maybe there is a “github” way to accomplish that. Thus far I’ve been going to the source of the pull requests and grabbing a patch, or creating my own pull request onto a new branch. I don’t know if anyone really cares, but it feels a bit rude not accepting a pull request and making my own.

I saw you push to test, that is totally fine with me. I’ll merge all that to main today.

masterX244 · May 2, 2024, 11:27am

thats called the “git flow” and the branch that you call “test” right now is called “development” there and PRs end up there. merges to main are only from “development” when it is stable for a release. main branch is always a stable state that way.