MTP - Firmware Implementation Considerations

henrygab · May 4, 2025, 6:57pm

NOTE
This thread is NOT intended to suggest that MTP is (or is not) going to be implemented … either in buspirate firmware or elsewhere. Rather, it’s simply tracking things that need to be considered if implementing MTP in a C11-based IoT-class device with built-in non-removable storage.

There have been a number of discussions around storage, where the idea of implementing MTP (Media Transfer Protocol) was raised.

This thread is here because I could not find posts that described what the difficult parts of implementing MTP would be. This thread focuses on the technical aspects of implementation on the firmware side, rather than host-side aspects (e.g., command line access restrictions, etc.).

henrygab · May 4, 2025, 7:09pm

This is all currently from memory, and I’ve not looked at MTP for quite a few months. Thus, some details may be incorrect.

Background on MTP:

MTP exposes blobs of data
- Each blob of data is identified by a unique ID
- That unique ID is more than a filename … it is tightly tied to the data and metadata of that blob. This appears to have been done to allow the host to easily cache data and metadata from MTP devices (artists, track names, picture thumbnails, album covers, DRM licenses, …)
- As a direct result of that tight tie-in, the unique ID must change if any of the file’s contents OR metadata change.
- As another result of that tight tie-in, the unique ID should be persistent across reboots of the device, and a unique ID must not be re-used … potentially ever.
MTP has event notifications used to notify the host of changes to one or more of those blobs. Thus, for a file system, any operation that changes the metadata exposed for an item must result in a notification to the host.
IIRC, MTP allows querying the unique ID of the parent of a given item, e.g., what’s the containing folder for this file?
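
As a rough illustration of the rules above, here is a minimal sketch of an in-RAM object table. It is not taken from any existing MTP stack, and every name in it (`mtp_table_t`, `mtp_add`, `mtp_touch`, `mtp_parent_of`, ...) is hypothetical. IDs come from a counter that only increases, any content or metadata change retires the old ID and hands out a new one, and the parent query is a simple field lookup. Persistence across reboots and 32-bit counter wrap are deliberately ignored here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MTP_MAX_OBJECTS 8u
#define MTP_ID_NONE     0u  /* hypothetical: "no object", also used for the root */

typedef struct {
    uint32_t id;         /* current unique ID, MTP_ID_NONE if the slot is free */
    uint32_t parent_id;  /* ID of the containing folder, MTP_ID_NONE at root   */
} mtp_obj_t;

typedef struct {
    mtp_obj_t objs[MTP_MAX_OBJECTS];
    uint32_t  next_id;   /* only ever increases, so an ID is not handed out twice */
} mtp_table_t;

static void mtp_table_init(mtp_table_t *t)
{
    assert(t != NULL);
    for (size_t i = 0; i < MTP_MAX_OBJECTS; i++) {
        t->objs[i].id = MTP_ID_NONE;
        t->objs[i].parent_id = MTP_ID_NONE;
    }
    t->next_id = 1u;
}

/* Find the slot holding `id`, or NULL if that ID is not (or no longer) valid. */
static mtp_obj_t *mtp_find(mtp_table_t *t, uint32_t id)
{
    if (id == MTP_ID_NONE) {
        return NULL;
    }
    for (size_t i = 0; i < MTP_MAX_OBJECTS; i++) {
        if (t->objs[i].id == id) {
            return &t->objs[i];
        }
    }
    return NULL;
}

/* Add an object under `parent_id`; returns its new ID, or MTP_ID_NONE if full. */
static uint32_t mtp_add(mtp_table_t *t, uint32_t parent_id)
{
    for (size_t i = 0; i < MTP_MAX_OBJECTS; i++) {
        if (t->objs[i].id == MTP_ID_NONE) {
            t->objs[i].id = t->next_id++;
            t->objs[i].parent_id = parent_id;
            return t->objs[i].id;
        }
    }
    return MTP_ID_NONE;
}

/* Content or metadata changed: retire the old ID and hand out a fresh one. */
static uint32_t mtp_touch(mtp_table_t *t, uint32_t old_id)
{
    mtp_obj_t *obj = mtp_find(t, old_id);
    if (obj == NULL) {
        return MTP_ID_NONE;
    }
    obj->id = t->next_id++;
    /* If the object is a folder, keep its children pointing at the new ID. */
    for (size_t i = 0; i < MTP_MAX_OBJECTS; i++) {
        if (t->objs[i].id != MTP_ID_NONE && t->objs[i].parent_id == old_id) {
            t->objs[i].parent_id = obj->id;
        }
    }
    return obj->id;
}

/* "What's the containing folder for this item?" */
static uint32_t mtp_parent_of(mtp_table_t *t, uint32_t id)
{
    mtp_obj_t *obj = mtp_find(t, id);
    return (obj != NULL) ? obj->parent_id : MTP_ID_NONE;
}
```

In this sketch, a host that still holds a retired ID simply gets "not found" from `mtp_find`, which is where a change notification would be needed to tell it about the new ID.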

henrygab · May 4, 2025, 7:14pm

Responsive to the question, “Is it a pain to implement MTP”.

I would say MTP is a significant amount of code to write, and an even more significant surface area to test.

In my experience, the cost of testing dwarfs the cost of development, and if that cost isn’t paid upfront, it is simply deferred to a time when the code is already out and in use by end-users. Thus, I consider testing capability (and costs) integral to the cost of the feature.

At a minimum, the device firmware would need to hook into the file system (and format commands), in order to ensure a new unique ID is generated for each file change, and to queue notification events for those file changes.

One significant pain point is the large amount of code that must be working, before any portion can be tested.

Then there are the less obvious pain points, such as all the edge cases that need to work properly, to avoid equivalents of the file system corruption issues (that will continue to plague the current multi-initiator storage device model). e.g.,

- What happens when the database is only partially updated due to power loss?
- How do you detect a file was updated when the database file hasn’t also been updated?
- How to handle host-side interactions to a file or directory with an open handle in the firmware?
- How to handle a failure to queue an event to notify the host of a change?
- How to ensure a uniqueID is not re-used, especially where the volume is reformatted?
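
To make the event-queue question concrete, here is a minimal sketch of a bounded notification queue. The names (`evt_queue_t`, `evt_push`, `evt_pop`) and the overflow policy are assumptions for illustration only. The policy shown is one possible answer, not something taken from the MTP specification: when the queue is full, the event is dropped and a sticky `overflowed` flag records that the host's view can no longer be trusted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EVT_QUEUE_LEN 4u

typedef enum { EVT_ADDED, EVT_REMOVED, EVT_CHANGED } evt_kind_t;

typedef struct {
    evt_kind_t kind;
    uint32_t   object_id;
} evt_t;

typedef struct {
    evt_t  slots[EVT_QUEUE_LEN];
    size_t head;        /* index of the oldest queued event               */
    size_t count;       /* number of queued events                        */
    bool   overflowed;  /* sticky: at least one event was dropped         */
} evt_queue_t;

static void evt_init(evt_queue_t *q)
{
    assert(q != NULL);
    q->head = 0;
    q->count = 0;
    q->overflowed = false;
}

/* Queue one event. Returns false (and sets the sticky flag) if it was dropped. */
static bool evt_push(evt_queue_t *q, evt_kind_t kind, uint32_t object_id)
{
    if (q->count == EVT_QUEUE_LEN) {
        q->overflowed = true;
        return false;
    }
    size_t tail = (q->head + q->count) % EVT_QUEUE_LEN;
    q->slots[tail].kind = kind;
    q->slots[tail].object_id = object_id;
    q->count++;
    return true;
}

/* Remove the oldest event into *out. Returns false if the queue is empty. */
static bool evt_pop(evt_queue_t *q, evt_t *out)
{
    if (q->count == 0) {
        return false;
    }
    *out = q->slots[q->head];
    q->head = (q->head + 1u) % EVT_QUEUE_LEN;
    q->count--;
    return true;
}
```

What the firmware does once `overflowed` is set (for example, forcing the host to re-enumerate everything) is exactly the kind of edge case that would need its own test coverage.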

henrygab · May 4, 2025, 7:26pm

The prior posts mentioned the concept of the unique IDs, and how the response data to any query about a given unique ID must be idempotent … even across reboots … for a given MTP volume.

This raises some important implementation questions:

- How to generate those unique IDs
- How to keep them idempotent across reboots
- How to keep them in sync with the file system

The open source solutions I’ve seen keep an extra file on the media … essentially a database of filename ↔ uniqueID correlations, and an indication of a next uniqueID that is available for allocation.

If you think about this, that database is essentially a second full file system (metadata, tracking info), and has to be kept in perfect synchronization with the underlying file system.

As a result, unless the file system itself is transaction-safe or written to support idempotent content identification, there is no simple way to ensure that the database stays in sync with another file operation.
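
To give a feel for the shape of such a side database, here is a minimal sketch of an image that holds filename ↔ uniqueID pairs plus the next free ID. The layout and all names (`db_image_t`, `db_assign`, `db_is_intact`, ...) are assumptions, not the on-media format of any existing implementation. A simple FNV-1a hash stands in for a real CRC. Note what the checksum does and does not buy: it can detect a torn write of the database itself, but it cannot detect that the file system changed without the database being updated, which is the harder problem described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DB_MAX_ENTRIES 4u
#define DB_NAME_LEN    16u

typedef struct {
    char     name[DB_NAME_LEN];  /* NUL-terminated path, empty if slot unused */
    uint32_t id;                 /* unique ID currently tied to that path     */
} db_entry_t;

typedef struct {
    uint32_t   next_id;                  /* next ID available for allocation */
    db_entry_t entries[DB_MAX_ENTRIES];
    uint32_t   checksum;                 /* covers every byte before it      */
} db_image_t;

/* FNV-1a, used here only as a stand-in for a proper CRC. */
static uint32_t db_hash(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

static void db_seal(db_image_t *db)
{
    db->checksum = db_hash(db, offsetof(db_image_t, checksum));
}

static bool db_is_intact(const db_image_t *db)
{
    return db->checksum == db_hash(db, offsetof(db_image_t, checksum));
}

static void db_init(db_image_t *db)
{
    assert(db != NULL);
    memset(db, 0, sizeof *db);
    db->next_id = 1u;
    db_seal(db);
}

/* Tie `name` to a fresh ID (new file, or changed file). Returns 0 on failure. */
static uint32_t db_assign(db_image_t *db, const char *name)
{
    size_t len = strlen(name);
    if (len == 0 || len >= DB_NAME_LEN) {
        return 0u;
    }
    db_entry_t *slot = NULL;
    for (size_t i = 0; i < DB_MAX_ENTRIES; i++) {
        if (strcmp(db->entries[i].name, name) == 0) {
            slot = &db->entries[i];      /* existing entry wins */
            break;
        }
        if (slot == NULL && db->entries[i].name[0] == '\0') {
            slot = &db->entries[i];      /* remember first free slot */
        }
    }
    if (slot == NULL) {
        return 0u;
    }
    memset(slot->name, 0, DB_NAME_LEN);
    memcpy(slot->name, name, len);
    slot->id = db->next_id++;
    db_seal(db);
    return slot->id;
}

/* Returns the ID tied to `name`, or 0 if there is no such entry. */
static uint32_t db_lookup(const db_image_t *db, const char *name)
{
    for (size_t i = 0; i < DB_MAX_ENTRIES; i++) {
        if (name[0] != '\0' && strcmp(db->entries[i].name, name) == 0) {
            return db->entries[i].id;
        }
    }
    return 0u;
}
```

Even this toy version already duplicates information the file system holds (the set of names), which is the "second file system" concern in a nutshell.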

At best, the database would need to be updated to mark an existing entry as invalid, then permit the file handle to be opened with write permissions, and when that file handle is closed, assign it a new unique ID.

Then, what happens in the interim, when the host asks for data about that file or directory that is marked as being in the process of being updated?
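
The invalidate / write / reassign sequence can be sketched as a tiny state machine. All names (`entry_t`, `entry_begin_write`, `entry_end_write`, `entry_query`) are hypothetical, and answering "busy" for the retired ID while the write is in progress is only one possible answer to the question above. In real firmware the "mark invalid" step would also have to reach the media before the file is opened for writing; this sketch keeps everything in RAM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { ENTRY_VALID, ENTRY_UPDATING } entry_state_t;
typedef enum { QUERY_OK, QUERY_BUSY, QUERY_UNKNOWN_ID } query_result_t;

typedef struct {
    uint32_t      id;          /* current ID; 0 while the file is being rewritten */
    uint32_t      retired_id;  /* ID the host may still hold during the rewrite   */
    entry_state_t state;
} entry_t;

/* Step 1: mark the entry invalid before a writable handle is handed out. */
static bool entry_begin_write(entry_t *e)
{
    assert(e != NULL);
    if (e->state == ENTRY_UPDATING) {
        return false;               /* already open for writing */
    }
    e->retired_id = e->id;
    e->id = 0u;
    e->state = ENTRY_UPDATING;
    return true;
}

/* Step 2: on close, assign a brand-new ID taken from the caller's counter. */
static uint32_t entry_end_write(entry_t *e, uint32_t *next_id)
{
    if (e->state != ENTRY_UPDATING) {
        return 0u;
    }
    e->id = (*next_id)++;
    e->retired_id = 0u;
    e->state = ENTRY_VALID;
    return e->id;
}

/* What the host sees when it asks about `id`. */
static query_result_t entry_query(const entry_t *e, uint32_t id)
{
    if (id == 0u) {
        return QUERY_UNKNOWN_ID;
    }
    if (e->state == ENTRY_UPDATING) {
        return (id == e->retired_id) ? QUERY_BUSY : QUERY_UNKNOWN_ID;
    }
    return (id == e->id) ? QUERY_OK : QUERY_UNKNOWN_ID;
}
```

After `entry_end_write`, the old ID answers `QUERY_UNKNOWN_ID`, so the host only learns about the replacement if a change notification actually reaches it.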

henrygab · May 4, 2025, 7:37pm

All of the above is only considering a read-only host.

MTP does not allow editing an item in-place. Instead, the entire file must be transferred from the host with every edit that is saved.

The upside here is that it simplifies preventing data corruption when both host and device attempt to edit a file … in short, the device always determines who wins.

- If the device currently has an open file handle to that file, the device can reject the update.
- If the device has updated the file, and the host was operating on an older revision of the file, the device gets to choose … should it reject the update, or allow the host to overwrite the changes the device made?
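
Those two cases can be sketched as a single decision function. The names (`file_state_t`, `decide_host_update`, the policy and verdict enums) are hypothetical. The sketch assumes the unique ID doubles as a revision marker, which follows from the earlier point that the ID changes whenever the contents change; whether stale updates are rejected or accepted is left as a firmware policy choice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    POLICY_REJECT_STALE,  /* host edits based on an old revision are refused */
    POLICY_HOST_WINS      /* host may overwrite changes the device made      */
} stale_policy_t;

typedef enum {
    UPDATE_ACCEPT,
    UPDATE_REJECT_OPEN,   /* firmware holds an open handle to the file */
    UPDATE_REJECT_STALE   /* host worked from an outdated revision     */
} update_verdict_t;

typedef struct {
    uint32_t current_id;    /* unique ID of the file's current revision */
    unsigned open_handles;  /* handles currently held by the firmware   */
} file_state_t;

/* The device decides; the host only learns the verdict. */
static update_verdict_t decide_host_update(const file_state_t *f,
                                           uint32_t host_seen_id,
                                           stale_policy_t policy)
{
    assert(f != NULL);
    if (f->open_handles > 0u) {
        return UPDATE_REJECT_OPEN;
    }
    if (host_seen_id != f->current_id && policy == POLICY_REJECT_STALE) {
        return UPDATE_REJECT_STALE;
    }
    return UPDATE_ACCEPT;
}
```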

Moreover, since the device is the only thing modifying the FAT (and other file system metadata), there’s no concern about synchronizing those metadata (e.g., FAT, directory entry contents, etc.) between the firmware and host (which is a major concern with the current multi-initiator writable block storage).

So, that negative is the basis for some significant positive traits.

ian · May 4, 2025, 7:54pm

I don’t see MTP as an option unless someone with deep experience just drops it on us. It has happened in the past, but this seems like a deep cut (obscure).

henrygab · May 4, 2025, 7:58pm

Personally, I fully agree. Too large a code base, untested, and existing open-source solutions are in C++, tightly integrated with a very specific file system, etc. None of the open-source options are what I would call widely adopted.

In short, too much risk from a firmware side … and that’s not even considering the pain of not having a host-side file system API (e.g., access to a drive letter from cmd.exe in Windows).

Even if a solution was dropped, I would reject it without deep documentation showing it’s a conforming implementation, including edge cases.

ian · May 4, 2025, 9:09pm

Yeah the lack of docs for a dropped solution is what stalled a few of my early projects. But even with docs if I can’t maintain it then we’ll eventually have issues.

electronic_eel · May 6, 2025, 8:02am

Unfortunately I don’t have a working MTP implementation at hand.

But I just wanted to propose a solution for one of the issues mentioned:

The IDs must be kept unique for one volume.

I don’t see an important use case for why a host must be able to re-identify the image of the BusPirate as the same after rebooting the BusPirate. Yes, the host could do some additional caching. But not all host libraries really implement that and I don’t consider this a very important feature.

So the BusPirate could create a new random volume ID each time it is booted and this is just kept in RAM.

So now all the other IDs on the volume must just be unique within this one boot of the BP. I think this would simplify the ID housekeeping a lot as we now don’t need to deal with all the complicated incomplete write cases and so on.
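
A minimal sketch of that proposal, with hypothetical names (`mtp_session_t`, `mtp_session_boot`, `mtp_session_new_id`): the volume ID is whatever random word the caller supplies at boot (for instance from a hardware RNG, which is an assumption here), and object IDs are a plain counter that restarts every boot. Nothing is written to the media.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t volume_id;       /* fresh random value each boot, RAM only */
    uint32_t next_object_id;  /* unique only within this boot           */
} mtp_session_t;

/* Call once per boot; `random_word` comes from the caller's entropy source. */
static void mtp_session_boot(mtp_session_t *s, uint32_t random_word)
{
    assert(s != NULL);
    s->volume_id = random_word;
    s->next_object_id = 1u;
}

/* Hand out the next object ID for this boot. */
static uint32_t mtp_session_new_id(mtp_session_t *s)
{
    return s->next_object_id++;
}
```

Because the volume ID changes every boot, object IDs from a previous boot can safely collide with new ones: the host has no reason to treat the two volumes as the same.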

AreYouLoco · May 6, 2025, 1:06pm

But isn’t this random ID going to create thousands of devices in Windows Device manager that are not present each time it reconnects?

electronic_eel · May 6, 2025, 6:52pm

I’m no Windows user or expert, so I don’t know for sure.

But I would consider the MTP volume ID very much like a USB memory stick and the FAT or exFAT volume ID on it. I hope not all USB memory sticks you ever plugged into a Windows system show up in the device manager?

AreYouLoco · May 6, 2025, 7:05pm

Well, they do 🤣. If you tick “show hidden devices” in the drop-down menu, they all show up, not only the active/enabled ones.

electronic_eel · May 6, 2025, 7:11pm

Ouch. Then this concept of Windows is completely broken, as most systems will have a few hundred USB memory sticks hidden in the device manager after a few years of usage. There is probably a real need for these registry cleaning tools when the whole concept is broken by design like this.

henrygab · May 6, 2025, 9:33pm

Well, if the USB device has a unique USB serial number, then it will have a unique device node, and thus appears when showing disconnected devices. (And as you know, this is entirely distinct from the serial number of the storage volume, which is a file system construct.) If you think about it, this is really the only way to have settings persist across device insertion / removal … e.g., COM port assignment.

If the USB device does not have a unique USB serial number, then there will be N nodes (N is the number of those that were simultaneously plugged in).

What concept is broken? Having a few kilobytes of data in what is essentially a database?

Installation of a never-seen device takes more time … searching for drivers from the INF files, running the registered co-installers, etc. Caching the results of a device’s installation makes the system more responsive … drive letters appear more rapidly the second and later times it’s plugged in, etc.

What negative effects have you experienced?

electronic_eel · May 6, 2025, 10:07pm

Registry size is an issue. I’d say this is one of the reasons why Windows systems tend to get slower and slower over time. When you then do a fresh reinstall of the same Windows version on the same hardware it becomes much faster again, because all of this kind of bloat is removed. This is not just because of the devices, but probably other bloat too that accumulates in the registry.

I don’t think it is a good idea to remember an assigned driver for one specific device with one serial number by default. It would be enough to be able to quickly assign and load the default driver for this VID+PID (in the usb case).

If you really want the system to be clever, you could additionally store individual serial numbers just in the case when you selected a different driver than the default one for this individual serial number. But I don’t see the need to do this for all devices this system has ever seen, because the default driver will be the correct one in the vast majority of cases.

henrygab · May 6, 2025, 10:45pm

I understand you disagree with the design decision, and I understand that you feel it’s unnecessary.

I understand you believe Windows gets slower because the registry remembers devices. Do you have data supporting this, or is it an (educated) guess?

ian · May 6, 2025, 10:48pm

Kindly request we let this one go.