Devlog: Blogging Tools - Chunking File Uploader
I’m hoping to use Micro.blog more for video hosting, which means I’d like a fast way to upload videos. Because my internet connection is not fast, I’m looking at adding some fancy upload techniques to Blogging Tools. Blogging Tools does have a simple file repository, and I’ve already built a reusable upload component, which uploads the contents of a file in a single HTTP request. My goal is to improve the performance of this by using a few of the techniques found on this dev.to page, such as chunking and concurrency.
But first, a baseline test. I’ll simulate a slow upload by throttling the browser’s network speed to “Fast 4G”, and time how long it takes to upload a 45.4 MB file.
Test done. It took 269.174 seconds, about 4.49 minutes.
So my first goal is to add chunked, concurrent uploads, similar to what was suggested in the linked article. The way I’m thinking of doing this is as follows:
- Modify the “new upload” request handler to accept a maximum file size, a chunk size, and the expected number of chunks. The server will respond by creating an empty file of that size, ready to be written to.
- I’ll then make multiple fetch requests, each uploading a single chunk, identified by the chunk’s starting offset.
- Then, once all chunks are uploaded, I’ll send a finalise request with a file hash. The server will calculate its own hash and report whether or not the upload was successful.
I’ll ignore the progress indicator for now, but that will need to be updated to reflect the actual upload progress.
The first pass is simply to make sure the chunking works and doesn’t corrupt the uploaded file. I’ll begin on the server side by creating an empty file of size N. It took a while to find out how to do this in Go, but it turns out the os.Truncate function can be used to set the file size, once the file has been touched:
// This creates the file and sets its length
fileName := cfs.fsProvider.FilePath(file.SymID)
if f, err := os.Create(fileName); err != nil {
	return err
} else {
	f.Close()
}
if err := os.Truncate(fileName, file.Size); err != nil {
	return fault.Wrap(err)
}
f, err := cfs.fsProvider.OpenFileFlags(ctx, file.SymID, os.O_RDWR)
if err != nil {
	return fault.Wrap(err)
}
cfs.fileHandles[file.ID] = uploadSessions{
	mutex:      sync.Mutex{},
	fileHandle: f,
	size:       file.Size,
	startTime:  time.Now(),
}
return nil
The rest of the Go code is pretty straightforward: requests to upload a file chunk will simply be passed down to the file handle and written at the requested position using WriteAt. The finalise call will seek to the start of the file, compute the SHA-1, and compare it with the SHA-1 computed by the browser. MD5 would probably have been my preferred hash because of its speed, but SHA-1 seems to be the one supported natively in browsers. The open files are kept in an in-memory “upload session” data structure. Finalising will also close the file handle and remove the session, but I do need to add a “session reaper” to deal with any sessions that die or terminate early.
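To make that a little more concrete, here’s a rough sketch of what those two steps boil down to. The package, type, and method names are stand-ins rather than the actual Blogging Tools code; it assumes the open *os.File held by the upload session above:

package upload // hypothetical package name, just for this sketch

import (
	"crypto/sha1"
	"encoding/hex"
	"io"
	"os"
	"sync"
)

// uploadSession is a cut-down stand-in for the upload session struct above.
type uploadSession struct {
	mutex      sync.Mutex
	fileHandle *os.File
}

// writeChunk handles a single chunk. WriteAt writes at an absolute offset,
// so chunks can arrive and be written in any order.
func (s *uploadSession) writeChunk(offset int64, chunk []byte) error {
	s.mutex.Lock()
	defer s.mutex.Unlock()

	_, err := s.fileHandle.WriteAt(chunk, offset)
	return err
}

// finalise seeks back to the start, hashes the whole file, and compares the
// result with the SHA-1 sent by the browser.
func (s *uploadSession) finalise(expectedSHA1 string) (bool, error) {
	s.mutex.Lock()
	defer s.mutex.Unlock()

	if _, err := s.fileHandle.Seek(0, io.SeekStart); err != nil {
		return false, err
	}
	h := sha1.New()
	if _, err := io.Copy(h, s.fileHandle); err != nil {
		return false, err
	}
	return hex.EncodeToString(h.Sum(nil)) == expectedSHA1, nil
}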
Here’s the first cut of the JavaScript that does the chunked upload:
let fileMetadata = await (await fetch(`${window.bgtData.resPrefix}/files/new`, {
  method: "POST",
  headers: {
    "Content-type": "application/json"
  },
  body: JSON.stringify({
    "name": file.name,
    "mime_type": file.mimeType,
    "size": file.size,
  })
})).json();

let chunkSize = 1024 * 1024 * 10;
let chunks = Math.ceil(file.size / chunkSize);
for (let i = 0; i < chunks; i++) {
  // TODO: retries
  let didUpload = await this._uploadFile(file, i, chunkSize, fileMetadata);
}

// Finalise the upload
await this._finalizeUpload(file, fileMetadata);
I did a quick test to see how this performs, although I wasn’t expecting any improvements. And indeed, there were none. In fact, it was slightly slower, at 270.301 seconds (4.51 minutes).
But this is where the performance boost comes in, in theory: replacing the loop with a call to Promise.all:
let chunkSize = 1024 * 1024 * 10;
let chunks = Math.ceil(file.size / chunkSize);

// Prepare promises for each upload chunk, then dispatch them all at once
let uploadPromises = new Array(chunks);
for (let i = 0; i < chunks; i++) {
  uploadPromises[i] = this._uploadFile(file, i, chunkSize, fileMetadata);
}
await Promise.all(uploadPromises);

// Finalise the upload
await this._finalizeUpload(file, fileMetadata);
Testing this approach. As expected, the progress bar is jumping around as each upload fights to update it. But we can deal with that later. We’re only interested in the upload time.
And the results are in. And sadly, they’re not much better: 269.597 seconds, about 4.49 minutes.
I am curious whether this has something to do with the browser limiting the number of parallel uploads per domain. I hacked together a version of the JavaScript which sends each chunk to a separate subdomain, so chunk 0 goes to u0.localhost:3000, chunk 1 goes to u1.localhost:3000, and so on. This did mean adding CORS middleware to Blogging Tools, as these are now cross-origin requests. But alas, it didn’t help matters: the upload time was still around 269 seconds.
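For what it’s worth, the CORS middleware itself didn’t need to be much. Here’s roughly the shape of it, as a minimal net/http sketch rather than the actual middleware I ended up with:

package upload // hypothetical package name, just for this sketch

import "net/http"

// corsMiddleware is a minimal sketch of the CORS handling needed for the
// multi-subdomain experiment: echo the origin back so u0.localhost,
// u1.localhost, etc. are all permitted, and answer preflight requests directly.
func corsMiddleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if origin := r.Header.Get("Origin"); origin != "" {
			w.Header().Set("Access-Control-Allow-Origin", origin)
			w.Header().Set("Access-Control-Allow-Methods", "GET, POST, OPTIONS")
			w.Header().Set("Access-Control-Allow-Headers", "Content-Type")
		}
		if r.Method == http.MethodOptions {
			w.WriteHeader(http.StatusNoContent)
			return
		}
		next.ServeHTTP(w, r)
	})
}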
So the bottleneck must be the connection itself. I was afraid of that, especially when you consider that the browser and OS would be using all the available network bandwidth to do the upload. I can’t think of any reason why it would be throttled. Another issue might simply be distance. Blogging Tools is currently hosted in Germany, since Hetzner doesn’t offer any hosting locations in Australia. But there’s also Singapore, which is quite a bit closer. Maybe it’s worth moving Blogging Tools there.
Anyway, it may still be useful to keep the chunking uploader around, if for no other reason than to avoid timeouts due to long-running connections. So I undid all that multi-domain work and finished the feature off with a stalled upload reaper, which closes any file handles that haven’t been written to in the last 5 minutes. I also found that I wasn’t doing anything to throttle the fetch requests: a hundred or so were dispatched at once, all fighting for bandwidth, and this interacted badly with the stalled upload reaper: uploads were failing because no single request managed to finish within 5 minutes, so the reaper was killing the upload.
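The reaper itself is little more than a ticker loop over the session map. Here’s a rough sketch of the idea, again with stand-in names rather than the actual Blogging Tools types, and assuming each session tracks the time of its last write:

package upload // hypothetical package name, just for this sketch

import (
	"os"
	"sync"
	"time"
)

// reaperSession is a cut-down stand-in for the upload session struct, with a
// lastWrite field assumed for tracking staleness.
type reaperSession struct {
	fileHandle *os.File
	lastWrite  time.Time
}

// reapStalledUploads periodically closes and removes any session that hasn't
// been written to within maxIdle. Run it in its own goroutine at startup.
func reapStalledUploads(mu *sync.Mutex, sessions map[string]*reaperSession, maxIdle time.Duration) {
	ticker := time.NewTicker(time.Minute)
	defer ticker.Stop()

	for range ticker.C {
		mu.Lock()
		for id, s := range sessions {
			if time.Since(s.lastWrite) > maxIdle {
				s.fileHandle.Close()
				delete(sessions, id)
			}
		}
		mu.Unlock()
	}
}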
So I made a few small changes to dispatch the chunks in batches of 5 at a time, with each chunk now 2 MB, and bumped the reaper to wait for 30 minutes. I gave it a test to see how long it took to upload a 469.4 MB file. It was not quick: 7,112 seconds, which is around 118.5 minutes, or 1.98 hours. It did work, and I managed to get the file to Blogging Tools. Forwarding it on to Micro.blog failed though: it turns out the file was too big anyway. So yeah, I may need to do something about that.
Now, this may have been wasted effort, but I think it’s worth keeping. There are some benefits to uploading files this way, such as not having connections die due to timeouts. It also gives me the opportunity to retry individual chunks that fail, without throwing away the entire upload. And I do think the chunks are being uploaded in parallel: seeing the 5 requests in the console finish around the same time suggests that they’re not entirely sequential. But I think the limiting factor here is that my upload speeds are terrible. It’d probably be easier to upgrade the connection before embarking on any more fancy upload techniques.