Repository: github.com/LuizFernando991/go-sync-folder
For a long time I wanted a simple way to share the same folders across my devices: my Obsidian vault, a few directories of everyday stuff, that kind of thing. I could have used Dropbox, Drive, Syncthing… but two things came together: (1) Oracle Cloud offers a surprisingly generous free VM, and (2) what I really wanted was to learn Go more deeply. So instead of installing something ready-made, I decided to build it.
The starting point was this video: "How To Build A Complete Distributed File Storage In Golang", a genuinely long tutorial (we're talking hours) that builds a distributed file storage system in Go from scratch. It gave me the mental model; from there the project took its own direction, focused on my use case: mirroring folders between machines through a self-hosted central server.
An honest disclaimer: this started as simple experiments and I kept refining it by trial and error until it became code I consider good. You can still find rough corners, leftovers from the experiments. And I didn't treat this as a product — so feel free to adapt it. The most glaring example: the "authentication" is just a token compared in constant time. I've built robust, secure auth many times before; to learn, I'd rather spend energy on the core of the problem (synchronization) than on reinventing login.
What it does
The idea is a "minimalist, self-hosted Dropbox for Linux":
- A central server (syncdrive-server) holds the authoritative copy of the files and a manifest (the list of files and folders, with hash, size, and timestamp).
- A daemon (syncdrive-daemon) runs on each machine, watches local folders, and keeps them mirrored with the server: files and directories, including empty ones.
     ┌───────────────────────────────────────────────┐
     │               syncdrive-server                │
     │     files/ + manifest.json (files & dirs)     │
     └───────────────▲─────────────────▲─────────────┘
                     │  HTTP + Bearer  │
         ┌───────────┘                 └────────────┐
┌────────┴──────────┐                     ┌─────────┴────────┐
│  syncdrive-daemon │                     │ syncdrive-daemon │
│  watcher + workers│                     │ watcher + workers│
└───────────────────┘                     └──────────────────┘
      machine A                                machine B
Three modes per folder: two-way (upload and download), push (upload only), and pull (download only).
The data model: the manifest
Everything revolves around lightweight metadata. No shipping a file just to detect a change — we compare metadata and only transfer what actually changed.
type FileMeta struct {
	Path    string    `json:"path"`
	Size    int64     `json:"size"`
	SHA256  string    `json:"sha256"`
	ModTime time.Time `json:"mod_time"`
}

type DirMeta struct {
	Path    string    `json:"path"`
	ModTime time.Time `json:"mod_time,omitempty"`
}

type Manifest struct {
	Files map[string]FileMeta `json:"files"`
	Dirs  map[string]DirMeta  `json:"dirs,omitempty"`
}
The server keeps this manifest. Each client additionally keeps local state in
.syncdrive/state.json: a snapshot of "how things looked at the last successful sync."
That state is what lets us tell a real deletion apart from a file that never
arrived. Hold on to that detail — it's the heart of everything.
The core: the three-way merge
The central question of any sync tool is: given a path, what do I do? Upload? Download? Delete? Ignore?
The answer comes from comparing three views of the same path:
- Local — what's on disk right now.
- Remote — what the server manifest says.
- State — what was true at the last sync.
With those three, you can infer intent. Example: the file exists on the server, doesn't exist locally, and was in the state → that means I deleted it since the last sync, so I should delete it on the server. But if it exists on the server, doesn't exist locally, and was not in the state → it's a new file from another machine, so I should download it.
All of that decision lives in a single pure function, decideFile. Concentrating the
logic in one place was one of the project's best decisions (it used to be duplicated and
drifting between two code paths):
func decideFile(mode FolderMode, l FileMeta, hasLocal bool, r FileMeta, hasRemote bool, old FileMeta, hadOld bool) (fileAction, FileMeta) {
	switch {
	case hasLocal && !hasRemote:
		if mode == ModePull {
			return actKeep, l
		}
		if hadOld && old.SHA256 == l.SHA256 {
			return actDeleteLocal, l // was in state and vanished from remote → deletion
		}
		return actUpload, l // new local file → upload
	case !hasLocal && hasRemote:
		if mode == ModePush {
			return actKeep, r
		}
		if hadOld && old.SHA256 == r.SHA256 {
			return actDeleteRemote, r
		}
		return actDownload, r
	case hasLocal && hasRemote:
		// ... two-way: if the hashes match, nothing to do.
		if l.SHA256 != "" && l.SHA256 == r.SHA256 {
			return actKeep, l
		}
		localChanged := !hadOld || old.SHA256 != l.SHA256
		remoteChanged := !hadOld || old.SHA256 != r.SHA256
		switch {
		case localChanged && remoteChanged: // real conflict
			if l.ModTime.After(r.ModTime) {
				return actUpload, l // the newer one wins
			}
			return actDownload, r
		case localChanged:
			return actUpload, l
		case remoteChanged:
			return actDownload, r
		}
	}
	return actNone, FileMeta{}
}
The conflict policy is "the most recent wins" by ModTime, with the server as the
tiebreaker when timestamps are equal (so all machines converge). It's simple,
predictable, and good enough for my use.
A detail I liked: the disk scan doesn't recompute the SHA-256 of everything on every
pass. If the size and modification time match the state, it reuses the already-known
hash. Hashing a large file for nothing is expensive; this avoids it.
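The hash-reuse shortcut can be sketched roughly as below. This is a minimal illustration, not the repository's code: the names canReuseHash and scanFile are hypothetical, and FileMeta is repeated here without its JSON tags so the example compiles on its own.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"log"
	"os"
	"path/filepath"
	"time"
)

// FileMeta mirrors the struct shown earlier (JSON tags omitted for brevity).
type FileMeta struct {
	Path    string
	Size    int64
	SHA256  string
	ModTime time.Time
}

// canReuseHash (hypothetical name) reports whether the hash recorded at the
// last sync can be trusted for a file with the given current size and mtime.
func canReuseHash(old FileMeta, hadOld bool, size int64, modTime time.Time) bool {
	return hadOld && old.SHA256 != "" && old.Size == size && old.ModTime.Equal(modTime)
}

// hashFile streams the file through SHA-256 without loading it into memory.
func hashFile(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()
	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// scanFile (hypothetical name) builds the metadata for one file, hashing only
// when the cheap size+mtime comparison says the file may have changed.
func scanFile(absPath, rel string, old FileMeta, hadOld bool) (FileMeta, error) {
	info, err := os.Stat(absPath)
	if err != nil {
		return FileMeta{}, err
	}
	m := FileMeta{Path: rel, Size: info.Size(), ModTime: info.ModTime()}
	if canReuseHash(old, hadOld, m.Size, m.ModTime) {
		m.SHA256 = old.SHA256 // unchanged since the last sync: skip the hash
		return m, nil
	}
	m.SHA256, err = hashFile(absPath)
	return m, err
}

func main() {
	dir, err := os.MkdirTemp("", "scan")
	if err != nil {
		log.Fatal(err)
	}
	defer os.RemoveAll(dir)

	p := filepath.Join(dir, "note.md")
	if err := os.WriteFile(p, []byte("hello"), 0o644); err != nil {
		log.Fatal(err)
	}
	first, err := scanFile(p, "note.md", FileMeta{}, false) // no state yet: hashes the file
	if err != nil {
		log.Fatal(err)
	}
	second, err := scanFile(p, "note.md", first, true) // size+mtime match: hash is reused
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(first.SHA256 == second.SHA256)
}
```

The trade-off is the usual one for this trick: a file rewritten with the same size and the same mtime would go unnoticed, which is rare enough to be acceptable for a personal tool.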
Concurrency and parallelism: why this is great for learning Go
This is where the project got genuinely fun, and where Go shines. It's got
goroutines, channels, worker pools, select, sync.Mutex, sync.WaitGroup — the
whole package.
A worker pool for transfers
Transfers (upload/download) run in parallel in a pool. Downloads go before uploads, and within each kind the smaller files go first — so a huge upload doesn't starve the small ones:
ch := make(chan syncJob)
var wg sync.WaitGroup
for range workers {
	wg.Add(1)
	go func() {
		defer wg.Done()
		for job := range ch { // receive work over the channel
			switch job.kind {
			case jobUpload:
				result, err = s.upload(root, job.path, job.meta)
			case jobDownload:
				err = s.download(root, job.path, job.meta)
			}
			// ...write the result into the next state (guarded by a mutex)
		}
	}()
}
for _, job := range jobs {
	ch <- job // dispatch
}
close(ch) // close → workers finish the range and exit
wg.Wait() // wait for all of them
This is the classic fan-out pattern with a channel: the producer sends jobs, N
workers consume. close(ch) + range is the idiomatic way to signal "we're done."
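The ordering rule (downloads before uploads, smaller files first within each kind) is not shown in the excerpt above. One way to get it is to sort the job slice before dispatching. The sketch below assumes a simplified syncJob with a plain size field and assumes jobDownload sorts before jobUpload; the real struct carries a FileMeta, so treat the field names as placeholders.

```go
package main

import (
	"fmt"
	"sort"
)

type jobKind int

const (
	jobDownload jobKind = iota // assumption: the lower value is dispatched first
	jobUpload
)

// syncJob is a simplified stand-in for the job type used by the pool.
type syncJob struct {
	kind jobKind
	path string
	size int64
}

// sortJobs orders jobs in place: downloads before uploads, then smaller files
// first, then by path so the order is deterministic.
func sortJobs(jobs []syncJob) {
	sort.Slice(jobs, func(i, j int) bool {
		a, b := jobs[i], jobs[j]
		if a.kind != b.kind {
			return a.kind < b.kind
		}
		if a.size != b.size {
			return a.size < b.size
		}
		return a.path < b.path
	})
}

func main() {
	jobs := []syncJob{
		{jobUpload, "video.mkv", 2 << 30},
		{jobDownload, "photo.jpg", 4 << 20},
		{jobUpload, "todo.txt", 120},
		{jobDownload, "note.md", 900},
	}
	sortJobs(jobs)
	for _, j := range jobs {
		fmt.Println(j.kind, j.path, j.size)
	}
}
```

Because the channel is unbuffered and workers pull jobs in order, sorting the slice is enough: the small jobs are handed out first and the large ones only occupy workers at the end.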
The continuous daemon: a pool that never sleeps
In daemon mode I wanted something more reactive: an edit to a .txt should upload
immediately, even if a large upload is occupying another worker. So the
FolderSyncer keeps a permanent pool of workers, and a scan just enqueues jobs
and moves on — it doesn't wait for the transfers.
That brought a lovely concurrency problem: how do you coalesce scans? If 10 filesystem events arrive in a row, I don't want 10 stacked scans — I want one, after the last change. The solution is a "pending" flag:
func (fs *FolderSyncer) Trigger() {
	fs.scanMu.Lock()
	if fs.scanRunning {
		fs.scanPending = true // a scan is already running? mark it to re-run at the end
		fs.scanMu.Unlock()
		return
	}
	fs.scanRunning = true
	fs.scanMu.Unlock()
	go func() {
		for {
			fs.scan()
			fs.scanMu.Lock()
			if !fs.scanPending { // nobody asked again → stop
				fs.scanRunning = false
				fs.scanMu.Unlock()
				return
			}
			fs.scanPending = false // a request arrived during the scan → run once more
			fs.scanMu.Unlock()
		}
	}()
}
It guarantees one scan at a time (no race) and at least one after the last change (without losing the "edge"). Honest coalescing in ~15 lines.
Watching the filesystem
The trigger comes from a watcher built on fsnotify, which accumulates "dirty" paths
and signals without blocking:
func (w *Watcher) signal(path string) {
	w.dirtyMu.Lock()
	w.dirty[path] = struct{}{}
	w.dirtyMu.Unlock()
	select {
	case w.events <- struct{}{}: // buffer-1 channel: "there's news"
	default: // a signal is already pending → don't block
	}
}
That select with default is a trick I've grown to love in Go: send on a channel if
possible, without ever blocking.
The race that nearly got me: stopping the pool
There's a treacherous subtlety: if you close the jobs channel (close(jobsCh)) on
shutdown while a scan can still enqueue a job, you get a panic: send on closed channel.
The idiomatic fix is to not close the jobs channel; instead, close a done channel
and have everyone listen to it:
func (fs *FolderSyncer) Stop() { close(fs.done) } // that's it

func (fs *FolderSyncer) worker() {
	for {
		select {
		case <-fs.done: // shut down
			return
		case job := <-fs.jobsCh: // or do work
			// ...
		}
	}
}

// and the send observes done too:
go func() {
	select {
	case fs.jobsCh <- job:
	case <-fs.done: // aborting: still release the WaitGroup
		fs.jobsWg.Done()
	}
}()
Small, but it's exactly the kind of concurrency bug that only shows up under load — and that teaches you to think in Go.
How the sync runs today (a full scan)
Let me be upfront about a current limitation: the authoritative reconciliation is a
full scan. On every sync, the daemon walks the entire local folder tree
(filepath.WalkDir) and fetches the complete manifest from the server, builds the
three views (local, remote, state), and runs BuildPlan over the union of all paths.
There's a shortcut: when the watcher reports that a specific file changed, I fast-path just that path so it uploads right away — but I still kick off the full scan afterward to reconcile directories and anything the shortcut can't see.
In practice, for normal-sized folders (Obsidian, documents) this is fast and works
really well: comparing metadata is cheap, the scan reuses hashes by size+mtime, and
polling uses an ETag (304 Not Modified) so it doesn't refetch the manifest for nothing.
But of course, walking the whole tree doesn't scale for free to millions of files.
I'm still looking for alternatives to improve this part — things like true incremental reconciliation (trusting the watcher events more), a change index/journal, or diffing only the affected subtrees. For now, the full scan is simple, predictable, and serves me well — so I left it that way on purpose, until I find a better approach worth the extra complexity.
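For reference, the conditional manifest fetch mentioned above boils down to a short exchange. The sketch below is not the project's handler: the /manifest path, the hash-derived ETag, and the helper names are assumptions, and it compares If-None-Match by exact string match, ignoring weak validators and comma-separated lists. It drives the handler in memory with httptest, so no network is involved.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// manifestHandler serves a fixed manifest body with an ETag derived from its
// content. Deriving the tag from a hash is an assumption of this sketch.
func manifestHandler(body []byte) http.Handler {
	sum := sha256.Sum256(body)
	etag := `"` + hex.EncodeToString(sum[:8]) + `"`
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("ETag", etag)
		if r.Header.Get("If-None-Match") == etag {
			w.WriteHeader(http.StatusNotModified) // nothing changed: no body
			return
		}
		w.Header().Set("Content-Type", "application/json")
		w.Write(body)
	})
}

// poll performs one manifest request carrying the last ETag seen (if any) and
// returns the status code plus the ETag to remember for the next poll.
func poll(h http.Handler, lastETag string) (int, string) {
	req := httptest.NewRequest(http.MethodGet, "/manifest", nil)
	if lastETag != "" {
		req.Header.Set("If-None-Match", lastETag)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code, rec.Header().Get("ETag")
}

func main() {
	h := manifestHandler([]byte(`{"files":{}}`))

	status, etag := poll(h, "")
	fmt.Println(status) // 200: first poll downloads the manifest

	status, _ = poll(h, etag)
	fmt.Println(status) // 304: unchanged, nothing to refetch
}
```

On a 304 the daemon can keep using the manifest it already has, so an idle folder costs only a tiny request per poll.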
The saga of the ghost folders 👻
This was the problem that taught me the most, and the main reason for this article.
In the first version, I synced files only. Directories were "inferred" from file paths. It sounds clever and economical — and it works, until it doesn't.
Symptom: I'd delete an entire folder on one machine, and on the other only the files disappeared. The empty folder stayed there, stranded. Worse: empty folders never synced at all. I nicknamed the bug the ghost folder.
Why did it happen? Because a directory wasn't a real entity in the system. The server
even had a Dirs field in the manifest, but it wiped it on every save
(clear(s.manifest.Dirs)) and didn't even return it from the API. The client never
recorded folders. There were half-finished functions, dead code from earlier attempts.
Folder removal relied on a fragile heuristic of "clean up directories that became empty
during this sync" — which didn't cover an intentionally empty folder, nor the case where
the folder was already empty beforehand.
The turning point was treating a directory as a first-class citizen, exactly like a file:
- The scan started recording every folder (including empty ones).
- The server started persisting and returning Dirs, and recording ancestor folders when a file is uploaded.
- Reconciliation became a pure planner, BuildPlan, that produces a complete plan: uploads, downloads, deletions, and directory creation/removal.
The same three-way merge used for files now applies to folders. And order matters: create folders shallow-to-deep (parent before child), remove them deep-to-shallow (child before parent). Here's a slice of the plan handling a folder that only exists on the remote:
case !hasL && hasR:
	if hadOld {
		// existed at the last sync and vanished locally → mirror the removal,
		// unless there's still live remote content inside it
		if hasRemoteContent {
			p.NextDirs[key] = rd
		} else {
			p.RmdirRemote = append(p.RmdirRemote, key)
		}
	} else {
		// new remote folder: if it's already "implied" by a file, the
		// download creates it; otherwise it needs an explicit mkdir (empty folder)
		if hasRemoteContent {
			p.NextDirs[key] = rd
		} else {
			p.MkdirLocal = append(p.MkdirLocal, key)
		}
	}
Note two details I'm proud of:
- An empty folder is the only thing that needs an explicit mkdir. A folder with a file gets created for free when the file is written — so I don't waste calls.
- The delete-vs-create conflict guard. If I delete a folder on one machine, but another adds a new file inside it at the same time, the file wins and the folder survives. Locally this falls out naturally because os.Remove only removes an empty directory: if there's new content, it refuses and I move on.
func (s *Syncer) removeLocalDir(root, dir string) {
	err := os.Remove(filepath.Join(root, filepath.FromSlash(dir)))
	if err == nil || errors.Is(err, os.ErrNotExist) || isNotEmpty(err) {
		return // success, already gone, or had new content → all good
	}
	s.log("rmdir %s: %v", dir, err)
}
Result: creating, emptying, or deleting folders — including empty ones — mirrors symmetrically across devices. Ghost exorcised. And the pure planner became trivial to test (feed it three maps, check the plan), which gave me a huge safety net.
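Two pieces of this section are described but not shown: the create/remove ordering and the isNotEmpty helper used by removeLocalDir. The sketch below is one plausible shape for both, not the repository's implementation. The names mkdirOrder and rmdirOrder are hypothetical, and isNotEmpty is a guess that checks the errno a Unix rmdir returns for a non-empty directory.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"sort"
	"strings"
	"syscall"
)

// depth counts the segments of a slash-separated relative path.
func depth(p string) int { return strings.Count(p, "/") + 1 }

// byDepth returns a sorted copy of dirs: shallowest first, or deepest first
// when deepFirst is true. Ties are broken by name for a stable order.
func byDepth(dirs []string, deepFirst bool) []string {
	out := append([]string(nil), dirs...)
	sort.Slice(out, func(i, j int) bool {
		di, dj := depth(out[i]), depth(out[j])
		if di != dj {
			return (di < dj) != deepFirst
		}
		return out[i] < out[j]
	})
	return out
}

// mkdirOrder: parent before child, so each mkdir finds its parent in place.
func mkdirOrder(dirs []string) []string { return byDepth(dirs, false) }

// rmdirOrder: child before parent, so a parent is empty by the time we reach it.
func rmdirOrder(dirs []string) []string { return byDepth(dirs, true) }

// isNotEmpty is an assumed implementation: rmdir on a non-empty directory
// fails with ENOTEMPTY on Linux (EEXIST on a few other Unix systems).
func isNotEmpty(err error) bool {
	return errors.Is(err, syscall.ENOTEMPTY) || errors.Is(err, syscall.EEXIST)
}

func main() {
	root, err := os.MkdirTemp("", "dirs")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(root)

	dirs := []string{"a/b/c", "a", "a/b"}
	for _, d := range mkdirOrder(dirs) {
		// os.Mkdir (not MkdirAll) only succeeds if the parent already exists.
		if err := os.Mkdir(filepath.Join(root, filepath.FromSlash(d)), 0o755); err != nil {
			panic(err)
		}
	}
	// A file that "arrived" in a/b keeps that folder (and its parent) alive.
	if err := os.WriteFile(filepath.Join(root, "a", "b", "new.txt"), nil, 0o644); err != nil {
		panic(err)
	}
	for _, d := range rmdirOrder(dirs) {
		err := os.Remove(filepath.Join(root, filepath.FromSlash(d)))
		fmt.Println(d, "removed:", err == nil, "kept as non-empty:", isNotEmpty(err))
	}
}
```

In the demo, the empty leaf a/b/c is removed, while a/b and a survive because of the new file: the same "file wins" behavior described in the second bullet above.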
Other details worth the click
A few little things I learned to appreciate:
- Hashing while streaming. On upload, the body flows through an io.TeeReader that feeds the SHA-256 at the same time it sends. I don't read the file twice, and I verify at the end that the local hash matches what the server computed.
- Atomic writes. A downloaded file goes to a temporary path, and only then does an os.Rename move it into place, which is atomic as long as both paths are on the same filesystem. You never end up with a half-written file at the destination.
- Detecting a file changing mid-upload. A stableFileReader checks, during the send, whether size/mtime changed; if they did, it cancels the upload instead of shipping something inconsistent.
- The .downloading placeholder. While downloading, a file.downloading appears in the folder, cleaned up automatically when the download finishes, fails, or the daemon restarts. Cheap visual feedback.
- Conditional manifest with ETag. Polling sends If-None-Match; if nothing changed on the server, it answers 304 Not Modified. Saves bandwidth for free.
- Path traversal protection. Paths with .., absolute paths, or \ are rejected before touching the disk.
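The first two bullets combine nicely into one small helper. The sketch below is an illustration under assumptions rather than the project's code: writeAtomic and demo are hypothetical names, the temp file uses a generic ".tmp-*" pattern instead of the .downloading placeholder, and the TeeReader is applied on the write path here, whereas the article uses it on upload.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"log"
	"os"
	"path/filepath"
	"strings"
)

// writeAtomic streams src into a temporary file next to dst, hashing the bytes
// as they pass, and renames the temp file into place only after a clean write.
func writeAtomic(dst string, src io.Reader) (sum string, err error) {
	tmp, err := os.CreateTemp(filepath.Dir(dst), ".tmp-*")
	if err != nil {
		return "", err
	}
	defer func() {
		if err != nil {
			tmp.Close()
			os.Remove(tmp.Name()) // never leave a partial file behind
		}
	}()

	h := sha256.New()
	// TeeReader: every byte read for the copy is also written into the hasher.
	if _, err = io.Copy(tmp, io.TeeReader(src, h)); err != nil {
		return "", err
	}
	if err = tmp.Close(); err != nil {
		return "", err
	}
	// Same directory means same filesystem, so the rename replaces dst atomically.
	if err = os.Rename(tmp.Name(), dst); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// demo writes a small payload into a fresh temp directory and returns the hash
// reported by writeAtomic together with what actually landed on disk.
func demo() (sum, content string, err error) {
	dir, err := os.MkdirTemp("", "atomic")
	if err != nil {
		return "", "", err
	}
	defer os.RemoveAll(dir)

	dst := filepath.Join(dir, "note.md")
	if sum, err = writeAtomic(dst, strings.NewReader("hello")); err != nil {
		return "", "", err
	}
	b, err := os.ReadFile(dst)
	return sum, string(b), err
}

func main() {
	sum, content, err := demo()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(content, sum)
}
```

The caller can then compare the returned hash with the SHA256 from the manifest and reject the file if they differ.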
What I deliberately left out
So I don't oversell it:
- Auth is just a token compared with subtle.ConstantTimeCompare. Fine behind HTTPS for personal use; it is not an account system. Swapping it for something robust is a separate exercise.
- No resumable transfers. A large file that's interrupted restarts from scratch. It could be done in chunks, but that was off-focus.
- It's not a product. It's a learning project that I actually use. Adapt it however you like.
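To make the first bullet concrete, a token check of this kind fits in a few lines. This is a sketch with hypothetical names (tokenOK, requireToken) and an assumed /manifest route, not the server's actual middleware.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// tokenOK reports whether an Authorization header carries the expected bearer
// token. The comparison is constant-time for equal-length inputs;
// ConstantTimeCompare returns 0 right away when the lengths differ.
func tokenOK(authHeader, want string) bool {
	const prefix = "Bearer "
	if want == "" || !strings.HasPrefix(authHeader, prefix) {
		return false // an unset token must never authorize anyone
	}
	got := authHeader[len(prefix):]
	return subtle.ConstantTimeCompare([]byte(got), []byte(want)) == 1
}

// requireToken wraps a handler and rejects requests without the right token.
func requireToken(want string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !tokenOK(r.Header.Get("Authorization"), want) {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	h := requireToken("s3cret", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "manifest goes here")
	}))
	for _, auth := range []string{"", "Bearer wrong", "Bearer s3cret"} {
		req := httptest.NewRequest(http.MethodGet, "/manifest", nil)
		if auth != "" {
			req.Header.Set("Authorization", auth)
		}
		rec := httptest.NewRecorder()
		h.ServeHTTP(rec, req)
		fmt.Printf("%q -> %d\n", auth, rec.Code)
	}
}
```

Only the last request gets a 200; the other two are answered with 401. As the bullet says, this protects a personal server behind HTTPS and nothing more.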
Why it's a great project for learning Go
If you want to level up in Go beyond "CRUD with net/http", I recommend building
something like this. In a single project, you brush up against:
- Real concurrency and parallelism: worker pools, channel fan-out, sync.Mutex/RWMutex, WaitGroup, and the done-channel pattern for clean shutdown.
- Idiomatic select: non-blocking sends, cancellation, event coalescing.
- net/http on both sides: a server with ServeMux, streaming large bodies, conditional headers (ETag/304), and a client reusing connections.
- Serious I/O: io.Reader/io.Writer, TeeReader, MultiWriter, atomic writes, streaming hashing.
- Testable design: extracting the decision into a pure function (BuildPlan) and covering the hard cases without spinning up any server.
- Hunting concurrency bugs: running go test -race and discovering, under load, that send on closed channel that never shows up on the happy path.
And, perhaps most important: a problem with real depth. "Syncing folders" sounds trivial until you face deletions, conflicts, empty folders, and races. That's where the learning lives.
Wrapping up
I started out just wanting to sync my Obsidian vault across machines, taking advantage of a free Oracle VM, and ended up with a sync tool I understand end to end — one that taught me more Go than any standalone tutorial. The code is open at github.com/LuizFernando991/go-sync-folder; clone it, break it, improve it, make it your own.
If you do dig in, my advice: start with decideFile and BuildPlan. That's where the
system "thinks."
