Directory Watcher
The Directory Watcher is a service that monitors a directory on disk for files and automatically ingresses them into DeltaFi. Files placed in subdirectories of the watched directory are uploaded to DeltaFi Core via the ingress API, with each subdirectory corresponding to a Data Source.
Compose Only
The Directory Watcher is only available when running DeltaFi in Compose orchestration mode.
Enabling the Directory Watcher
The Directory Watcher is enabled by default. To disable it, set deltafi.dirwatcher.enabled to false in site/values.yaml:
deltafi:
  dirwatcher:
    enabled: false

Directory Structure
The watched directory is located at ${DATA_DIR}/dirwatcher/ on the host. Create subdirectories within it to define Data Sources:
${DATA_DIR}/dirwatcher/
├── source-a/             ← Files here ingress to Data Source "source-a"
│   ├── file1.txt
│   └── reports/
│       └── file2.txt     ← Also ingresses to "source-a"
└── source-b/
    └── file3.csv         ← Ingresses to "source-b"

The Data Source name is always the first directory under the watched root, regardless of nesting depth. A file at dirwatcher/source-a/reports/deep/file.txt ingresses to Data Source source-a.
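The naming rule can be illustrated with a minimal Python sketch. The function name `data_source_for` is hypothetical and is not part of DeltaFi. Returning `None` for a file placed directly in the watched root is an assumption of this sketch, since that case is not described here.

```python
from pathlib import PurePosixPath
from typing import Optional


def data_source_for(path: str, watched_root: str) -> Optional[str]:
    """Return the first directory under watched_root that contains path.

    Returns None for a file sitting directly in the watched root, because it
    has no Data Source directory (an assumption of this sketch).
    Raises ValueError if path is not under watched_root.
    """
    relative = PurePosixPath(path).relative_to(PurePosixPath(watched_root))
    parts = relative.parts
    # Two parts are the minimum: a Data Source directory and a file name.
    return parts[0] if len(parts) >= 2 else None


print(data_source_for("/data/dirwatcher/source-a/reports/deep/file.txt",
                      "/data/dirwatcher"))  # source-a
```

Only the first path component below the root matters, so any amount of nesting resolves to the same Data Source.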
How Files Are Processed
- Detection — Files are detected via filesystem notifications or periodic directory scans (every 10 minutes).
- Settling — The service monitors the file's size at 100ms intervals. Once the size remains unchanged for the configured settling time (default 1 second), the file is considered fully written.
- Upload — A worker POSTs the file to the DeltaFi ingress API. Files are streamed without loading the full contents into memory.
- Cleanup — On successful upload (HTTP 200), the file is deleted from disk.
- Retry — On failure, the file is retried at the configured retry interval, indefinitely, until the upload succeeds or the file is removed from disk.
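The upload, cleanup, and retry steps above can be sketched as a single loop. This is an illustration only, not the service's actual code: `upload_until_done` and the `upload` callable (which is assumed to send the file and return the HTTP status code) are hypothetical names, and the 300-second default mirrors the documented `retryPeriod`.

```python
import os
import time
from typing import Callable


def upload_until_done(
    path: str,
    upload: Callable[[str], int],
    retry_period: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Upload path, retrying until it succeeds or the file disappears.

    `upload` is a hypothetical callable that sends the file and returns the
    HTTP status code. Returns True if the file was uploaded and deleted,
    False if it was removed from disk before any upload succeeded.
    """
    while os.path.exists(path):
        try:
            status = upload(path)
        except OSError:
            status = None  # treat connection errors like any other failure
        if status == 200:
            os.remove(path)  # clean up only after a confirmed success
            return True
        sleep(retry_period)  # wait one retry period, then try again
    return False
```

Passing `sleep` as a parameter keeps the loop testable without real delays. There is deliberately no attempt limit, matching the indefinite retry described above.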
File Settling
Files are often written incrementally — copied over a network, generated by another process, etc. The settling mechanism prevents uploading a partially-written file by waiting until its size stabilizes.
The service polls the file size every 100ms. After enough consecutive polls show the same size, the file is considered settled. With the default 1-second settling time, this means 10 consecutive stable checks.
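A minimal Python sketch of this settling check follows. It assumes the number of required stable polls is the settling time divided by the poll interval, as in the 10-check example above. The function and parameter names are hypothetical.

```python
import os
import time
from typing import Callable


def wait_until_settled(
    path: str,
    settling_time_ms: int = 1000,
    poll_interval_ms: int = 100,
    get_size: Callable[[str], int] = os.path.getsize,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Block until the size of path stops changing, then return that size."""
    # With the defaults this is 1000 // 100 = 10 consecutive stable polls.
    required = max(1, settling_time_ms // poll_interval_ms)
    last_size = get_size(path)
    stable = 0
    while stable < required:
        sleep(poll_interval_ms / 1000)
        size = get_size(path)
        if size == last_size:
            stable += 1
        else:
            stable = 0  # the file is still changing, so start counting again
            last_size = size
    return last_size
```

Any size change resets the counter, so a file that keeps growing is never considered settled, however long it has been on disk.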
Configuration
Configure the Directory Watcher through site/values.yaml under deltafi.dirwatcher:
deltafi:
  dirwatcher:
    enabled: true
    workers: 20
    maxFileSize: 2147483648
    retryPeriod: 300
    settlingTime: 1000

| Setting | Default | Description |
|---|---|---|
| enabled | true | Enable or disable the Directory Watcher service |
| workers | 20 | Number of concurrent upload workers |
| maxFileSize | 2147483648 (2 GB) | Maximum allowed file size in bytes |
| retryPeriod | 300 | Seconds between retry attempts for failed uploads |
| settlingTime | 1000 | Milliseconds a file must remain unchanged before upload |
After changing site/values.yaml, restart DeltaFi for the changes to take effect.
WARNING
Files exceeding maxFileSize will fail on every attempt and be retried indefinitely. Ensure this limit matches your system's ingress capacity.
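Because oversized files are retried indefinitely, it can help to look for them in the watched directory. The following is a hypothetical helper script, not a DeltaFi tool. It assumes the documented default limit and treats "exceeding" as strictly greater than the limit.

```python
import os
from typing import Iterator, Tuple

MAX_FILE_SIZE = 2147483648  # documented default for maxFileSize, in bytes


def oversized_files(root: str, limit: int = MAX_FILE_SIZE) -> Iterator[Tuple[str, int]]:
    """Yield (path, size) for every file under root larger than limit bytes."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # the file was uploaded or removed during the scan
            if size > limit:
                yield path, size
```

For example, `for path, size in oversized_files("/data/dirwatcher"): print(path, size)` would list the candidates, where `/data/dirwatcher` stands in for your own `${DATA_DIR}/dirwatcher/` path.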
Default Metadata
Each directory in the watched tree can include a metadata file that attaches key-value pairs to every file uploaded from that directory.
Supported Formats
Place one of the following files in any directory under the watched root:
- .default_metadata.yaml (checked first)
- .default_metadata.json
The file should contain a flat map of string key-value pairs:
# .default_metadata.yaml
environment: production
priority: high
classification: unclassified

{
  "environment": "production",
  "priority": "high",
  "classification": "unclassified"
}

Metadata Inheritance
Metadata is inherited from parent directories. If a file's directory has no metadata file, the service walks up the directory tree until it finds one (stopping at the watched root). A metadata file in a subdirectory replaces the parent's metadata entirely — it does not merge with it.
${DATA_DIR}/dirwatcher/
├── source1/
│   ├── .default_metadata.yaml        ← Applied to file1.txt and file3.txt
│   ├── file1.txt
│   ├── subdir1/
│   │   ├── .default_metadata.yaml    ← Overrides source1's metadata for file2.txt
│   │   └── file2.txt
│   └── subdir2/
│       └── file3.txt                 ← Inherits source1's metadata
└── source2/
    ├── .default_metadata.json
    └── file4.txt

Metadata files are monitored for changes and reloaded automatically; no restart is required.
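The inheritance rule (nearest metadata file wins, YAML checked before JSON, no merging) can be sketched as a lookup that walks up the tree. The name `find_metadata_file` is hypothetical. This sketch also assumes the watched root itself is checked before the walk stops, which may differ from the service's behavior. It returns only the path of the applicable file, leaving parsing aside.

```python
from pathlib import Path
from typing import Optional

# Checked in this order within each directory.
METADATA_NAMES = (".default_metadata.yaml", ".default_metadata.json")


def find_metadata_file(file_path: Path, watched_root: Path) -> Optional[Path]:
    """Return the metadata file that applies to file_path, or None."""
    root = watched_root.resolve()
    directory = file_path.resolve().parent
    # Walk upward while still inside (or at) the watched root.
    while directory == root or root in directory.parents:
        for name in METADATA_NAMES:
            candidate = directory / name
            if candidate.is_file():
                return candidate  # nearest file wins; parents are not merged
        if directory == root:
            break  # assumption: the root is the last directory checked
        directory = directory.parent
    return None
```

Returning at the first match is what makes a subdirectory's metadata replace its parent's rather than extend it.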
Hidden Files
Files and directories with names beginning with . are ignored, with two exceptions:
- Metadata files (.default_metadata.yaml and .default_metadata.json) are always processed.
- Immediate children of the watched directory starting with . are watched normally. This allows hidden directories at the top level to function as Data Sources.
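The two exceptions can be expressed as a small predicate. This is a sketch under assumptions, not the service's implementation: `is_ignored` is a hypothetical name, and a metadata file located inside a hidden nested directory is treated as ignored here because that case is not specified above.

```python
from pathlib import PurePosixPath

METADATA_NAMES = {".default_metadata.yaml", ".default_metadata.json"}


def is_ignored(path: str, watched_root: str) -> bool:
    """Return True if path would be skipped under the hidden-file rules."""
    parts = PurePosixPath(path).relative_to(PurePosixPath(watched_root)).parts
    last = len(parts) - 1
    for index, part in enumerate(parts):
        if not part.startswith("."):
            continue
        if index == 0:
            continue  # top-level entries are watched even when hidden
        if index == last and part in METADATA_NAMES:
            continue  # metadata files are always processed
        return True  # any other hidden component hides the whole path
    return False


print(is_ignored("/data/dirwatcher/.staging/file.txt", "/data/dirwatcher"))   # False
print(is_ignored("/data/dirwatcher/source-a/.tmp.txt", "/data/dirwatcher"))   # True
```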
Tuning
- Workers: Increase workers if many files arrive simultaneously and the Core can handle the concurrency. Decrease it if the Core is being overwhelmed.
- Settling time: Increase settlingTime for environments where files are written slowly (e.g., network mounts). Decrease it for local writes where files appear atomically.
- Retry period: Decrease retryPeriod for faster recovery from transient Core outages. Increase it to reduce load on a struggling Core.
- Max file size: Set maxFileSize to match your system's capacity for individual file ingress.

