Directory Watcher

The Directory Watcher is a service that monitors a directory on disk and automatically ingresses the files it finds into DeltaFi. Files placed under the watched directory are uploaded to DeltaFi Core via the ingress API, with each top-level subdirectory corresponding to a Data Source.

Compose Only

The Directory Watcher is only available when running DeltaFi in Compose orchestration mode.

Enabling the Directory Watcher

The Directory Watcher is enabled by default. To disable it, set deltafi.dirwatcher.enabled to false in site/values.yaml:

yaml
deltafi:
  dirwatcher:
    enabled: false

Directory Structure

The watched directory is located at ${DATA_DIR}/dirwatcher/ on the host. Create subdirectories within it to define Data Sources:

${DATA_DIR}/dirwatcher/
├── source-a/           ← Files here ingress to Data Source "source-a"
│   ├── file1.txt
│   └── reports/
│       └── file2.txt   ← Also ingresses to "source-a"
└── source-b/
    └── file3.csv       ← Ingresses to "source-b"

The Data Source name is always the first directory under the watched root, regardless of nesting depth. A file at dirwatcher/source-a/reports/deep/file.txt ingresses to Data Source source-a.
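
The naming rule can be illustrated with a short Python sketch. It is not the service's own code: the function name `data_source_for` is hypothetical, and rejecting files that sit directly in the watched root is an assumption, since this page does not say how such files are handled.

```python
from pathlib import PurePosixPath


def data_source_for(watched_root: str, file_path: str) -> str:
    """Return the first directory component below watched_root."""
    # relative_to raises ValueError if file_path is not under watched_root.
    relative = PurePosixPath(file_path).relative_to(PurePosixPath(watched_root))
    # Assumption: a file directly in the watched root has no Data Source.
    if len(relative.parts) < 2:
        raise ValueError(f"{file_path} is not inside a Data Source directory")
    return relative.parts[0]
```

For example, `data_source_for("/data/dirwatcher", "/data/dirwatcher/source-a/reports/deep/file.txt")` returns `"source-a"`, regardless of how deeply the file is nested.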

How Files Are Processed

  1. Detection — Files are detected via filesystem notifications or periodic directory scans (every 10 minutes).
  2. Settling — The service monitors the file's size at 100ms intervals. Once the size remains unchanged for the configured settling time (default 1 second), the file is considered fully written.
  3. Upload — A worker POSTs the file to the DeltaFi ingress API. Files are streamed without loading the full contents into memory.
  4. Cleanup — On successful upload (HTTP 200), the file is deleted from disk.
  5. Retry — On failure, the file is retried at the configured retry interval, indefinitely, until the upload succeeds or the file is removed from disk.
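
The upload, cleanup, and retry steps above can be sketched as a small loop. This is an illustration under stated assumptions, not the service's implementation: `upload` is a hypothetical callable that sends the file and returns the HTTP status code (the actual request format of the ingress API is not described here), and connection errors are assumed to count as failed attempts.

```python
import os
import time


def upload_until_done(path, upload, retry_period=300.0, sleep=time.sleep):
    """Upload path until it succeeds or the file disappears.

    Returns True if the file was uploaded and deleted, False if it was
    removed from disk before an upload succeeded.
    """
    while os.path.exists(path):
        try:
            status = upload(path)  # hypothetical: returns an HTTP status code
        except OSError:
            status = None  # assumption: connection errors are retried too
        if status == 200:
            os.remove(path)  # cleanup only after a successful upload
            return True
        sleep(retry_period)  # wait, then re-check that the file still exists
    return False
```

Passing `sleep` as a parameter keeps the loop testable without waiting for real retry intervals.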

File Settling

Files are often written incrementally — copied over a network, generated by another process, etc. The settling mechanism prevents uploading a partially-written file by waiting until its size stabilizes.

The service polls the file size every 100ms. After enough consecutive polls show the same size, the file is considered settled. With the default 1-second settling time, this means 10 consecutive stable checks.
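
A minimal Python sketch of this check follows. The function name and the injectable `get_size` and `sleep` parameters are illustrative assumptions; only the 100ms poll interval and the settling time come from the description above. The sketch does not handle a file that is deleted while being polled.

```python
import os
import time

POLL_INTERVAL_MS = 100  # poll interval stated in the text


def wait_until_settled(path, settling_time_ms=1000,
                       get_size=os.path.getsize, sleep=time.sleep):
    """Block until the file size is unchanged for settling_time_ms; return the size."""
    # 1000 ms / 100 ms = 10 consecutive stable polls with the defaults.
    required = max(1, settling_time_ms // POLL_INTERVAL_MS)
    last_size = get_size(path)
    stable = 0
    while stable < required:
        sleep(POLL_INTERVAL_MS / 1000)
        size = get_size(path)
        if size == last_size:
            stable += 1
        else:
            # The file grew or shrank: start counting again.
            stable = 0
            last_size = size
    return last_size
```

Any size change resets the counter, so a file that is still being written is never reported as settled.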

Configuration

Configure the Directory Watcher through site/values.yaml under deltafi.dirwatcher:

yaml
deltafi:
  dirwatcher:
    enabled: true
    workers: 20
    maxFileSize: 2147483648
    retryPeriod: 300
    settlingTime: 1000

| Setting | Default | Description |
|---|---|---|
| enabled | true | Enable or disable the Directory Watcher service |
| workers | 20 | Number of concurrent upload workers |
| maxFileSize | 2147483648 (2 GB) | Maximum allowed file size in bytes |
| retryPeriod | 300 | Seconds between retry attempts for failed uploads |
| settlingTime | 1000 | Milliseconds a file must remain unchanged before upload |

After changing site/values.yaml, restart DeltaFi for the changes to take effect.

WARNING

Files exceeding maxFileSize will fail on every attempt and be retried indefinitely. Ensure this limit matches your system's ingress capacity.
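
Because oversized files stay on disk and keep retrying, it can help to look for them directly. The following Python sketch is a hypothetical operator-side helper, not part of DeltaFi; it assumes "exceeding" means strictly larger than `maxFileSize`.

```python
import os


def find_oversized(watched_root, max_file_size=2147483648):
    """Return sorted (path, size) pairs for files larger than max_file_size."""
    oversized = []
    for dirpath, _dirnames, filenames in os.walk(watched_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # the file was uploaded or removed during the scan
            if size > max_file_size:
                oversized.append((path, size))
    return sorted(oversized)
```

Running it against `${DATA_DIR}/dirwatcher/` with the configured limit lists the files that will never upload until the limit is raised or the files are removed.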

Default Metadata

Each directory in the watched tree can include a metadata file that attaches key-value pairs to every file uploaded from that directory, and from any subdirectory that does not provide its own metadata file.

Supported Formats

Place one of the following files in any directory under the watched root:

  • .default_metadata.yaml (checked first)
  • .default_metadata.json

The file should contain a flat map of string key-value pairs:

yaml
# .default_metadata.yaml
environment: production
priority: high
classification: unclassified

json
{
  "environment": "production",
  "priority": "high",
  "classification": "unclassified"
}

Metadata Inheritance

Metadata is inherited from parent directories. If a file's directory has no metadata file, the service walks up the directory tree until it finds one (stopping at the watched root). A metadata file in a subdirectory replaces the parent's metadata entirely — it does not merge with it.

${DATA_DIR}/dirwatcher/
├── source1/
│   ├── .default_metadata.yaml     ← Applied to file1.txt and file3.txt
│   ├── file1.txt
│   ├── subdir1/
│   │   ├── .default_metadata.yaml ← Overrides source1's metadata for file2.txt
│   │   └── file2.txt
│   └── subdir2/
│       └── file3.txt              ← Inherits source1's metadata
└── source2/
    ├── .default_metadata.json
    └── file4.txt

Metadata files are monitored for changes and reloaded automatically — no restart required.
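
The lookup described in this section can be sketched in Python as a walk from the file's directory up to the watched root. The function name is hypothetical, and the sketch assumes the watched root itself is the last directory checked; the text only says the walk stops there. It locates the metadata file and does not parse it.

```python
from pathlib import Path

# The YAML file is checked first, as listed under Supported Formats.
METADATA_NAMES = (".default_metadata.yaml", ".default_metadata.json")


def find_metadata_file(watched_root, file_path):
    """Return the nearest metadata file for file_path, or None."""
    root = Path(watched_root).resolve()
    directory = Path(file_path).resolve().parent
    if directory != root and root not in directory.parents:
        raise ValueError(f"{file_path} is not under {watched_root}")
    while True:
        for name in METADATA_NAMES:
            candidate = directory / name
            if candidate.is_file():
                # Nearest file wins outright; nothing is merged with parents.
                return candidate
        if directory == root:
            return None  # assumption: the root is the last directory checked
        directory = directory.parent
```

In the tree above, this returns `source1/.default_metadata.yaml` for `file3.txt` and `source1/subdir1/.default_metadata.yaml` for `file2.txt`.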

Hidden Files

Files and directories with names beginning with . are ignored, with two exceptions:

  • Metadata files (.default_metadata.yaml and .default_metadata.json) are always processed.
  • Immediate children of the watched directory whose names begin with . are watched normally, so hidden directories at the top level can function as Data Sources.
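
These rules can be expressed as a small predicate. The Python sketch below is one reading of them, not the service's code: `is_ignored` is a hypothetical name, paths are given relative to the watched root, and it assumes that a hidden directory below the top level hides everything inside it, including metadata files.

```python
from pathlib import PurePosixPath

METADATA_NAMES = {".default_metadata.yaml", ".default_metadata.json"}


def is_ignored(relative_path: str) -> bool:
    """Decide whether a path (relative to the watched root) is skipped."""
    parts = PurePosixPath(relative_path).parts
    for depth, name in enumerate(parts):
        if not name.startswith("."):
            continue
        if depth == 0:
            continue  # top-level children are watched even when hidden
        if depth == len(parts) - 1 and name in METADATA_NAMES:
            continue  # metadata files are always processed
        return True  # any other hidden file or directory is ignored
    return False
```

For example, `is_ignored(".secret-source/file.txt")` is `False`, while `is_ignored("source-a/.cache/file.txt")` is `True`.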

Tuning

  • Workers — Increase workers if many files arrive simultaneously and the Core can handle the concurrency. Decrease if the Core is being overwhelmed.
  • Settling time — Increase settlingTime for environments where files are written slowly (e.g., network mounts). Decrease for local writes where files appear atomically.
  • Retry period — Decrease retryPeriod for faster recovery from transient Core outages. Increase to reduce load on a struggling Core.
  • Max file size — Set maxFileSize to match your system's capacity for individual file ingress.
