Skip to content

Add preliminary support for ISO-8601 timestamps via date: archive match pattern (#8715)#8776

Open
c-herz wants to merge 21 commits into
borgbackup:masterfrom
c-herz:datefilter
Open

Add preliminary support for ISO-8601 timestamps via date: archive match pattern (#8715)#8776
c-herz wants to merge 21 commits into
borgbackup:masterfrom
c-herz:datefilter

Conversation

@c-herz

@c-herz c-herz commented Apr 19, 2025

Copy link
Copy Markdown

This PR adds preliminary support for matching ISO 8601 timestamps with the date: archive filter, and intends to begin addressing the requirements of #8715.

Timestamps (except for Unix epoch forms) are currently assumed to be in the user's local timezone and converted to UTC internally. The following formats are currently supported:

  1. YYYY
  2. YYYY-MM
  3. YYYY-MM-DD
  4. YYYY-MM-DDTHH -> matches a 1-hour interval
  5. YYYY-MM-DDTHH:MM -> matches a 1-minute interval
  6. YYYY-MM-DDTHH:MM:SS -> matches a 1-second interval
  7. YYYY-MM-DDTHH:MM:SS.ffff -> matches an exact timestamp, including fractional seconds
  8. @123456789 -> Unix epoch (interpreted as UTC)

(Is the fractional-second exact match useful in practice? Feedback welcome on this.)

Still in progress:

  • Support for wildcard-style matching (date:2025-*-01)
  • User-specified timezones and offsets
  • Keywords like now, oldest, newest, etc
  • Explicit duration intervals
  • RFC 3339 / RFC 9557 format support
  • Test/documentation coverage

@c-herz c-herz changed the title Add preliminary support for ISO-8601 timestamps via date: archive match pattern Add preliminary support for ISO-8601 timestamps via date: archive match pattern (#8715) Apr 19, 2025
@PhrozenByte

PhrozenByte commented Apr 20, 2025

Copy link
Copy Markdown
Contributor

Thanks for picking this up! ❤️

  1. YYYY-MM-DDTHH:MM:SS.ffff -> matches an exact timestamp, including fractional seconds
  2. @123456789 -> Unix epoch (interpreted as UTC)

(Is the fractional-second exact match useful in practice? Feedback welcome on this.)

What's the precision of an archive's creation time? From the code I assume it's with fractional seconds, right? I absolutely agree with you then: There should be a variant that is guaranteed to match a single archive. I feel like that Unix timestamps should also optionally support fractional seconds then.

  1. YYYY-MM-DDTHH -> matches a 1-hour interval

From reading the code I derive that the 1-hour interval is matched exclusively, i.e. it's actually matching any archive within 00:59:59.9999… hours, correct? Perfect 👍

# Year/Year-month/Year-month-day
parts = expr.split("-")
try:
    if len(parts) == 1:                    # YYYY
        year = int(parts[0])

Even though I like the simplicity, I feel like that Borg should be pretty strict about the format, because being less strict easily leads to ambiguity. For example, is date:0 supposed to match any archive created in year 0? Probably, but it gets way less clear with (now deprecated) truncated ISO8601 dates: What does the pattern date:25-1 describe? January of the year 25, January 1925, or January 2025?

If there's no library that can be used, I always imagined that the code would basically revolve around a single, rather strict regex with bottom-up optional groups for year, month, day, hour, minutes, seconds, and fractal seconds, or * as wildcard, supplemented by another regex to match periods, and simple matchers for Unix timestamps and keywords. I'm not saying that this is the best approach, that's just what I imagined while writing #8715.

In general I like to encourage creating extensive unit tests as early as possible. It's elegant and simple code now (🚀 👍), but complexity will increase greatly when adding more and more features.

Note: I can read the code, but can't do an actual code review - for that I just don't known enough of Borg's code.

@ThomasWaldmann ThomasWaldmann left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

Some minor stuff I found...

Comment thread src/borg/helpers/time.py Outdated
Comment thread src/borg/helpers/time.py Outdated
Comment thread src/borg/helpers/time.py Outdated
@ThomasWaldmann

ThomasWaldmann commented Apr 21, 2025

Copy link
Copy Markdown
Member

BTW, if you install the pre-commit hook, you can have your commits automatically formatted.

https://borgbackup.readthedocs.io/en/stable/development.html#building-a-development-environment

Comment thread src/borg/helpers/time.py Outdated
Comment thread src/borg/helpers/time.py
(?:
:(?P<minute>\d{2}|\*) # minute (MM or *)
(?:
:(?P<second>\d{2}(?:\.\d+)?|\*) # second (SS or SS.fff or *)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yay, that is a nice regex now. :-)

You could also deal with fractional seconds as with all other components: it is a optional component, so you can also match it with a named group and later check the groupdict.

Comment thread src/borg/helpers/time.py
Comment on lines +306 to +312
r"(?:(?P<years>\d+)Y)?"
r"(?:(?P<months>\d+)M)?"
r"(?:(?P<weeks>\d+)W)?"
r"(?:(?P<days>\d+)D)?"
r"(?:(?P<hours>\d+)h)?"
r"(?:(?P<minutes>\d+)m)?"
r"(?:(?P<seconds>\d+)s)?"

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

didn't we use to have lowercase ymwd and uppercase HMS?

@PhrozenByte PhrozenByte Apr 25, 2025

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@c-herz, is that a custom format, or did I miss an ISO 8601 or RFC update? I know ISO-8601's P3Y6M4DT12H30M5S (P designator, 3 years, 6 months, 4 days, T time separator, 12 hours, 30 minutes, 5 seconds, all being optional) format. I honestly don't like that "official" format very much (it's so cumbersome…), especially in regards to using M for both months and minutes (it's still unambiguous due to the T separator), but I feel like we should support it, because it's an official part of ISO 8601.

However, I kinda like the idea of additionally supporting our own format. Like, why not support 12:34:56 (or similar, just a quick idea) to specify a 12 hours, 34 minutes, 56 seconds duration? Why not also support (space) as alternative to T to separate times? The designators could ignore case (i.e. also supporting 7d for 7 days), and we could allow a space after each term. This might go too far though. In general, I'm not sure whether we should put things into the "official" P designator, or rather additionally support our own (like the suggested D, e.g. D 3y 6m 4d 12:30:05?). Is there maybe another common or even formalized/standardized (like another RFC; I did some research, unfortunately I didn't find anything official or quasi-official) format? WDYT?

Comment thread src/borg/helpers/time.py
# ISO week date: YYYY-Www or YYYY-Www-D
(?P<isoweek_year>\d{4})-W(?P<isoweek_week>\d{2})(?:-(?P<isoweek_day>\d))?
| # Ordinal date: YYYY-DDD
(?P<ordinal_year>\d{4})-(?P<ordinal_day>\d{3})

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be a bit silly, but it just got me thinking: Could we support ordinal days with wildcard years? 🤔 Same for ISO weeks, possibly even *-W*-5 (wow, that looks crazy 😆) to match all archives created on a Friday? Not sure whether users would use it, but if it's possible? WDYT?

Comment thread src/borg/helpers/time.py
| # Ordinal date: YYYY-DDD
(?P<ordinal_year>\d{4})-(?P<ordinal_day>\d{3})
| # Unix epoch
@(?P<epoch>\d+)

@PhrozenByte PhrozenByte Apr 25, 2025

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also supporting fractal seconds here would be amazing! ❤️

Side note: I might read the regex wrong, but this also means @1745577106[Europe/Berlin] (or any other TZ format) is supported? AFAIK Unix timestamps are UTC per definition, right? Or is the TZ info used later for something else?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fractional :-)

Comment thread src/borg/helpers/time.py
Also supports:
TIMESTAMP/TIMESTAMP
TIMESTAMP/DURATION
DURATION/TIMESTAMP.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! ❤️

Just DURATION (i.e. without a TIMESTAMP) isn't supported yet, right? Just specifying a duration is helpful to match the latest archives relative to "now" (which could be another useful keyword).

@codecov

codecov Bot commented Apr 25, 2025

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 90.71038% with 17 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.62%. Comparing base (d1899c1) to head (9cb5e5f).
⚠️ Report is 584 commits behind head on master.

Files with missing lines Patch % Lines
src/borg/helpers/time.py 88.66% 11 Missing and 6 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8776      +/-   ##
==========================================
- Coverage   81.89%   81.62%   -0.28%     
==========================================
  Files          74       74              
  Lines       13324    13517     +193     
  Branches     1968     2008      +40     
==========================================
+ Hits        10912    11033     +121     
- Misses       1750     1802      +52     
- Partials      662      682      +20     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@c-herz c-herz marked this pull request as ready for review April 26, 2025 00:45
@ThomasWaldmann

Copy link
Copy Markdown
Member

before doing further changes, please rebase on current master branch to get the cython workaround. otherwise, builds will fail.

@ThomasWaldmann

Copy link
Copy Markdown
Member

Guess I'ld like to merge this for next beta, can we finish this until then?

@ThomasWaldmann

Copy link
Copy Markdown
Member

ping?

@c-herz

c-herz commented Jun 5, 2025

Copy link
Copy Markdown
Author

My apologies, I have been caught up with finals and starting a new internship. I should be able to look into @PhrozenByte's suggestions by this weekend. Apologies for the delays and the rather poor communication!

@ThomasWaldmann

Copy link
Copy Markdown
Member

ping? (no hurry, but if you find some time, it would be nice to finish this)

@ThomasWaldmann

Copy link
Copy Markdown
Member

Note: this PR can't even run the tests anymore, but it's not the fault of the code here, but current Cython requires some fixes not yet present here, but fixed in current master branch.

Please rebase on current master.

@ThomasWaldmann

Copy link
Copy Markdown
Member

Also, I would appreciate if we could get this into a mergable state ASAP.

We will soon have bigger changes and movements in the code, so there could be lots of merge conflicts coming.

@ThomasWaldmann

Copy link
Copy Markdown
Member

ping

@PhrozenByte

Copy link
Copy Markdown
Contributor

What do you think about reducing the scope of this PR? It might help move things forward and get the core functionality merged sooner.

A recent example is #8775, which was opened just one day before this PR and was successfully merged yesterday after its scope was reduced.

In particular, support for durations, from/to matching, and special keywords (e.g. "oldest") seems to be more involved than originally anticipated. Would it make sense to focus on date support first, perhaps with only very basic or no wildcard support, and leave the more complex parts for follow-up PRs?

@c-herz

c-herz commented Jun 22, 2026

Copy link
Copy Markdown
Author

Re: @PhrozenByte's suggestion--that definitely would make it a bit easier for me to get things up to speed after a year of neglect.

My sincere apologies for the radio silence on my end. Over the past year life really got in the way for me and I just didn't have time to work on hobby projects. However, I certainly should've communicated that better.

In any case, I should have some time this weekend and possibly here and there throughout the week to clean things up and at least get the basic feature I worked on last year rebased onto master and integrated with the major changes since then (which to be honest, I need to catch up on). I can shelve any nice-to-haves for now and focus on core functionality, and certainly keep everything as a reference if these are desired in the future and/or I have time to extend. Please let me know what should be the highest priority moving forward!

@PhrozenByte

Copy link
Copy Markdown
Contributor

There's really no need to apologize. We're all contributing in our spare time, and there are certainly more important things in life than getting a PR merged quickly 🙂

In my opinion, it should ultimately be up to you to decide which features you want to work on and where you want to draw the line for the initial implementation.

That said, since I originally suggested this in #8715, my personal motivation was simply being able to match archives by static dates. I create daily backups, and while the date used to be part of the archive name in Borg 1.x, archive series in Borg 2.0 work differently. Because of that, something as simple as a static match pattern like date:2026-06-22 would already cover my personal use case. Everything else suggested in #8715 mostly came from considering how the feature could be extended in useful ways. For example, the idea of even more coarse-grained dates such as date:2026-06 or date:2026-W26 came from considering users that create monthly or weekly backups.

But again, you're the one doing the actual work here, so feel free to draw the line wherever it makes sense to you, both in terms of implementation effort and what you actually feel like working on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants