feat(cozystack): add DRBD-oriented sysctl and etcd backend defaults#131

Merged: Aleksei Sviridkin (lexfrei) merged 1 commit into main from feat/add-sysctl-etcd-defaults on Jun 5, 2026

Conversation

@IvanHunters IvanHunters commented Apr 27, 2026

Summary

Adds DRBD-oriented sysctl and etcd backend defaults to the cozystack preset. Cozystack nodes always run DRBD; reconnect storms (node reboots, resync) exhaust TCP ports under the default Talos sysctl profile, and a LINSTOR-heavy control plane can outgrow etcd's default 2 GiB backend.

Changes

Always-on machine.sysctls

  • tcp_orphan_retries=3, tcp_fin_timeout=30 — reclaim orphaned and FIN-WAIT sockets faster so a reconnect storm cannot outrun cleanup.
  • netdev_max_backlog=5000, netdev_budget=600, netdev_budget_usecs=8000 — widen the receive backlog so bursty replication traffic isn't dropped under load.

These five are the tuning that resolved the TCP-port exhaustion on production clusters (see siderolabs/talos#13074).

Opt-in TCP keepalive (tcpKeepaliveTuning, default off)

tcp_keepalive_time=600 / intvl=10 / probes=6 is gated behind a new tcpKeepaliveTuning value, off by default. These sysctls are kernel-wide — they shorten idle-socket failure detection for every long-lived TCP connection on the node (NFS mounts, DB clients, MQ consumers), not just DRBD. DRBD already detects dead peers in seconds via its own protocol-level ping (ping-int/ping-timeout), so this is a generic socket backstop rather than a DRBD requirement; hence opt-in rather than baked in.

Tunable etcd backend quota (etcd.quotaBackendBytes, default 8 GiB)

cluster.etcd.extraArgs.quota-backend-bytes is exposed via a new etcd.quotaBackendBytes value (default 8 GiB, etcd's documented upper bound), emitted on controlplane nodes only. It raises etcd's 2 GiB ceiling so a control plane holding many DRBD-resource CRDs in aggregate does not trip the NOSPACE alarm. It is a ceiling, not a reservation — a small DB stays small and costs no extra RAM/disk — and can be blanked to fall back to etcd's own default.
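
As a rough illustration of the value format: the quota is a byte count passed as a quoted string. The sketch below uses a hypothetical helper, `gib_to_quota_bytes`, which is not part of the chart, to show how a whole-GiB figure maps to the string an operator might set in `etcd.quotaBackendBytes`. The 8 GiB result matches the chart default quoted in this PR.

```python
GIB = 1024 ** 3  # one gibibyte in bytes


def gib_to_quota_bytes(gib: int) -> str:
    """Return a whole-GiB size as a decimal byte string (hypothetical helper)."""
    if gib <= 0:
        raise ValueError("quota must be a positive number of GiB")
    return str(gib * GIB)


chart_default = gib_to_quota_bytes(8)  # default introduced by this PR
etcd_default = gib_to_quota_bytes(2)   # etcd's own default, per the PR text
print(chart_default, etcd_default)
```

An operator who wants a 4 GiB ceiling would, under the same assumption, set the value to `gib_to_quota_bytes(4)`.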

Dropped from the original proposal: etcd max-request-bytes

The source issue also proposed max-request-bytes=10MiB. On verification this does not achieve its stated goal: LINSTOR k8s-backend writes go through kube-apiserver, whose MaxRequestBodyBytes is hardcoded at 3 MiB with no configuration flag (kubernetes/kubernetes#88968 — "No flag was intended for this option"). The effective per-object ceiling stays at the apiserver's 3 MiB regardless of etcd's setting, and the real "large dataset" concern is aggregate DB size, which quota-backend-bytes already covers. The knob was dropped rather than shipped as a no-op.
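
The reasoning above can be sketched as a deliberately simplified model: a write that goes through kube-apiserver has to pass both limits, so raising the etcd limit beyond the apiserver's 3 MiB changes nothing for such writes. The function name is hypothetical, and the model ignores encoding differences between the request body and the stored object.

```python
MIB = 1024 ** 2

# 3 MiB request-body limit of kube-apiserver, as cited in this PR.
APISERVER_BODY_LIMIT = 3 * MIB


def effective_object_ceiling(etcd_max_request_bytes: int) -> int:
    """Simplified model: the smaller of the two limits wins (hypothetical helper)."""
    return min(APISERVER_BODY_LIMIT, etcd_max_request_bytes)


proposed = effective_object_ceiling(10 * MIB)  # the dropped 10 MiB proposal
larger = effective_object_ceiling(100 * MIB)   # any higher value behaves the same
print(proposed, larger)
```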

Tests

  • New contract tests pin every branch: the always-on DRBD sysctls (present on cozystack, absent on generic), the keepalive toggle (absent by default, present when enabled, operator-settable while off, collision-protected while on), and the etcd quota (default, tunable, omitted-when-blank, controlplane-only, absent on generic).
  • docs/manual-test-plan.md gains a scenario (B9) exercising each of the above.
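
The branches listed above can be summarised with a toy model. This is a Python sketch of the described behaviour, not the chart's Helm template or the Go contract tests; the `render` function and its parameters are hypothetical. It also assumes that a collision with a preset-owned key is reported as an error, which the PR text implies but does not spell out.

```python
DRBD_SYSCTLS = {
    "net.ipv4.tcp_orphan_retries": "3",
    "net.ipv4.tcp_fin_timeout": "30",
    "net.core.netdev_max_backlog": "5000",
    "net.core.netdev_budget": "600",
    "net.core.netdev_budget_usecs": "8000",
}
KEEPALIVE_SYSCTLS = {
    "net.ipv4.tcp_keepalive_time": "600",
    "net.ipv4.tcp_keepalive_intvl": "10",
    "net.ipv4.tcp_keepalive_probes": "6",
}


def render(preset, machine_type, tcp_keepalive_tuning=False,
           quota_backend_bytes="8589934592", extra_sysctls=None):
    """Toy model of the rendering rules described in this PR (hypothetical)."""
    extra = dict(extra_sysctls or {})
    sysctls, etcd_args = {}, {}
    if preset == "cozystack":
        sysctls.update(DRBD_SYSCTLS)            # always on for the preset
        if tcp_keepalive_tuning:
            sysctls.update(KEEPALIVE_SYSCTLS)   # opt-in triplet
        # Assumption: overriding a preset-owned key is rejected.
        clash = sorted(set(extra) & set(sysctls))
        if clash:
            raise ValueError(f"extraSysctls collides with preset keys: {clash}")
        # The quota is emitted on controlplane nodes only, and only when non-blank.
        if machine_type == "controlplane" and quota_backend_bytes:
            etcd_args["quota-backend-bytes"] = quota_backend_bytes
    sysctls.update(extra)
    return {"sysctls": sysctls, "etcd_extra_args": etcd_args}
```

In this model the keepalive keys are only preset-owned while the toggle is on, so an operator can still set them through the extra map while it is off.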

Docs

Operator-facing documentation updated in a companion PR: cozystack/website#567.

Related

Summary by CodeRabbit

  • New Features
    • Added an opt-in TCP keepalive tuning toggle and a configurable etcd backend quota (raised default) for improved network reliability and reduced NOSPACE risk.
  • Chores
    • Broadened preset sysctl tuning for network/DRBD performance (non-overridable by operators when preset active).
  • Documentation
    • Expanded manual test plan with validation steps for sysctl and etcd quota behavior.
  • Tests
    • Added contract and render tests covering sysctl behavior and etcd quota rendering.

coderabbitai Bot commented Apr 27, 2026

No actionable comments were generated in the recent review. 🎉

📥 Commits

Reviewing files that changed from the base of the PR and between 523f62c and ca508b6.

📒 Files selected for processing (6)
  • charts/cozystack/templates/_helpers.tpl
  • charts/cozystack/values.yaml
  • docs/manual-test-plan.md
  • pkg/engine/contract_cluster_test.go
  • pkg/engine/contract_machine_test.go
  • pkg/engine/render_test.go
💤 Files with no reviewable changes (3)
  • pkg/engine/render_test.go
  • pkg/engine/contract_cluster_test.go
  • pkg/engine/contract_machine_test.go

Walkthrough

Adds preset DRBD/LINSTOR and netdev/TCP kernel sysctls (with conditional tcp_keepalive gating), renders default sysctl values in machine docs, adds an etcd.extraArgs.quota-backend-bytes value driven by chart values, updates values/docs, and adds tests asserting render and collision behaviors.

Changes

Talos sysctl & etcd tuning

| Layer | File(s) | Summary |
|---|---|---|
| Helm helpers: sysctls & etcd extraArg | charts/cozystack/templates/_helpers.tpl | Marks DRBD/LINSTOR, TCP orphan/FIN-WAIT, and netdev backlog/budget keys as builtin; conditionally treats net.ipv4.tcp_keepalive_* as builtin when tcpKeepaliveTuning is enabled; emits default machine.sysctls and sets etcd.extraArgs.quota-backend-bytes from values. |
| Chart values: new flags and etcd block | charts/cozystack/values.yaml | Adds tcpKeepaliveTuning: false and etcd.quotaBackendBytes: "8589934592" with docs and refines extraSysctls collision commentary. |
| Manual test plan updates | docs/manual-test-plan.md | Adds B9 scenarios covering DRBD sysctl checks, tcpKeepalive gating, controlplane-only etcd quota, blanking behavior, and regression anchors. |
| Cluster contract tests for etcd quota | pkg/engine/contract_cluster_test.go | Adds tests asserting controlplane renders the quoted quota-backend-bytes, workers must not emit it, override and blanking behaviors, and generic-chart absence. |
| Machine contract tests for sysctls | pkg/engine/contract_machine_test.go | Extends collision case list and adds tests verifying DRBD tuning presence, tcp_keepalive gating/override/collision behaviors, and absence on generic presets. |
| Render tests: controlplane assertions | pkg/engine/render_test.go | Updates legacy and multi-doc controlplane render tests to assert new sysctls and the etcd quota extraArg value. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

  • #2332: Implements the same sysctl defaults and etcd.extraArgs.quota-backend-bytes described in the issue.

Possibly related PRs

  • cozystack/talm#211: Related prior change to extraSysctls collision-handling logic in the Helm helpers.

Poem

🐰 I nibble docs and tweak each tune,
DRBD hums beneath the moon,
Keepalive waits when toggles say no,
Etcd keeps eight gigs in tow,
Hooray—networks sleep well, tubby and swoon.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main changes: adding DRBD-oriented sysctl defaults and etcd backend configuration to the cozystack Helm chart. |
| Linked Issues check | ✅ Passed | All proposed defaults from issue #2332 are implemented: TCP orphan/FIN-WAIT sysctls, netdev backlog/budget sysctls, TCP keepalive sysctls (opt-in), and etcd quota-backend-bytes. The PR includes max-request-bytes in documentation but focuses on quota-backend-bytes as the primary etcd configuration, which is appropriate for the scope. |
| Out of Scope Changes check | ✅ Passed | All changes align with the linked issue objectives: template updates for sysctl/etcd configuration, values additions for the new options, test coverage for the new functionality, and documentation updates describing the changes. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces several kernel sysctl optimizations for TCP orphan handling, network backlog, and keepalive settings. It also increases etcd's storage quota and maximum request size to accommodate large LINSTOR CRD datasets. A review comment suggests that the Kubernetes API server should also be configured with a matching --max-resource-write-bytes limit to ensure the increased etcd request size is fully effective.

Comment thread charts/cozystack/templates/_helpers.tpl Outdated
      {{- toYaml .Values.advertisedSubnets | nindent 6 }}
    extraArgs:
      quota-backend-bytes: "8589934592" # 8GiB - prevent etcd running out of space with large LINSTOR CRD datasets
      max-request-bytes: "10485760" # 10MiB - allow larger CRD objects to be stored

medium

Increasing max-request-bytes in etcd is a necessary step for handling large objects, but it is often insufficient on its own. To fully support larger CRD objects (such as large LINSTOR datasets), the Kubernetes API server should also be configured with a matching --max-resource-write-bytes limit in its extraArgs. Without this corresponding change, the API server may reject large requests before they reach etcd, rendering the increased etcd limit ineffective for those operations.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
charts/cozystack/templates/_helpers.tpl (1)

32-42: Sysctl defaults look reasonable; consider making them configurable.

The added TCP/network tunings are sensible defaults for clusters running DRBD/LINSTOR replication. A few non-blocking observations:

  • These sysctls apply uniformly to controlplane and workers (intended, since DRBD runs cluster-wide), but they are also imposed on clusters that don't use DRBD. If a deployment uses cozystack without LINSTOR/DRBD, the aggressive tcp_keepalive_time=600 (vs. kernel default 7200) and shortened tcp_fin_timeout=30 will still apply.
  • All values are hardcoded — there's no way to override via values.yaml without forking the chart. This matches the style of the existing sysctls above (lines 29–31), so it is consistent, but you may eventually want to expose at least the DRBD-specific knobs (similar to how nr_hugepages is gated via $.Values.nr_hugepages).
♻️ Optional: gate DRBD-tuning sysctls behind a value (e.g. .Values.drbdTuning)
     net.ipv4.neigh.default.gc_thresh1: "4096"
     net.ipv4.neigh.default.gc_thresh2: "8192"
     net.ipv4.neigh.default.gc_thresh3: "16384"
+    {{- if (default true $.Values.drbdTuning) }}
     # TCP orphan handling
     net.ipv4.tcp_orphan_retries: "3"
     net.ipv4.tcp_fin_timeout: "30"
     # Network backlog
     net.core.netdev_max_backlog: "5000"
     net.core.netdev_budget: "600"
     net.core.netdev_budget_usecs: "8000"
     # TCP keepalive (early detection of dead connections)
     net.ipv4.tcp_keepalive_time: "600"
     net.ipv4.tcp_keepalive_intvl: "10"
     net.ipv4.tcp_keepalive_probes: "6"
+    {{- end }}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@charts/cozystack/templates/_helpers.tpl` around lines 32 - 42, The hardcoded
sysctl entries (net.ipv4.tcp_keepalive_time, net.ipv4.tcp_keepalive_intvl,
net.ipv4.tcp_keepalive_probes, net.ipv4.tcp_fin_timeout, net.core.netdev_* and
tcp_orphan_retries) should be made configurable or gated so non-DRBD clusters
are not forced to use aggressive values; add a boolean flag (e.g.
.Values.drbdTuning default false) or a map (.Values.sysctls) in values.yaml and
update the template to only render these DRBD-specific tunings when
.Values.drbdTuning is true (or allow per-key overrides via the map), following
the existing pattern used for nr_hugepages to locate and change the rendering
logic in the helper template.


📥 Commits

Reviewing files that changed from the base of the PR and between 00358fd and 523f62c.

📒 Files selected for processing (1)
  • charts/cozystack/templates/_helpers.tpl

Comment thread charts/cozystack/templates/_helpers.tpl Outdated
Comment on lines +134 to +136
    extraArgs:
      quota-backend-bytes: "8589934592" # 8GiB - prevent etcd running out of space with large LINSTOR CRD datasets
      max-request-bytes: "10485760" # 10MiB - allow larger CRD objects to be stored

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

Does kube-apiserver in Kubernetes 1.30+ require any flag (e.g., --max-request-bytes) to be coordinated when etcd's --max-request-bytes is raised to 10MiB, or does the apiserver pass requests through transparently up to etcd's limit?

💡 Result:

No, kube-apiserver in Kubernetes 1.30+ does not require any flag like --max-request-bytes to be coordinated when etcd's --max-request-bytes is raised to 10MiB. The apiserver passes etcd requests transparently up to etcd's limit, but enforces its own independent limit on incoming client request body sizes for write operations.

kube-apiserver has a hardcoded default MaxRequestBodyBytes of 3MiB (3145728 bytes) for decoding write request bodies (e.g., CREATE/UPDATE/PATCH), which fails requests larger than this with "Request entity too large: limit is 3145728" before reaching etcd. This limit accounts for JSON-to-protobuf conversion expansion when storing in etcd (etcd default 1.5MiB, apiserver allows ~2x). It is not exposed as a command-line flag in standard kube-apiserver; users of the generic apiserver library can configure it programmatically.

etcd's --max-request-bytes independently limits gRPC requests from kube-apiserver (e.g., Put, Txn). Raising it to 10MiB allows larger backend requests without apiserver changes, as apiserver does not impose additional byte limits on outgoing etcd calls beyond what fits its object model. apiserver transparently forwards transformed storage requests to etcd.

No changes in 1.30+ alter this; official docs list no --max-request-bytes flag for kube-apiserver, and source code confirms the client-side body limit is separate from etcd's server-side limit.

Etcd defaults align with upstream guidance — note the kube-apiserver request-size boundary.

Both values are correctly placed inside the eq .MachineType "controlplane" block, so they apply only to control-plane nodes where etcd runs. The values match etcd's documented recommendations:

  • quota-backend-bytes: 8589934592 (8 GiB) is the upstream-suggested maximum (per the etcd "How to debug large db size issue" blog post).
  • max-request-bytes: 10485760 (10 MiB) sits at etcd's documented soft cap: it is exactly at the 10 MB boundary and safe from WAL-decode failure risk (see etcd-io/etcd#14114, "Restriction on MaxRequestSize and maxWALEntrySize").

Important clarification: kube-apiserver (1.30+) has a hardcoded 3MiB limit on incoming client request bodies for write operations and does not expose this as a configurable flag. This means API clients cannot send requests larger than 3MiB, even though etcd can accept 10MiB. The 10MiB etcd setting still provides value for direct etcd operations or future use cases, but no apiserver flag coordination is needed.

Operational considerations to keep in mind:

  • Memory footprint: etcd (boltDB) mmaps the backend file, so raising quota-backend-bytes from the default 2 GiB to 8 GiB allows the resident-set on each control-plane node to grow up to ~8 GiB just for etcd. Ensure control-plane VMs are sized accordingly.
  • Defragmentation: Larger backends accumulate more fragmentation. Confirm that periodic etcdctl defrag (or equivalent) is in place; otherwise the 8 GiB quota can still be hit by fragmented free pages.
  • Alerting: Recommend adding/keeping alerts on etcd_mvcc_db_total_size_in_bytes and etcd_server_quota_backend_bytes so operators see growth trends well before hitting the new ceiling.
  • LINSTOR CRD objects: The 10MiB etcd setting is sized to handle LINSTOR CRD workloads. Document that this setting exists to support that use case and not as a general invitation to store large blobs in arbitrary CRs (which remain bound by the apiserver's 3MiB client limit).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@charts/cozystack/templates/_helpers.tpl` around lines 134 - 136, The etcd
extraArgs (quota-backend-bytes and max-request-bytes) are fine but need an
in-template clarification: add a comment next to quota-backend-bytes and
max-request-bytes (the keys under extraArgs) stating these apply only to
control-plane nodes and that kube-apiserver enforces a 3MiB hard client request
body limit (so the 10MiB etcd setting is for direct etcd or special use cases
like LINSTOR CRDs), and mention operational considerations (memory
sizing/defrag/alerts) so operators understand sizing tradeoffs and why these
values were chosen.

Title scope

Recent PRs touching charts/cozystack/templates/_helpers.tpl use a concept-style scope rather than a path: feat(cozystack): enable allocateNodeCIDRs by default (#91), fix(chart): cast metadata.id to string for regexMatch … (#110), feat(helpers): add bond interface discovery helpers (#94). feat(charts/cozystack): … is the first path-style scope I see in this repo. feat(cozystack): would mirror #91, which was a similar defaults-tweak for this same chart.

Merge ordering with #130

This PR edits cluster.etcd.advertisedSubnets immediately adjacent to where #130 wraps the same block in an if/else discovery fallback. There's no semantic conflict, but git merge-tree reports a textual conflict — whichever PR lands second will need a small rebase.

TCP keepalive blast radius

tcp_keepalive_time/intvl/probes = 600/10/6 is roughly 12× more aggressive on idle timeout than the kernel defaults (7200/75/9). The PR body frames this as DRBD-focused, but net.ipv4.tcp_keepalive_* are kernel-wide and apply to every long-lived idle TCP socket on the node — pods holding NFS mounts, idle DB clients, MQ consumers, etc. all inherit the new failure characteristics. Likely still the right default for a Cozystack cluster, but worth calling out explicitly in the PR description so operators understand the scope.
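
To make the comparison concrete, here is a small worked calculation. It assumes the usual approximation that, for a socket with SO_KEEPALIVE enabled, a silently dead peer is detected after the idle time plus one interval per unanswered probe. The function name is hypothetical and the figures are the ones quoted above.

```python
def dead_peer_detection_seconds(time_s: int, intvl_s: int, probes: int) -> int:
    """Approximate time until the kernel drops an idle connection to a dead peer."""
    return time_s + intvl_s * probes


kernel_default = dead_peer_detection_seconds(7200, 75, 9)  # 7200 + 675
tuned = dead_peer_detection_seconds(600, 10, 6)            # 600 + 60

print(kernel_default, tuned, 7200 // 600)
```

Under this approximation the detection window shrinks from a little over two hours to eleven minutes, and the idle timeout alone is 12 times shorter.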

The other sysctls (tcp_orphan_retries, tcp_fin_timeout, netdev_*) are more conservative and look fine.

etcd quota and request size

quota-backend-bytes: 8 GiB is etcd's documented upper recommended ceiling, not a typical default — etcd's own default is 2 GiB. Same direction for max-request-bytes: 10 MiB (vs etcd's 1.5 MiB default). Both are correct numbers for a LINSTOR-heavy control plane, but they're strong opinions to bake in for every Cozystack cluster: a small dev cluster still pays for the larger backend in etcd RSS regardless of whether it ever uses it.

Could these be exposed in values.yaml with the proposed numbers as defaults? For example:

etcd:
  quotaBackendBytes: "8589934592"
  maxRequestBytes:   "10485760"

The recommendation still ships as the default, but operators can tune without forking the chart. #91 used this exact pattern for allocateNodeCIDRs — defaults-tweak plus values exposure in the same change.

Tests

pkg/engine/render_test.go already asserts on the rendered sysctls: block (lines 114, 211). It would be cheap to extend those assertions to the new sysctl keys, and to add a check that etcd.extraArgs renders only for MachineType: controlplane — that's already correct in the template (the block sits inside {{- if eq .MachineType "controlplane" }}), but it isn't covered by a test today.

Style nit

Inline YAML comments after string scalars (quota-backend-bytes: "8589934592" # 8GiB - …) parse fine, but the rest of this template uses Go template comments ({{- /* … */ -}}) for explanations — see #130 for examples. Aligning would help readability.

Cozystack nodes always run DRBD (the drbd module is loaded
unconditionally), and DRBD reconnect storms — node reboots, resync —
exhaust TCP ports under the default Talos sysctl profile. Ship the
network tuning that resolved this on production clusters, plus a
tunable etcd backend quota for LINSTOR-heavy control planes.

Always-on machine.sysctls (cozystack preset):

- tcp_orphan_retries=3, tcp_fin_timeout=30 reclaim orphaned and
  FIN-WAIT sockets faster so a reconnect storm cannot outrun cleanup.
- netdev_max_backlog/budget/budget_usecs widen the receive backlog so
  bursty replication traffic isn't dropped under load.

Opt-in machine.sysctls tcp_keepalive_{time,intvl,probes}, gated by
tcpKeepaliveTuning (default off): the triplet is kernel-wide and
shortens idle-socket failure detection for every long-lived TCP
connection on the node, not just DRBD. DRBD already detects dead peers
in seconds via its own protocol-level ping, so this is a generic
socket backstop rather than a DRBD requirement — hence off by default.

cluster.etcd.extraArgs.quota-backend-bytes, tunable via
etcd.quotaBackendBytes (default 8GiB, etcd's documented upper bound):
raises etcd's 2GiB backend ceiling so a control plane holding many
DRBD-resource CRDs in aggregate does not trip the NOSPACE alarm. It is
a ceiling not a reservation, emitted only on controlplane nodes; blank
it to fall back to etcd's own default. This governs total DB size, not
single-object size — per-object writes stay bounded by kube-apiserver's
fixed 3MiB request-body limit, which has no configuration knob.

The extraSysctls collision guard now covers the new preset-owned keys
(the keepalive triplet only while the toggle is on), and the manual
test plan gains a scenario exercising every branch.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Aleksei Sviridkin (lexfrei) force-pushed the feat/add-sysctl-etcd-defaults branch from 523f62c to ca508b6 on June 5, 2026, 09:36
Aleksei Sviridkin (lexfrei) changed the title from "feat(charts/cozystack): add recommended sysctl and etcd defaults" to "feat(cozystack): add DRBD-oriented sysctl and etcd backend defaults" on Jun 5, 2026

LGTM. Reworked on top of current main and addressed the review points:

  • Scope renamed to feat(cozystack):.
  • etcd quota exposed as a tunable value (etcd.quotaBackendBytes, default 8 GiB) instead of a hardcoded arg; emitted on controlplane nodes only.
  • TCP keepalive moved behind an opt-in tcpKeepaliveTuning (default off): it is kernel-wide, and DRBD detects dead peers in seconds via its own protocol-level ping, so it is a generic socket backstop rather than a DRBD requirement.
  • The five DRBD/netdev sysctls stay always-on (low blast radius, production-validated).
  • etcd max-request-bytes was dropped: kube-apiserver's 3 MiB request-body limit has no configuration flag (kubernetes/kubernetes#88968), so the etcd bump could not achieve its stated goal; the aggregate-size concern is covered by the quota.
  • Contract tests cover every branch; the manual test plan gains a matching scenario; inline YAML comments replaced with Go-template comments.

Operator-facing docs: cozystack/website#567.

Aleksei Sviridkin (lexfrei) merged commit cb340ab into main on Jun 5, 2026
8 checks passed
Aleksei Sviridkin (lexfrei) deleted the feat/add-sysctl-etcd-defaults branch on June 5, 2026, 09:44

Development

Successfully merging this pull request may close these issues.

feat: add recommended sysctl and etcd defaults to Talos configuration

3 participants