Commit 67fb89f

committed
Deploying to gh-pages from @ c2882ab 🚀
1 parent 1475159 commit 67fb89f

607 files changed

Lines changed: 7052 additions & 6360 deletions

_sources/howto/free-threading-python.rst.txt

Lines changed: 129 additions & 0 deletions
@@ -165,3 +165,132 @@ to false. If the flag is true then the :class:`warnings.catch_warnings`
165165
context manager uses a context variable for warning filters. If the flag is
166166
false then :class:`~warnings.catch_warnings` modifies the global filters list,
167167
which is not thread-safe. See the :mod:`warnings` module for more details.
168+
169+
170+
Increased memory usage
171+
----------------------
172+
173+
The free-threaded build will typically use more memory compared to the default
174+
build. There are multiple reasons for this, mostly due to design decisions.
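Everything in this section applies only to the free-threaded build, so it can help to confirm which build a program is running on. The following is a minimal sketch, not an official API: it reads the ``Py_GIL_DISABLED`` configuration variable through :func:`sysconfig.get_config_var` and, where present, calls the private ``sys._is_gil_enabled()`` helper (Python 3.13+). The ``build_info`` name is hypothetical.

```python
import sys
import sysconfig


def build_info() -> dict:
    """Report whether this interpreter is a free-threaded build (hypothetical helper)."""
    # Py_GIL_DISABLED is 1 on free-threaded builds, and 0 or None otherwise.
    free_threaded = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() is private and only exists on 3.13+. A free-threaded
    # build can still run with the GIL re-enabled (for example, PYTHON_GIL=1).
    gil_check = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = gil_check() if gil_check is not None else True
    return {"free_threaded_build": free_threaded, "gil_enabled": gil_enabled}


print(build_info())
```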
175+
176+
177+
All interned strings are immortal
178+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
179+
180+
For modern Python versions (since version 2.3), interning a string (e.g. with
181+
:func:`sys.intern`) does not cause it to become immortal. Instead, if the last
182+
reference to that string disappears, it will be removed from the interned
183+
string table. This is not the case for the free-threaded build, where any interned
184+
string will become immortal, surviving until interpreter shutdown.
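:func:`sys.intern` returns one canonical object for equal strings, and on the free-threaded build that object is never released. The sketch below interns strings built at runtime; the ``intern_runtime_key`` helper is hypothetical. Assuming the behavior described above, interning many distinct runtime values (for example, keys read from external input) would keep all of them alive for the life of the process on the free-threaded build.

```python
import sys


def intern_runtime_key(parts: list[str]) -> str:
    # Strings built at runtime are normally distinct objects; sys.intern()
    # returns the single canonical copy from the interned string table.
    # On the free-threaded build that copy is immortal.
    return sys.intern("".join(parts))


a = intern_runtime_key(["user", "_", "id"])
b = intern_runtime_key(["user_", "id"])
print(a is b)  # True: both names refer to the same interned object
```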
185+
186+
187+
Non-GC objects have a larger object header
188+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
189+
190+
The free-threaded build uses a different :c:type:`PyObject` structure. Instead
191+
of having the GC-related information allocated before the :c:type:`PyObject`
192+
structure, as in the default build, the GC-related information is part of the normal
193+
object header. For example, on the AMD64 platform, ``None`` uses 32 bytes on
194+
the free-threaded build vs 16 bytes for the default build. GC objects (such as
195+
dicts and lists) are the same size for both builds since the free-threaded
196+
build does not use additional space for the GC info.
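:func:`sys.getsizeof` reports an object's basic size plus any GC header, so it offers a rough way to observe the layout difference described above. This is a sketch for inspection only: going by the figures above, ``None`` should report 16 bytes on a 64-bit default build and 32 bytes on a 64-bit free-threaded build, while the exact list and dict figures depend on the Python version and platform.

```python
import struct
import sys

# Pointer size in bytes (8 on 64-bit platforms such as AMD64).
pointer_size = struct.calcsize("P")

# sys.getsizeof() adds the GC header for GC-tracked objects, so it
# reflects where each build keeps the GC information.
sizes = {
    "None": sys.getsizeof(None),  # non-GC object: object header only
    "[]": sys.getsizeof([]),      # GC object (list)
    "{}": sys.getsizeof({}),      # GC object (dict)
}
print(f"{pointer_size * 8}-bit build:", sizes)
```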
197+
198+
199+
QSBR can delay freeing of memory
200+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
201+
202+
In order to safely implement lock-free data structures, a safe memory
203+
reclamation (SMR) scheme is used, known as quiescent state-based reclamation
204+
(QSBR). This means that the memory backing data structures allowing lock-free
205+
access will use QSBR, which defers the free operation, rather than immediately
206+
freeing the memory. Two examples of these data structures are the list object
207+
and the dictionary keys object. See ``InternalDocs/qsbr.md`` in the CPython
208+
source tree for more details on how QSBR is implemented. Running
209+
:func:`gc.collect` should cause all memory being held by QSBR to be actually
210+
freed. Note that even when QSBR frees the memory, the underlying memory
211+
allocator may not immediately return that memory to the OS and so the resident
212+
set size (RSS) of the process might not decrease.
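The sketch below uses :mod:`tracemalloc` to watch traced memory around a :func:`gc.collect` call. It is illustrative only: on the default build the memory is released as soon as the list is deleted, and on the free-threaded build QSBR deferral may only come into play when the list has been shared with other threads (an assumption about CPython internals), so the numbers can look the same on both builds. :mod:`tracemalloc` measures Python allocations, not RSS.

```python
import gc
import tracemalloc


def traced_bytes_around_collect(n: int = 100_000) -> tuple[int, int, int]:
    """Return traced bytes with the data alive, after del, and after gc.collect()."""
    tracemalloc.start()
    try:
        data = [str(i) for i in range(n)]
        alive = tracemalloc.get_traced_memory()[0]
        del data
        after_del = tracemalloc.get_traced_memory()[0]
        # If the list had been used from other threads on a free-threaded
        # build, part of its memory could still be waiting on QSBR here;
        # a collection should release it.
        gc.collect()
        after_collect = tracemalloc.get_traced_memory()[0]
    finally:
        tracemalloc.stop()
    return alive, after_del, after_collect


print(traced_bytes_around_collect())
```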
213+
214+
215+
mimalloc allocator vs pymalloc
216+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
217+
218+
The default build will normally use the "pymalloc" memory allocator for small
219+
allocations (512 bytes or smaller). The free-threaded build does not use
220+
pymalloc and allocates all Python objects using the "mimalloc" allocator. The
221+
pymalloc allocator has the following properties that help keep memory usage
222+
low: small per-allocated-block overhead, effective memory fragmentation
223+
prevention, and quick return of free memory to the operating system. The
224+
mimalloc allocator also does well in these respects, but can have somewhat
225+
more overhead.
226+
227+
In the free-threaded build, mimalloc manages memory in a number of separate
228+
heaps (currently four). For example, all GC supporting objects are allocated
229+
from their own heap. Using separate heaps means that free memory in one heap
230+
cannot be used for an allocation that uses another heap. Also, some heaps are
231+
configured to use QSBR (quiescent state-based reclamation) when freeing the
232+
memory that backs the heap (known as "pages" in mimalloc terminology). The
233+
use of QSBR creates a delay between all memory blocks for a page being freed
234+
and the memory page being released, either for new allocations or back to the
235+
OS.
236+
237+
The mimalloc allocator also defers returning freed memory back to the OS. You
238+
can reduce that delay by setting the environment variable
239+
:envvar:`!MIMALLOC_PURGE_DELAY` to ``0``. Note that this will likely reduce
240+
the performance of the allocator.
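Because mimalloc reads its options from the environment when it initializes, ``MIMALLOC_PURGE_DELAY`` has to be set before the interpreter starts. The sketch below launches a child interpreter with the variable set to ``0``. The ``run_with_eager_purge`` name is hypothetical, and the setting only matters when mimalloc is the active allocator, which is always the case on the free-threaded build.

```python
import os
import subprocess
import sys


def run_with_eager_purge(code: str) -> subprocess.CompletedProcess:
    """Run *code* in a child interpreter with mimalloc's purge delay set to 0."""
    # Copy the current environment and override just this one variable.
    env = dict(os.environ, MIMALLOC_PURGE_DELAY="0")
    return subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )


result = run_with_eager_purge("import os; print(os.environ['MIMALLOC_PURGE_DELAY'])")
print(result.stdout.strip())
```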
241+
242+
243+
Free-threaded reference counting can cause objects to live longer
244+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
245+
246+
In the default build, when an object's reference count reaches zero, it is
247+
normally deallocated. The free-threaded build uses "biased reference
248+
counting", with a fast path for objects "owned" by the current thread and a
249+
slow path for other objects. See :pep:`703` for additional details. Any time
250+
an object's reference count ends up in a "queued" state, deallocation can be
251+
deferred. The queued state is cleared from the "eval breaker" section of the
252+
bytecode evaluator.
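The sketch below sets up the slow path described above: an object created by the main thread has its last reference dropped by a worker thread. The ``Payload`` class and ``drop_in_other_thread`` helper are hypothetical. On the default build the object is freed immediately in the worker. On the free-threaded build its deallocation may be queued for the owning thread, so the sketch calls :func:`gc.collect` before checking, on the assumption that a collection (or the owner reaching the eval breaker) processes such queued objects.

```python
import gc
import queue
import threading
import weakref


class Payload:
    """A plain object created (and therefore "owned") by the main thread."""


def drop_in_other_thread() -> bool:
    freed: list = []
    obj = Payload()
    # Record deallocation without taking any locks.
    weakref.finalize(obj, freed.append, True)
    handoff: queue.Queue = queue.Queue()
    handoff.put(obj)
    del obj  # the queue now holds the only strong reference

    def worker() -> None:
        item = handoff.get()
        del item  # last reference dropped by a thread that does not own it

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    # On the free-threaded build, deallocation may still be queued for the
    # owning (main) thread here; a collection gives it a chance to run.
    gc.collect()
    return bool(freed)


print(drop_in_other_thread())
```

The finalizer only appends to a list because it may run at an arbitrary point in whichever thread performs the deallocation, where acquiring a lock that thread already holds could deadlock.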
253+
254+
The free-threaded build also allows a different mode of reference counting,
255+
known as "deferred reference counting". This mode is enabled by setting a flag
256+
on a per-object basis. Deferred reference counting is enabled for the
257+
following types:
258+
259+
* module objects
260+
* module top-level functions
261+
* class methods defined in the class scope
262+
* descriptor objects
263+
* thread-local objects, created by :class:`threading.local`
264+
265+
When deferred reference counting is enabled, references from Python function
266+
stacks are not added to the reference count. This scheme reduces the overhead
267+
of reference counting, especially for objects used from multiple threads.
268+
Because the stack references are not counted, objects with deferred reference
269+
counting are not immediately freed when their internal reference count goes to
270+
zero. Instead, they are examined by the next GC run and, if no stack
271+
references to them are found, they are freed. This means these objects are
272+
freed by the GC and not when their reference count goes to zero, as is typical.
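Since objects with deferred reference counting are freed by the GC, code that checks whether such an object is gone should not expect it to vanish right after ``del``. In this sketch, a throwaway module built with :class:`types.ModuleType` (named ``scratch_module`` for illustration) defines a top-level function, and a weak reference to it is only expected to be dead after :func:`gc.collect`. On the default build a collection is needed here as well, because the function and the module dictionary reference each other.

```python
import gc
import types
import weakref


def module_function_freed() -> bool:
    # A throwaway module standing in for a real one (hypothetical name).
    mod = types.ModuleType("scratch_module")
    exec("def helper():\n    return 42\n", mod.__dict__)
    assert mod.helper() == 42
    ref = weakref.ref(mod.helper)
    del mod  # drop our only reference to the module and its function
    # Modules and top-level functions may use deferred reference counting on
    # the free-threaded build, so rely on a GC run rather than on refcounts.
    gc.collect()
    return ref() is None


print(module_function_freed())
```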
273+
274+
275+
Per-thread reference counting can delay freeing objects
276+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
277+
278+
To avoid contention on the reference count fields of frequently shared
279+
objects, the free-threaded build also uses "per-thread reference counting"
280+
for a few selected object types. Rather than updating a single shared
281+
reference count, each thread maintains its own local reference count array,
282+
indexed by a unique id assigned to the object. The true reference count is
283+
only computed by summing the per-thread counts when the object's local
284+
count drops to zero. Per-thread reference counting is currently used for:
285+
286+
* heap type objects (classes created in Python)
287+
* code objects
288+
* the ``__dict__`` of module objects
289+
290+
Because the per-thread counts must be merged back to the object before it
291+
can be deallocated, objects using per-thread reference counting are
292+
typically freed later than they would be in the default build. In
293+
particular, such an object is usually not freed until the thread that
294+
referenced it reaches a safe point (for example, in the "eval breaker"
295+
section of the bytecode evaluator) or exits. Running :func:`gc.collect`
296+
will merge the per-thread counts and allow these objects to be freed.
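The following sketch checks that a class created and used only inside a worker thread is freed once that thread has exited and :func:`gc.collect` has run. The helper name is hypothetical, and the many instances are only there to exercise the class's reference count from the worker thread.

```python
import gc
import threading
import weakref


def class_freed_after_thread() -> bool:
    refs: list = []

    def worker() -> None:
        # A heap type (a class created at runtime) used only by this thread.
        cls = type("Temporary", (), {"value": 1})
        instances = [cls() for _ in range(1000)]  # each instance references cls
        refs.append(weakref.ref(cls))
        del instances, cls

    t = threading.Thread(target=worker)
    t.start()
    t.join()  # the thread has exited
    # Classes always sit in reference cycles (for example, through their MRO),
    # so a GC run is needed on any build; on the free-threaded build it also
    # merges the per-thread reference counts.
    gc.collect()
    return refs[0]() is None


print(class_freed_after_thread())
```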

_sources/library/base64.rst.txt

Lines changed: 59 additions & 28 deletions
@@ -16,8 +16,10 @@
1616
This module provides functions for encoding binary data to printable
1717
ASCII characters and decoding such encodings back to binary data.
1818
This includes the :ref:`encodings specified in <base64-rfc-4648>`
19-
:rfc:`4648` (Base64, Base32 and Base16)
20-
and the non-standard :ref:`Base85 encodings <base64-base-85>`.
19+
:rfc:`4648` (Base64, Base32 and Base16), the :ref:`Base85 encoding
20+
<base64-base-85>` specified in `PDF 2.0
21+
<https://pdfa.org/resource/iso-32000-2/>`_, and non-standard variants
22+
of Base85 used elsewhere.
2123

2224
There are two interfaces provided by this module. The modern interface
2325
supports encoding :term:`bytes-like objects <bytes-like object>` to ASCII
@@ -189,19 +191,28 @@ POST request.
189191
Base85 Encodings
190192
-----------------
191193

192-
Base85 encoding is not formally specified but rather a de facto standard,
193-
thus different systems perform the encoding differently.
194+
Base85 encoding is a family of algorithms which represent four bytes
195+
using five ASCII characters. Originally implemented in the Unix
196+
``btoa(1)`` utility, a version of it was later adopted by Adobe in the
197+
PostScript language and is standardized in PDF 2.0 (ISO 32000-2).
198+
This version, in both its ``btoa`` and PDF variants, is implemented by
199+
:func:`a85encode`.
194200

195-
The :func:`a85encode` and :func:`b85encode` functions in this module are two implementations of
196-
the de facto standard. You should call the function with the Base85
197-
implementation used by the software you intend to work with.
201+
A separate version, using a different output character set, was
202+
defined as an April Fools' Day joke in :rfc:`1924` but is now used by Git
203+
and other software. This version is implemented by :func:`b85encode`.
198204

199-
The two functions present in this module differ in how they handle the following:
205+
Finally, a third version, using yet another output character set
206+
designed for safe inclusion in programming language strings, is
207+
defined by ZeroMQ and implemented here by :func:`z85encode`.
200208

201-
* Whether to include enclosing ``<~`` and ``~>`` markers
202-
* Whether to include newline characters
203-
* The set of ASCII characters used for encoding
204-
* Handling of null bytes
209+
The functions present in this module differ in how they handle the following:
210+
211+
* Whether to include and expect enclosing ``<~`` and ``~>`` markers.
212+
* Whether to fold the input into multiple lines.
213+
* The set of ASCII characters used for encoding.
214+
* Compact encodings of sequences of spaces and null bytes.
215+
* The encoding of zero-padding bytes applied to the input.
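As a small illustration of the differing character sets, the sketch below encodes the same five bytes with :func:`a85encode` and :func:`b85encode`. Both produce five characters per 4-byte group and, by default, drop the padding characters of the final partial group; the outputs shown are for this specific input.

```python
import base64

data = b"hello"  # one full 4-byte group plus a 1-byte remainder

ascii85 = base64.a85encode(data)  # btoa/PDF character set
git_b85 = base64.b85encode(data)  # RFC 1924 character set (used by Git)

print(ascii85)  # b'BOu!rDZ'
print(git_b85)  # b'Xk~0{Zv'

# Same length, different characters; both round-trip to the original data.
assert len(ascii85) == len(git_b85) == 7
assert base64.a85decode(ascii85) == base64.b85decode(git_b85) == data
```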
205216

206217
Refer to the documentation of the individual functions for more information.
207218

@@ -212,17 +223,22 @@ Refer to the documentation of the individual functions for more information.
212223

213224
*foldspaces* is an optional flag that uses the special short sequence 'y'
214225
instead of 4 consecutive spaces (ASCII 0x20) as supported by 'btoa'. This
215-
feature is not supported by the "standard" Ascii85 encoding.
226+
feature is not supported by the standard encoding used in PDF.
216227

217228
*wrapcol* controls whether the output should have newline (``b'\n'``)
218229
characters added to it. If this is non-zero, each output line will be
219230
at most this many characters long, excluding the trailing newline.
220231

221-
*pad* controls whether the input is padded to a multiple of 4
222-
before encoding. Note that the ``btoa`` implementation always pads.
232+
*pad* controls whether zero-padding applied to the end of the input
233+
is fully retained in the output encoding, as done by ``btoa``,
234+
producing an exact multiple of 5 bytes of output. This is not part
235+
of the standard encoding used in PDF, as it does not preserve the
236+
length of the data.
223237

224-
*adobe* controls whether the encoded byte sequence is framed with ``<~``
225-
and ``~>``, which is used by the Adobe implementation.
238+
*adobe* controls whether the encoded byte sequence is framed with
239+
``<~`` and ``~>``, as in a PostScript base-85 string literal. Note
240+
that while ASCII85Decode streams in PDF documents *must* be
241+
terminated with ``~>``, they *must not* use a leading ``<~``.
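The sketch below exercises these parameters on small inputs. The expected values follow from the Ascii85 digit arithmetic: each 4-byte group is read as a big-endian integer and written as five base-85 digits, each offset by 33 (the ``!`` character).

```python
import base64

# Four zero bytes collapse to "z" in both the btoa and PDF variants.
assert base64.a85encode(b"\0\0\0\0") == b"z"

# foldspaces: four spaces become "y" (a btoa extension, not valid in PDF).
assert base64.a85encode(b"    ") == b"+<VdL"
assert base64.a85encode(b"    ", foldspaces=True) == b"y"

# pad: keep the full final group, as btoa does, so the output length
# is a multiple of 5.
assert base64.a85encode(b"hello") == b"BOu!rDZ"
assert base64.a85encode(b"hello", pad=True) == b"BOu!rDZBb;"

# adobe: frame the output like a PostScript base-85 string literal.
assert base64.a85encode(b"hello", adobe=True) == b"<~BOu!rDZ~>"
```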
226242

227243
.. versionadded:: 3.4
228244

@@ -234,10 +250,12 @@ Refer to the documentation of the individual functions for more information.
234250

235251
*foldspaces* is a flag that specifies whether the 'y' short sequence
236252
should be accepted as shorthand for 4 consecutive spaces (ASCII 0x20).
237-
This feature is not supported by the "standard" Ascii85 encoding.
253+
This feature is not supported by the standard Ascii85 encoding used in
254+
PDF and PostScript.
238255

239-
*adobe* controls whether the input sequence is in Adobe Ascii85 format
240-
(i.e. is framed with <~ and ~>).
256+
*adobe* controls whether the ``<~`` and ``~>`` markers are
257+
present. While the leading ``<~`` is not required, the input must
258+
end with ``~>``, or a :exc:`ValueError` is raised.
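The sketch below, built around a hypothetical ``adobe_decode_or_none`` helper, shows *adobe* mode accepting a framed string (including an embedded newline, which the default *ignorechars* skips) and rejecting input that lacks the trailing ``~>``.

```python
import base64


def adobe_decode_or_none(encoded: bytes):
    """Decode framed Ascii85; return None if the framing or data is invalid."""
    try:
        return base64.a85decode(encoded, adobe=True)
    except ValueError:
        return None


print(adobe_decode_or_none(b"<~BOu!rDZ~>"))    # b'hello'
print(adobe_decode_or_none(b"<~BOu!r\nDZ~>"))  # b'hello' (newline ignored)
print(adobe_decode_or_none(b"<~BOu!rDZ"))      # None: no trailing "~>"
```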
241259

242260
*ignorechars* should be a byte string containing characters to ignore
243261
from the input. This should only contain whitespace characters, and by
@@ -251,35 +269,40 @@ Refer to the documentation of the individual functions for more information.
251269
Encode the :term:`bytes-like object` *b* using base85 (as used in e.g.
252270
git-style binary diffs) and return the encoded :class:`bytes`.
253271

254-
If *pad* is true, the input is padded with ``b'\0'`` so its length is a
255-
multiple of 4 bytes before encoding.
272+
The input is padded with ``b'\0'`` so its length is a multiple of 4
273+
bytes before encoding. If *pad* is true, all the resulting
274+
characters are retained in the output, which will always be a
275+
multiple of 5 bytes, and thus the length of the data may not be
276+
preserved on decoding.
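A short sketch of the *pad* trade-off described above: with ``pad=True`` the output length is a multiple of 5, but decoding also returns the zero-padding bytes.

```python
import base64

data = b"hello"  # 5 bytes, so the last 4-byte group is zero-padded

compact = base64.b85encode(data)           # trailing padding characters dropped
padded = base64.b85encode(data, pad=True)  # all 10 characters kept

print(len(compact), len(padded))  # 7 10
print(base64.b85decode(compact))  # b'hello'
print(base64.b85decode(padded))   # b'hello\x00\x00\x00': original length lost
```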
256277

257278
.. versionadded:: 3.4
258279

259280

260281
.. function:: b85decode(b)
261282

262283
Decode the base85-encoded :term:`bytes-like object` or ASCII string *b* and
263-
return the decoded :class:`bytes`. Padding is implicitly removed, if
264-
necessary.
284+
return the decoded :class:`bytes`.
265285

266286
.. versionadded:: 3.4
267287

268288

269289
.. function:: z85encode(s)
270290

271291
Encode the :term:`bytes-like object` *s* using Z85 (as used in ZeroMQ)
272-
and return the encoded :class:`bytes`. See `Z85 specification
273-
<https://rfc.zeromq.org/spec/32/>`_ for more information.
292+
and return the encoded :class:`bytes`.
293+
294+
The `ZeroMQ specification <https://rfc.zeromq.org/spec/32/>`_
295+
requires the length of Z85-encoded data to be a multiple of 5
296+
bytes. To produce compliant data frames, you must pad the input
297+
to a multiple of 4 bytes before passing it to this function.
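One way to meet this requirement is sketched below. The ``z85_frame`` helper is hypothetical, and it assumes the application transmits the original length by some other means so the receiver can strip the zero-padding after :func:`z85decode`. The ``b"HelloWorld"`` check uses the test vector from the ZeroMQ specification; the usage is guarded because :func:`z85encode` requires Python 3.13 or later.

```python
import base64


def z85_frame(data: bytes) -> tuple[bytes, int]:
    """Zero-pad *data* to a multiple of 4 bytes and Z85-encode it.

    Hypothetical helper: returns the frame plus the original length, which
    the application must convey separately so the receiver can strip padding.
    """
    original_length = len(data)
    padded = data + b"\0" * (-len(data) % 4)
    return base64.z85encode(padded), original_length


if hasattr(base64, "z85encode"):  # z85encode/z85decode need Python 3.13+
    # Test vector from the ZeroMQ Z85 specification.
    assert base64.z85encode(bytes.fromhex("864FD26FB559F75B")) == b"HelloWorld"

    frame, n = z85_frame(b"hello")
    assert len(frame) % 5 == 0
    assert base64.z85decode(frame)[:n] == b"hello"
```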
274298

275299
.. versionadded:: 3.13
276300

277301

278302
.. function:: z85decode(s)
279303

280304
Decode the Z85-encoded :term:`bytes-like object` or ASCII string *s* and
281-
return the decoded :class:`bytes`. See `Z85 specification
282-
<https://rfc.zeromq.org/spec/32/>`_ for more information.
305+
return the decoded :class:`bytes`.
283306

284307
.. versionadded:: 3.13
285308

@@ -352,3 +375,11 @@ recommended to review the security section for any code deployed to production.
352375
Section 5.2, "Base64 Content-Transfer-Encoding," provides the definition of the
353376
base64 encoding.
354377

378+
`ISO 32000-2 Portable document format - Part 2: PDF 2.0 <https://pdfa.org/resource/iso-32000-2/>`_
379+
Section 7.4.3, "ASCII85Decode Filter," provides the definition
380+
of the Ascii85 encoding used in PDF and PostScript, including
381+
the output character set and the details of data length preservation
382+
using zero-padding and partial output groups.
383+
384+
`ZeroMQ RFC 32/Z85 <https://rfc.zeromq.org/spec/32/>`_
385+
The "Formal Specification" section provides the character set used in Z85.

_sources/library/binascii.rst.txt

Lines changed: 2 additions & 3 deletions
@@ -158,9 +158,8 @@ The :mod:`!binascii` module defines the following functions:
158158
of hexadecimal digits (which can be upper or lower case), otherwise an
159159
:exc:`Error` exception is raised.
160160

161-
Similar functionality (accepting only text string arguments, but more
162-
liberal towards whitespace) is also accessible using the
163-
:meth:`bytes.fromhex` class method.
161+
Similar functionality (but more liberal towards whitespace) is also accessible
162+
using the :meth:`bytes.fromhex` class method.
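A quick comparison of the two, sketched with a hypothetical ``unhexlify_accepts`` helper: :meth:`bytes.fromhex` skips the spaces between byte pairs, while :func:`unhexlify` raises :exc:`Error` for the same input.

```python
import binascii


def unhexlify_accepts(text: str) -> bool:
    """Return True if binascii.unhexlify() can decode *text* (hypothetical helper)."""
    try:
        binascii.unhexlify(text)
    except binascii.Error:
        return False
    return True


print(bytes.fromhex("de ad be"))       # b'\xde\xad\xbe'
print(unhexlify_accepts("de ad be"))   # False: the spaces are rejected
print(unhexlify_accepts("deadbeef"))   # True
```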
164163

165164
.. exception:: Error
166165

_sources/library/codecs.rst.txt

Lines changed: 2 additions & 2 deletions
@@ -1152,7 +1152,7 @@ particular, the following variants typically exist:
11521152
+-----------------+--------------------------------+--------------------------------+
11531153
| cp857 | 857, IBM857 | Turkish |
11541154
+-----------------+--------------------------------+--------------------------------+
1155-
| cp858 | 858, IBM858 | Western Europe |
1155+
| cp858 | 858, IBM00858 | Western Europe |
11561156
+-----------------+--------------------------------+--------------------------------+
11571157
| cp860 | 860, IBM860 | Portuguese |
11581158
+-----------------+--------------------------------+--------------------------------+
@@ -1189,7 +1189,7 @@ particular, the following variants typically exist:
11891189
| | | |
11901190
| | | .. versionadded:: 3.4 |
11911191
+-----------------+--------------------------------+--------------------------------+
1192-
| cp1140 | ibm1140 | Western Europe |
1192+
| cp1140 | IBM01140 | Western Europe |
11931193
+-----------------+--------------------------------+--------------------------------+
11941194
| cp1250 | windows-1250 | Central and Eastern Europe |
11951195
+-----------------+--------------------------------+--------------------------------+

_sources/library/copy.rst.txt

Lines changed: 7 additions & 3 deletions
@@ -72,9 +72,13 @@ file, socket, window, or any similar types. It does "copy" functions and
7272
classes (shallow and deeply), by returning the original object unchanged; this
7373
is compatible with the way these are treated by the :mod:`pickle` module.
7474

75-
Shallow copies of dictionaries can be made using :meth:`dict.copy`, and
76-
of lists by assigning a slice of the entire list, for example,
77-
``copied_list = original_list[:]``.
75+
Shallow copies of many collections can be made using the corresponding
76+
:meth:`!copy` method (such as :meth:`list.copy`, :meth:`dict.copy` or
77+
:meth:`set.copy`), and of sequences (such as lists or bytearrays) by making
78+
a slice of the entire sequence (``sequence[:]``).
79+
However, these methods and slicing can create an instance of the base type
80+
when copying an instance of a subclass, whereas :func:`copy.copy` normally
81+
returns an instance of the same type.
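The difference can be seen with a trivial :class:`list` subclass (``Tagged`` is a hypothetical name):

```python
import copy


class Tagged(list):
    """A list subclass standing in for any user-defined collection."""


original = Tagged([1, 2, 3])

print(type(original.copy()).__name__)      # list
print(type(original[:]).__name__)          # list
print(type(copy.copy(original)).__name__)  # Tagged
```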
7882

7983
.. index:: pair: module; pickle
8084

_sources/library/ctypes.rst.txt

Lines changed: 10 additions & 4 deletions
@@ -16,6 +16,14 @@ used to wrap these libraries in pure Python.
1616

1717
.. include:: ../includes/optional-module.rst
1818

19+
.. warning::
20+
21+
:mod:`!ctypes` provides low-level access to native libraries and the
22+
process's memory, bypassing Python's safety mechanisms and allowing
23+
execution of arbitrary native code.
24+
Incorrect use can corrupt data and objects, reveal sensitive information,
25+
cause crashes, or otherwise compromise the running process.
26+
1927

2028
.. _ctypes-ctypes-tutorial:
2129

@@ -200,10 +208,8 @@ argument values::
200208
OSError: exception: access violation reading 0x00000020
201209
>>>
202210

203-
There are, however, enough ways to crash Python with :mod:`!ctypes`, so you
204-
should be careful anyway. The :mod:`faulthandler` module can be helpful in
205-
debugging crashes (e.g. from segmentation faults produced by erroneous C library
206-
calls).
211+
The :mod:`faulthandler` module can help debug crashes,
212+
such as segmentation faults produced by erroneous C library calls.
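As an illustration, the sketch below deliberately crashes a *child* interpreter by calling :func:`ctypes.string_at` on a NULL address while the ``-X faulthandler`` option is enabled, so the child's Python traceback is written to stderr before it dies. This assumes a platform where reading address 0 raises a fatal signal (typical on Linux and macOS); on Windows, :mod:`!ctypes` usually reports the access violation as :exc:`OSError` instead.

```python
import subprocess
import sys

# Crash a child process, never the current one: string_at(0) reads from a
# NULL pointer. "-X faulthandler" makes the child dump its Python traceback
# to stderr when the fatal signal arrives.
crash = "import ctypes; ctypes.string_at(0)"
result = subprocess.run(
    [sys.executable, "-X", "faulthandler", "-c", crash],
    capture_output=True,
    text=True,
)
print("exit status:", result.returncode)
print(result.stderr)
```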
207213

208214
``None``, integers, bytes objects and (unicode) strings are the only native
209215
Python objects that can directly be used as parameters in these function calls.
