One-sided convergence in the Boltzmann-Grad limit

We review various contributions related to the fundamental work of Lanford deriving the Boltzmann equation from hard-sphere dynamics in the low density limit. We focus especially on the assumptions made on the initial data and on how they encode irreversibility. The impossibility of reversing time in the Boltzmann equation (expressed for instance by Boltzmann's H-theorem) is related to the lack of convergence of higher order marginals on some singular sets. Explicit counterexamples single out the microscopic sets where the initial data should converge in order to produce the Boltzmann dynamics.


Goals
The Boltzmann equation was introduced at the end of the nineteenth century to predict the almost sure behavior of a perfect gas out of thermodynamic equilibrium. This equation expresses the ballistic transport and collisions of microscopic particles (atoms) which are supposed to interact in essence as elastic hard spheres.
However the resulting dynamics exhibits very different features compared to the reversible deterministic system of hard spheres, which is a Hamiltonian system. The Boltzmann equation generates indeed a semi-group with a Lyapunov functional (the entropy increases along the evolution), and an attractor as time goes to infinity (the density converges to thermodynamic equilibrium). These discrepancies between the microscopic and the macroscopic descriptions were the starting point of some violent controversy opposing for instance Boltzmann to Loschmidt [9,10,12,24]. There is still an important challenge in understanding the origin of the non-reversible Boltzmann equation and the conditions under which it can provide a good approximation of the microscopic dynamics. We refer to [16,23] for a review on the irreversibility and on the key role played by entropy and to [33] for a modern perspective on Loschmidt's argument. In this paper, we will focus on a more quantitative analysis of the mathematical aspects leading to the emergence of irreversibility.
The convergence result that best describes, to date, the transition from the reversible microscopic dynamics to irreversible kinetic equations is due to Lanford [21]. It states that the Boltzmann equation can be obtained as the limit of the deterministic dynamics in a box of size 1
• in the low density regime, i.e. as the number of particles $N \to \infty$, their size $\varepsilon \to 0$, with the additional condition that the inverse mean free path $N\varepsilon^{d-1}$ remains of order 1 (where $d$ is the space dimension);
• up to excluding some pathological situations which occur with vanishing probability in this limit;
• provided that initially the particles are distributed independently.
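To make the low density scaling concrete, here is a minimal sketch that computes the diameter ε fixed by the condition Nε^{d−1} = 1 and the quantity Nε^d, which is proportional to the volume fraction occupied by the spheres and tends to 0. The function names are illustrative (not taken from the paper) and the dimensional constant (volume of the unit ball) is omitted.

```python
def boltzmann_grad_diameter(n_particles: int, dim: int) -> float:
    """Diameter eps such that N * eps**(dim - 1) == 1 (Boltzmann-Grad scaling)."""
    if dim < 2:
        raise ValueError("the scaling requires dim >= 2")
    return n_particles ** (-1.0 / (dim - 1))


def scaling_summary(n_particles: int, dim: int) -> dict:
    """Return eps, the inverse mean free path and the volume-fraction scale."""
    eps = boltzmann_grad_diameter(n_particles, dim)
    return {
        "eps": eps,
        # stays of order 1 by construction
        "inverse_mean_free_path": n_particles * eps ** (dim - 1),
        # proportional to the occupied volume, vanishes as N grows
        "volume_fraction_scale": n_particles * eps ** dim,
    }


if __name__ == "__main__":
    for n in (10**3, 10**6, 10**9):
        print(n, scaling_summary(n, dim=3))
```

With Nε^{d−1} = 1 the volume-fraction scale equals ε itself, which is why the regime is both collisional (order-one mean free path) and dilute.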
One important restriction is that this convergence result holds only for short times, which is not enough to observe any relaxation towards equilibrium. Despite many efforts, this restriction has not been removed to this day. There is no attempt in the present paper to improve the convergence time. Our goal here is to study the appearance of irreversibility which already occurs for short times.
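More precisely, we intend to discuss in detail the assumptions on the initial data in Lanford's theorem, as they encode all the information on the future evolution. The statement is the following.

Theorem 1.1 ([21]). Consider a system of $N$ hard spheres of diameter $\varepsilon$ on the $d$-dimensional periodic box $\mathbb{T}^d = [0,1]^d$ (with $d \geq 2$), initially "independent" and identically distributed with continuous density $f_0$ such that
$$ \big\| f_0 \exp\big(\mu + \tfrac{\beta}{2}|v|^2\big) \big\|_{L^\infty(\mathbb{T}^d_x \times \mathbb{R}^d_v)} \leq 1 \tag{1.1} $$
for some $\beta > 0$, $\mu \in \mathbb{R}$. More precisely, we choose the initial distribution of $N$ particles with minimal correlations, due only to the non-overlapping conditions:
$$ f_{N,0}(Z_N) = \frac{1}{\mathcal{Z}_N}\, \mathbf{1}_{\mathcal{D}^N_\varepsilon}(Z_N) \prod_{i=1}^N f_0(z_i), \tag{1.2} $$
denoting by $\mathcal{Z}_N$ the partition function, that is the normalizing constant for $f_{N,0}$ to be a probability.
In the Boltzmann-Grad limit $N \to \infty$ with $N\varepsilon^{d-1} = 1$, the one particle distribution $f^{(1)}_N$ converges to the solution of the Boltzmann equation
$$ \partial_t f + v\cdot\nabla_x f = \int_{\mathbb{S}^{d-1}\times\mathbb{R}^d} \big[ f(v') f(v_1') - f(v) f(v_1) \big] \big((v_1 - v)\cdot\nu\big)_+ \, d\nu\, dv_1, \qquad v' = v - \big((v - v_1)\cdot\nu\big)\nu, \quad v_1' = v_1 + \big((v - v_1)\cdot\nu\big)\nu, \tag{1.3} $$
with initial data $f_0$, on a time interval $[0, t^*]$ where $t^*$ depends only on the parameters $\beta, \mu$ of (1.1).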
Extensions of this result to other potentials than hard-sphere interactions have been recently achieved in [3,15,27].
As asserted by Boltzmann himself, the absence of contradiction between reversible microscopic (Newton) equations and the non-reversible Boltzmann equation is due to the fact that only "typical" solutions to the former equations are well approximated by f . The way to give a precise meaning to this typicality is to introduce a statistical description of the initial state, which is in fact the point of view of Theorem 1.1 [22,32].
The goal of the present paper is to analyze in detail the proof of Lanford's theorem in order to point out where irreversibility shows up. We shall see that part of the information is lost in the convergence process as some pathological sets of configurations with vanishing measure are neglected. These sets turn out to be not time-reversal invariant and the possibility to retrace one's steps fades in the limit.
Furthermore note that, in Theorem 1.1, the weak notion of convergence at time t prevents us from iterating the result as written. Describing more precisely the geometry of the microscopic sets, we shall introduce a notion of one-sided convergence holding at positive times as well as at time zero. Thus we will obtain a refined statement of the theorem (Theorem 2.9) compatible both with the irreversibility and the time-concatenation (semigroup) properties of the limiting equation (Section 3). A similar notion of one-sided convergence has been introduced by Denlinger in [14], see also [19] for a first, non quantitative version.
In order to characterize precisely the (small) sets where the convergence of the initial data is essential, we shall finally construct explicit examples of measures which are badly behaved exclusively in those regions, leading to a violation of Theorem 1.1 (Section 4).

Microscopic dynamics
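In the following we denote, for $1 \leq i \leq N$, $z_i := (x_i, v_i)$ and $Z_N := (z_1, \dots, z_N)$. With a slight abuse we say that $Z_N$ belongs to $\mathbb{T}^{dN} \times \mathbb{R}^{dN}$ if $X_N := (x_1, \dots, x_N)$ belongs to $\mathbb{T}^{dN}$ and $V_N := (v_1, \dots, v_N)$ to $\mathbb{R}^{dN}$. The phase space is denoted by
$$ \mathcal{D}^N_\varepsilon := \big\{ Z_N \in \mathbb{T}^{dN} \times \mathbb{R}^{dN} \,:\, \forall i \neq j, \ |x_i - x_j| > \varepsilon \big\}, $$
where $|\cdot|$ stands for the distance on the torus. We now distinguish pre-collisional configurations from post-collisional ones by defining, for indices $1 \leq i \neq j \leq N$,
$$ \partial\mathcal{D}^{N\pm}_\varepsilon(i,j) := \big\{ Z_N \,:\, |x_i - x_j| = \varepsilon, \ \pm (v_i - v_j)\cdot(x_i - x_j) > 0, \ \text{and } |x_k - x_\ell| > \varepsilon \text{ for all other pairs } k \neq \ell \big\}. $$
Given a post-collisional configuration $Z_N \in \partial\mathcal{D}^{N+}_\varepsilon(i,j)$, we define $Z_N'$ as the (pre-collisional) configuration having the same positions $(x_k)_{1 \leq k \leq N}$, the same velocities $(v_k)_{k \neq i,j}$ for non interacting particles, and the following pre-collisional velocities for particles $i$ and $j$:
$$ v_i' := v_i - \frac{1}{\varepsilon^2}\big((v_i - v_j)\cdot(x_i - x_j)\big)(x_i - x_j), \qquad v_j' := v_j + \frac{1}{\varepsilon^2}\big((v_i - v_j)\cdot(x_i - x_j)\big)(x_i - x_j). $$

Defining the Hamiltonian
$$ H_N(V_N) := \frac12 \sum_{i=1}^N |v_i|^2, \tag{1.4} $$
we consider the Liouville equation in $\mathcal{D}^N_\varepsilon$
$$ \partial_t f_N + \{H_N, f_N\} = 0 \tag{1.5} $$
with specular reflection on the boundary, meaning that if $Z_N$ belongs to $\partial\mathcal{D}^{N+}_\varepsilon(i,j)$ then
$$ f_N(t, Z_N) = f_N(t, Z_N'). \tag{1.6} $$
We have denoted $\{\cdot,\cdot\}$ the Poisson bracket defined by
$$ \{f, g\} := \sum_{i=1}^N \big( \nabla_{v_i} f \cdot \nabla_{x_i} g - \nabla_{x_i} f \cdot \nabla_{v_i} g \big). $$
The Liouville equation (1.5) therefore reads
$$ \partial_t f_N + \sum_{i=1}^N v_i \cdot \nabla_{x_i} f_N = 0, $$
with initial data given by (1.2) and the condition (1.1). Note that $f_N$ is symmetric with respect to permutations (which corresponds to the relabeling of particles).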
Remark 1.2. -Note that although the boundary condition (1.6) seems to introduce a symmetry between pre-collisional and post-collisional configurations, what has to be prescribed for the system to be well-posed is the density on post-collisional configurations for positive times, and for pre-collisional configurations for negative times, which are the incoming configurations for the transport equation (1.5).
We recall, as shown in [1] for instance, that the set of initial configurations leading to ill-defined characteristics (due to grazing collisions, clustering of collision times, or collisions involving more than two particles) is of measure zero in D N ε .
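The elastic reflection relating post-collisional and pre-collisional velocities of two touching spheres can be checked numerically. The sketch below is only an illustration: the function name is hypothetical, the two spheres are assumed to touch without wrapping around the periodic box (so the Euclidean difference of positions is used instead of the distance on the torus), and the reflection along the line of centers is the standard hard-sphere rule.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def reflect_velocities(x_i, v_i, x_j, v_j):
    """Elastic reflection of the velocities of two touching equal-mass spheres.

    Assumption: no periodic wrap-around between x_i and x_j.
    """
    n = [a - b for a, b in zip(x_i, x_j)]
    dist = math.sqrt(dot(n, n))
    n = [c / dist for c in n]                          # unit vector along the line of centers
    s = dot([a - b for a, b in zip(v_i, v_j)], n)      # normal relative velocity
    v_i_new = [a - s * c for a, c in zip(v_i, n)]
    v_j_new = [b + s * c for b, c in zip(v_j, n)]
    return v_i_new, v_j_new


if __name__ == "__main__":
    x_i, x_j = (0.0, 0.0), (-0.1, 0.0)                 # spheres of diameter 0.1 in contact
    v_i, v_j = (1.0, 2.0), (-1.0, 0.0)                 # post-collisional: moving apart
    print(reflect_velocities(x_i, v_i, x_j, v_j))
```

The map conserves momentum and kinetic energy, flips the sign of the normal relative velocity (post-collisional becomes pre-collisional), and is an involution.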

Propagation of chaos
We define the marginals on $\mathcal{D}^n_\varepsilon$ (extending by zero outside) by
$$ f^{(n)}_N(t, Z_n) := \int f_N(t, Z_N)\, dz_{n+1} \cdots dz_N. \tag{1.7} $$
Then one can show formally as in [17,21] and [13,15] that the first marginal, which describes the typical evolution of the gas, evolves according to
$$ \partial_t f^{(1)}_N + v\cdot\nabla_x f^{(1)}_N = (N-1)\varepsilon^{d-1} \int_{\mathbb{S}^{d-1}\times\mathbb{R}^d} \big[ f^{(2)}_N(t, x, v', x + \varepsilon\nu, v_1') - f^{(2)}_N(t, x, v, x - \varepsilon\nu, v_1) \big] \big((v_1 - v)\cdot\nu\big)_+ \, d\nu\, dv_1, \tag{1.8} $$
with $v', v_1'$ as in (1.3). This equation can be interpreted by saying that a particle at $z = (x, v)$ moves in a straight line until it collides with one of the remaining $N - 1$ particles. The velocities are then updated and the source term is determined by the joint distribution $f^{(2)}_N$. The notion of propagation of chaos (Stoßzahlansatz) lies at the heart of the derivation of Boltzmann's equation (1.3). Heuristically, one would like to write that when two particles at configurations $z = (x, v)$ and $z_1 = (x + \varepsilon\nu, v_1)$ collide then the marginal distribution factorizes:
$$ f^{(2)}_N(t, z, z_1) \simeq f^{(1)}_N(t, z)\, f^{(1)}_N(t, z_1). \tag{1.9} $$
This statement of the Stoßzahlansatz is far from a mathematical assertion as $f^{(2)}_N$ is only defined almost surely in $\mathbb{T}^{2d} \times \mathbb{R}^{2d}$ and not on sets of codimension 1. A more standard notion of propagation of chaos is given by the following definition.
In (1.10) the coordinates $z, z_1$ are fixed independently of $N$ and $\varepsilon$ (contrary to (1.9)). As a consequence, this notion turns out to be too weak to derive the Boltzmann equation from the microscopic evolution.
We shall see in Section 2 that the proof of Theorem 1.1 is not based on proving directly the propagation of chaos but on a more global convergence of all the marginals. One of the goals of this paper is to quantify the refined notion of convergence (see Theorem 2.9) which is strictly needed in Lanford's argument. The propagation of chaos (1.10) can be derived as a byproduct.

Lanford's proof
In order to understand how the assumptions on the initial data come into play, we have to look more precisely at the proof of Theorem 1.1, which is actually the corollary of a more precise result. Lanford's result indeed provides the convergence of all marginals f (n) N defined in (1.7) to the solutions f (n) of an infinite system of coupled equations, which is the so-called Boltzmann hierarchy with v i , v n+1 as in (1.3). Note that f (n) is symmetric with respect to permutations and that Moreover the cross-section (v n+1 − v i ) · ν + is invariant under the exchange of the velocities v n+1 , v i (exchangeability property) and under the collision The connection between the Boltzmann hierarchy and the Boltzmann equation is discussed in [31].
The starting point of the proof is to write an explicit representation of the $n$-particle distribution $f^{(n)}_N$ as a superposition of pseudo-dynamics, with weights depending on the initial data. More precisely, by averaging and iterating Duhamel's formula for the $N$-particle distribution $f_N$, we end up with a series expansion for $f^{(n)}_N$ in which the term of order $s$ corresponds to pseudo-dynamics involving $s$ collisions and is therefore expressed as an operator acting on the initial $(n+s)$-particle distribution $f^{(n+s)}_{N,0}$. The strategy of proof then relies on two main steps.
• First we obtain a uniform bound on the series expansion, which is responsible for the short time restriction in Theorem 1.1. In the following, we restrict our attention to times smaller than the radius of analyticity of the series.
• The convergence to the solution of the Boltzmann hierarchy then follows from the convergence of the trajectories representing the different pseudo-dynamics (note that these trajectories are related to the representation formula and that they do not coincide in general with the physical trajectories of the particles, see e.g. [28] for further discussions). The convergence of pseudo-trajectories fails to hold when there are recollisions (see the paragraph on bad configurations for a precise definition of recollisions). A geometric argument shows however that, for any fixed $n$, the set of initial configurations with $n$ particles leading to such recollisions is of vanishing measure in the $N \to \infty$ limit.
Note that all the information on these bad sets is forgotten in the limit: this is related to irreversibility, that is to the impossibility of going back to the initial state. Furthermore the convergence of the first marginal to the solution of the Boltzmann equation in the case of factorized initial data such as (1.2) is due to a uniqueness property for the Boltzmann hierarchy; this follows from the uniform bound on the hierarchy obtained in the first step of the above strategy.

The series expansions
A formal computation based on Green's formula (see [11,15,21] for instance) leads to the following BBGKY hierarchy for n < N on D n ε with the boundary condition as in (1.6) Recall that collisions between more than two particles at the same time may be disregarded as they correspond to a measure zero set of initial values in the phase space. The collision term is defined by The closure for n = N is given by the Liouville equation (1.5). Note that the collision integral is split into two terms according to the sign of (v i − v n+1 ) · ν and we used the trace condition on ∂D N + ε (i, n + 1) to express all quantities in terms of pre-collisional configurations.
To obtain the Boltzmann hierarchy, we compute the formal limit of the transport and collision operators when ε goes to 0. Recall that for fixed n, then (N −n)ε d−1 → 1 in the Boltzmann-Grad limit. Thus the limit hierarchy is given by where C 0 n,n+1 are the limit collision operators defined by (2.1). We denote by (f (n) 0 ) n∈N a family of initial data for this hierarchy (which will be specified later as a tensor product, see the statement of Theorem 2.9).
Iterating Duhamel's formula for the BBGKY hierarchy (2.2), we get f (n) where we have defined denoting by S n the group associated with free transport in D n ε with specular reflection on the boundary.
Remark 2.1. Note that, for fixed $N$, the operator $C_{n,n+1}$ is a trace on a manifold of codimension 1 and thus it is a priori not defined on $L^\infty$ functions. What makes sense is the combination $\int dt_{n+1}\, C_{n,n+1} S_{n+1}(t_{n+1} - t_{n+2})$ (see [4,30,15] and Figure 2.1).
For $t \geq 0$, one has $t_{n+1} - t_{n+2} \geq 0$; it is therefore necessary to express the collision operator in terms of pre-collisional configurations. In a symmetric way, for $t \leq 0$, one has $t_{n+1} - t_{n+2} \leq 0$, and we have to express the collision operator in terms of post-collisional configurations (see Remark 1.2).
where we have defined denoting by S 0 n the group associated with free transport in (T d × R d ) n . Let us denote |C s,s+1 |, |Q n,n+s | the operators obtained by summing the absolute values of all the elementary terms. The energy H s = 1 2 s i=1 |v i | 2 is conserved by the transport so and from the loss estimates on the collision operators (see [15] for instance) we get the Cauchy-Kowalewski type iterated estimate forβ < β Using the initial data (1.2) and the condition (1.1), we deduce following [35] an upper bound on the marginals from (2.5) where λ, chosen large enough, depends on β, µ, and t * is such that λt * = β/2 for instance. The convergence time in Lanford's Theorem 1.1 is given by t * . Similar estimates hold for the limit operators Q 0 n,n+s and S 0 s , as well as for the solution of the Boltzmann hierarchy. These estimates provide the wellposedness of the BBGKY and Boltzmann hierarchies.

Geometrical representation as a superposition of pseudodynamics
The usual way to study the s-th term of the representation formula is to introduce some pseudo-dynamics describing the action of the operator Q n,n+s . We first extract combinatorial information on the collision process: we describe the adjunction of new particles (in the backward dynamics) by ordered trees.
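The combinatorics of the ordered trees can be made explicit with a short enumeration. The sketch below assumes the convention, standard in this literature, that a collision tree is a map $a$ assigning to each added particle $i \in \{n+1, \dots, n+s\}$ a label $a(i) \in \{1, \dots, i-1\}$ of an already present particle; the function names are illustrative only.

```python
import math
from itertools import product


def collision_trees(n: int, s: int) -> list:
    """Enumerate all maps a: {n+1, ..., n+s} -> labels with 1 <= a(i) <= i - 1.

    Assumption: this is the convention used for the ordered trees A_{n,n+s}.
    """
    added = range(n + 1, n + s + 1)
    choices = [range(1, i) for i in added]             # particle i may attach to 1..i-1
    return [dict(zip(added, pick)) for pick in product(*choices)]


def count_collision_trees(n: int, s: int) -> int:
    """Closed form n (n+1) ... (n+s-1) for the number of trees."""
    return math.prod(range(n, n + s))


if __name__ == "__main__":
    for tree in collision_trees(2, 2):
        print(tree)
    print(count_collision_trees(2, 2))
```

The number of trees grows factorially in $s$; in estimates of Lanford type this growth is typically balanced by the volume $t^s/s!$ of the time-ordered simplex, which is one way to see why the series is only controlled for short times.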
Once we have fixed a collision tree a ∈ A n,n+s , we can reconstruct pseudo-dynamics starting from any point in the n-particle phase space

consider a collection of times, angles and velocities
We then define recursively the pseudo-trajectories in terms of the backward BBGKY dynamics as follows
• in between the collision times $t_i$ and $t_{i+1}$ the particles follow the $i$-particle backward flow with specular reflection;
• at time $t_i^+$, particle $i$ is adjoined to particle $a(i)$ at position $x_{a(i)} + \varepsilon\nu_i$ provided it remains at a distance ε from all the others, and with velocities given by the laws (2.9).
We denote by $z_i(a, Z_n, T_{n+1,n+s}, \Omega_{n+1,n+s}, V_{n+1,n+s}, \tau)$ the position and velocity of the particle labeled $i$, at time $\tau$ (provided $\tau < t_i$). We also define $G_{n+1,n+s}(a)$ as the set of parameters $(T_{n+1,n+s}, \Omega_{n+1,n+s}, V_{n+1,n+s})$ such that the pseudo-trajectory $Z_{n+s}(a, Z_n, T_{n+1,n+s}, \Omega_{n+1,n+s}, V_{n+1,n+s}, \tau)$ exists up to time 0, meaning that by adjunction of a new particle, there is no overlap. The configuration obtained at the end of the tree, i.e. at time 0, is $Z_{n+s}(a, Z_n, T_{n+1,n+s}, \Omega_{n+1,n+s}, V_{n+1,n+s}, 0)$. For $s = 0$, there are no adjoined particles and the pseudo-trajectory $Z_n(\emptyset, Z_n, \tau)$, $\tau \in (0, t)$, is just the $n$-particle backward flow with specular reflection.
Similarly, we define the pseudo-trajectories associated with the Boltzmann hierarchy. These pseudo-trajectories evolve according to the backward Boltzmann dynamics as follows
• in between the collision times $t_i$ and $t_{i+1}$ the particles follow the $i$-particle backward free flow;
• at time $t_i^+$, particle $i$ is adjoined to particle $a(i)$ at exactly the same position $x_{a(i)}$. Velocities are given by the laws (2.9).
With these notations, the representation formulas (2.5) and (2.6) for the marginals of order n can be rewritten respectively Note that the variables ν i are integrated over spheres and the scalar products take positive and negative values (corresponding to the positive and negative parts of the collision operators).
The question is then to describe the asymptotic behavior of the BBGKY pseudo-trajectories. We actually split them into two classes:
• pseudo-trajectories having no recollision, i.e. such that particles interact only at the times of adjunction of new particles, and are transported freely between two such times;
• pseudo-trajectories involving recollisions.
Note that no recollision occurs in the Boltzmann hierarchy as the particles have zero diameter.

Bad configurations
The transport semigroups S n (with recollisions) and S 0 n (free transport) play a key role in the discrepancies between the BBGKY series (2.5) and the Boltzmann series (2.6). In a given time interval, both transports coincide if no recollision occurs which will be the typical case for fixed n and ε small. However, specific configurations lead to recollisions and we define below the corresponding geometric sets.
Denote by B R the ball of R d centered at zero and of radius R, and fix a time T much bigger than the radius of analyticity t * given in (2.8) as well as a parameter ε 0 ε. The sets of bad configurations of n particles are defined as 10) where | · | stands for the distance on the torus. This means that, starting from B n− ε at time t, the backward free flow on D n ε will involve at least one recollision between t and t − T , and starting from B n+ ε , the forward free flow on D n ε will involve at least one recollision between t and t + T . In particular outside these sets, we have The first term in each series (2.5) and (2.6) involves the transport, both first terms coincide when ±t > 0 for configurations which are outside the bad set B n± ε . We stress the fact that similar sets have been already introduced by Denlinger in [14] and previously identified in [5] as key sets (see [5, Appendix A] for a discussion on irreversibility).
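The geometric meaning of the bad sets, and their lack of time-reversal invariance, can be illustrated on a pair of particles. The sketch below is a simplified stand-in rather than the definition (2.10): it declares a pair "bad" for the backward (resp. forward) free flow if the two particles come within a given threshold of each other during a time horizon, it works in the whole space (periodic images on the torus are ignored), and all names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def closest_approach_backward(x1, v1, x2, v2, horizon):
    """Minimal distance between x1 - tau*v1 and x2 - tau*v2 for tau in [0, horizon]."""
    dx = [a - b for a, b in zip(x1, x2)]
    dv = [a - b for a, b in zip(v1, v2)]
    dv2 = dot(dv, dv)
    # unconstrained minimizer of |dx - tau*dv|, clamped to the time window
    tau = 0.0 if dv2 == 0.0 else min(max(dot(dx, dv) / dv2, 0.0), horizon)
    rel = [a - tau * b for a, b in zip(dx, dv)]
    return math.sqrt(dot(rel, rel))


def is_bad_backward(x1, v1, x2, v2, horizon, threshold):
    return closest_approach_backward(x1, v1, x2, v2, horizon) <= threshold


def is_bad_forward(x1, v1, x2, v2, horizon, threshold):
    # the forward free flow is the backward flow with reversed velocities
    return is_bad_backward(x1, [-c for c in v1], x2, [-c for c in v2], horizon, threshold)


if __name__ == "__main__":
    x1, v1 = (0.0, 0.0), (-1.0, 0.0)
    x2, v2 = (1.0, 0.05), (1.0, 0.0)
    print(is_bad_backward(x1, v1, x2, v2, horizon=1.0, threshold=0.1))
    print(is_bad_forward(x1, v1, x2, v2, horizon=1.0, threshold=0.1))
```

In this example the two particles were close in the past but are moving apart, so the configuration is bad for the backward flow and good for the forward flow: the two families of sets are exchanged, not preserved, by reversing velocities.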
The following result is an easy calculation.
Their measure is controlled by We now suppose that $t \geq 0$ since the situation when $t \leq 0$ can be deduced by a simple symmetry in $t$ and $v$. The next terms in the series expansion (2.5), (2.6) involve some averaging with respect to the parameters $(t_i, v_i, \nu_i)_{n+1 \leq i \leq n+s}$ describing the adjunction of new particles. What can be proved is that, provided that the $n$-particle backward flow $\Psi_n$ on $\mathcal{D}^n_\varepsilon$ does not lead to a recollision, then the probability of having a recollision (involving at least one of the added particles) is very small.

Convergence outside bad configurations
Let us first prove that the solutions are close by eliminating bad trajectories. By definition, the set of good configurations with k particles will be such that the particles remain, by backward free flow, at a distance ε 0 ε|log ε| for a time T t * , i.e. that they belong to the set Recall that the distance | · | is on the torus. The logarithmic factor in (2.11) is due to the fact that at each adjunction of a new particle, there is a shift in positions of the order of ε; the number of adjoined particles will be chosen much (logarithmically) smaller than |log ε| One can show that good configurations are stable by adjunction of a (k + 1) th -particle next to a particle labelled by m k k, provided some bad sets are removed. More precisely, let Z 0 k = (X 0 k , V k ) be in G k (ε 0 ) and Z k = (X k , V k ) with positions close to X 0 k and same velocities (cf. (2.11)). Then, by choosing the velocity v k+1 and the deflection angle ν k+1 of the new particle k + 1 outside a bad set B m k (Z 0 k ), both configurations Z k and Z 0 k will remain close to each other. Of course, immediately after the adjunction, the particles m k and k+1 will not be at distance ε 0 , but v k+1 , ν k+1 can be chosen such that the particles drift rapidly far apart and after a short time δ > 0 the configurations Z k+1 and Z 0 k+1 are again in the good sets G k+1 (ε 0 /2) and G k+1 (ε 0 ). Note that the shift in positions at each adjunction of a new particle in the BBGKY pseudo-trajectories implies that, with respect to the Boltzmann pseudo-trajectories, there is an extra error ε|log ε| ε 0 /2 so that Z k+1 ∈ G k+1 (ε 0 /2). In the next statement the time of adjunction of a new particle is chosen to be at time u = 0 because the argument is translation invariant in time.
such that good configurations close to Z 0 k are stable by the adjunction of a collisional particle close to the particle x 0 m k in the following sense.
, a new particle with velocity v k+1 is added at x m k + εν k+1 to Z k and at x 0 m k to Z 0 k . Two possibilities may arise: (2.14) Moreover after the time δ, the k + 1 particles are in a good configuration

(2.15)
For a post-collisional configuration ν k+1 · (v k+1 − v m k ) > 0, the velocities are updated then (2.16) Moreover after the time δ, the k + 1 particles are in a good configuration We refer to [15] for a complete proof of Proposition 2.5 and simply recall that it can be obtained from the following control on free trajectories (note that compared to [15] there is an additional loss of a |log ε| which is due to the action of the scattering operator and is actually missing in [15]).
Lemma 2.6. -Given T > 0, ε δ 1 and ε|log ε| ε 0 min(δR, 1), such that for any v 2 ∈ B R and x 1 , x 2 such that |x 1 −x 0 1 | |log ε|ε, |x 2 −x 0 2 | |log ε|ε, the following results hold: Proposition 2.5 is the elementary step for adding a new particle. This step can be iterated in order to build inductively good pseudo-trajectories Z and Z 0 . Note that after adding a new particle, velocities remain identical at each time in both configurations, but their positions differ due the exclusion condition in the BBGKY hierarchy which induces a shift of ε at each creation of a new particle.
To estimate $Q_{n,n+s}(t) f$, we then split the integration domain in several pieces:
• pseudo-trajectories with large energy $H_{n+s}(Z_{n+s}) \geq R^2 \gg 1$;
• pseudo-trajectories with collisions separated by a time less than $\delta \ll 1$;
• pseudo-trajectories (with moderate energy and collisions well separated in time) having recollisions;
• good pseudo-trajectories in the sense of Proposition 2.5.
Bad pseudo-trajectories have a small contribution to the integrals thanks to (2.13) while good pseudo-trajectories of the BBGKY and Boltzmann hierarchies can be coupled.

Convergence of initial data
To estimate the contribution of good pseudo-trajectories, we have then to combine the continuity of f between initial data on the set of initial configurations which may be reached by such pseudo-dynamics: since we only consider pseudo-trajectories leading to good configurations, what we need to compute is (f (s) . With the specific choice of initial data (1.2) in Theorem 1.1, one can prove (see [15] for instance) that the initial data of both hierarchies are close, in the sense that for s 2 ( 2.18) This condition implies that f It remains to gather all error estimates and to use the continuity property (2.7) for the operators Q n,s+n . We define the weighted norm f n L ∞ β,n := f n exp(βH n ) L ∞ , with H n the Hamiltonian (1.4). Fixing the parameters ε 0 , δ, s, n such that |log ε|ε ε 0 min(δR, 1) , n + s |log ε| , and choosing R C|log ε|, the error term from Proposition 2.5 converges to 0. The term by term convergence is then obtained from the following estimate, thanks to the previous analysis and (2.18).
Proposition 2.7. -Under the assumptions of Theorem 1.1 and assuming that f 0 is Hölder continuous in space, then for all β < β there is a constant C > 0 and γ(ε, ε 0 ) going to zero with ε such that ) C n+s t s e −β Hn γ(ε, ε 0 ) .

A refined convergence statement
The previous argument shows that once recollisions have been discarded, pseudo-trajectories are stable as ε → 0, in the sense that their distance to the corresponding Boltzmann pseudo-trajectory converges to 0. The only assumptions used to obtain the convergence of the marginals for times t ∈ [0, t * ] are that the initial data f 0 has some regularity in space and the initial marginals satisfy the uniform growth condition together with the convergence Actually  We thus can state the following refined version of Lanford's theorem which provides quantitative convergence estimates outside the bad sets.
for some a > 0 and for |log ε|ε ε 0  Compared to [5], this theorem provides a description of the geometry of the bad sets along the evolution, and quantitative estimates of their measures. Note that a similar notion of one-sided convergence has been introduced by Denlinger in [14].

Irreversibility and time concatenation
Note that the very same proof shows that, in the Boltzmann-Grad limit, the marginal f (n) N converges tof ⊗n wheref is the solution of the reverse Boltzmann equation ) and times from 0 to −t * . This convergence requires only the growth condition (2.19) and the initial convergence for some a > 0.
We thus have a symmetric situation for negative and positive times, which indicates once more that the initial data play a very special role distinguishing between the direct and reverse Boltzmann dynamics.

At the macroscopic level
Recall that the Boltzmann dynamics admits a Lyapunov functional. Indeed, using the well-known facts (see [13]) that the mappings (v, v 1 ) → (v 1 , v) (microscopic exchangeability) and (v, v 1 , ν) → (v , v 1 , ν) (microscopic reversibility) have unit Jacobian determinants and preserve the cross-section, one can show that formally for any test function ϕ , and similarly for ϕ.
Disregarding integrability issues, we choose ϕ = log f in (3.3), and use the properties of the logarithm, to find The so-defined entropy production is therefore a nonnegative functional in agreement with the second principle of thermodynamics.
This leads to Boltzmann's H-theorem, stating that the entropy is (at least formally) a Lyapunov functional for the Boltzmann equation.
Then, for all t 0 formally The classical interpretation of the H-theorem is that entropy measures the quantity of microscopic information that is known on the system. The microscopic dynamics itself is reversible, but the entropy is not a deterministic quantity associated to one realization of the dynamics. Entropy is a statistical quantity which measures the volume of the set of all possible microscopic configurations corresponding to the macroscopic information retained in the kinetic description. Irreversibility is therefore related to a loss of information in our description of the dynamics, not to the dynamics itself.
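The sign of the entropy production rests on the elementary inequality $(a - b)(\log a - \log b) \geq 0$ applied pointwise with $a = f' f_1'$ and $b = f f_1$. The sketch below checks this numerically on random positive values; it assumes the standard form of the integrand (the cross-section weight, which is nonnegative, is left out) and the function name is illustrative.

```python
import math
import random


def entropy_production_integrand(f, f1, f_prime, f1_prime):
    """(f' f1' - f f1) * log(f' f1' / (f f1)) for strictly positive density values."""
    gain = f_prime * f1_prime
    loss = f * f1
    return (gain - loss) * math.log(gain / loss)


# Sample random positive values of the four densities (fixed seed for reproducibility).
rng = random.Random(0)
samples = [tuple(rng.uniform(1e-3, 10.0) for _ in range(4)) for _ in range(10_000)]
min_value = min(entropy_production_integrand(*s) for s in samples)

if __name__ == "__main__":
    print("smallest sampled value:", min_value)
```

The integrand vanishes exactly when $f' f_1' = f f_1$, which is the pointwise relation satisfied by equilibrium (Maxwellian) densities.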
Note that, for negative times, the distribution is evolved according to the reverse Boltzmann dynamics, and we have  It is important to realize that the loss of reversibility is already present at the level of the Boltzmann hierarchy and does not come from some averaging or projection in phase space. In particular, it is not directly related to the chaos assumption. Indeed, it can be shown that the Boltzmann hierarchy is irreversible: from the Hewitt-Savage theorem (see [18]) and the symmetry assumption on the labels, we indeed know that the initial data can be decomposed as a superposition of chaotic initial data, i.e. that there exists a measure π on the space of probability densities such that f (n) 0 = g ⊗n 0 dπ(g 0 ) for any n ∈ N * .

One-sided convergence and irreversibility
Then, by linearity of the Boltzmann hierarchy (2.1), we deduce that the family (f (n) (t)) n∈N * defined by where g(t) is the solution to the Boltzmann equation with initial data g 0 , is a solution to the Boltzmann hierarchy. Since the entropy is nondecreasing for each solution of the Boltzmann equation, we deduce that is nondecreasing, which encodes irreversibility.
This result means that microscopic information has been lost in the limiting process. The proof of the proposition follows by noticing that the entropy, along the flow of the Boltzmann equation, is increased by the reverse dynamics and decreased by the backward evolution. Now we turn to the appearance of irreversibility in the limiting process. We fix two times 0 < τ < τ < t * . Consider the representation formula (2.5) for the marginals f

At the microscopic level
It can be written starting from time τ instead of 0, meaning f (n) since the Liouville equation (1.5) satisfied by f N is reversible and autonomous with respect to time (it generates a group of evolution). As usual for analytic functions, the radius of convergence of the series at τ is at least t * − τ . Note that the limitation on the convergence time t * in Lanford's theorem comes from the fact that we use the Cauchy-Kowalewski theorem to get a uniform estimate of the radius of convergence of the previous series expansion. In particular, the same argument shows that this radius of convergence at time τ is at least t * − τ .
What we would need to apply the refined version of Lanford's theorem (Theorem 2.9) starting from time τ and moving back to τ is the convergence of f (n+s) N (τ ) on the sets V − n+s,n which consist of the configurations of n + s particles at time τ reached by good pseudo-dynamics having s collisions on [τ , τ ]. Note that these pseudo-trajectories are built forward as they go from time τ to τ and that we have which is the symmetric counterpart of Proposition 2.8.

Recollisions of the backward dynamics are indeed exactly collisions of the forward pseudo-dynamics. This implies that we have no information about
the convergence of f (n) N (τ ) on the sets V − n+s,n , and that we cannot prove the convergence to the reverse Boltzmann dynamics on [τ , τ ] starting from τ (which is consistent with the fact that the reverse Boltzmann dynamics is not the backward Boltzmann dynamics!). For the same reasons the argument behind the so-called Loschmidt's paradox fails. Indeed if at time τ we invert all the velocities and consider f (n) N (τ, X n , −V n ) as initial data, we cannot apply Theorem 2.9 so that there is no contradiction with the backtracking of marginals. The same argument was already put forward in [5]. This means therefore that the structure of the family f (n) N (τ ) n N is very different from the chaotic structure of the initial data. Remark 3.3. -Evolving a chaotic data by the reverse Boltzmann dynamics gives a systematic method to construct data for which the Boltzmann-Grad limit fails to hold, even though we do have a weak chaos property in the sense of Definition 1.3. In Section 4, we show a more explicit construction leading to an almost chaotic initial data, with modifications of the second order correlations on a small set, such that the limiting dynamics is free transport (far from the Boltzmann dynamics).
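The microscopic reversibility underlying Loschmidt's argument can be observed on a toy model. The sketch below is an assumption-laden illustration, not the system studied here: it uses equal-mass hard rods of length eps on the real line (one space dimension, no periodic box), for which an elastic collision simply exchanges the two velocities, and it assumes generic data without simultaneous collisions. All names are hypothetical.

```python
def evolve_hard_rods(positions, velocities, eps, duration):
    """Event-driven flow of equal-mass hard rods of length eps on the line.

    Assumptions: positions are sorted with gaps larger than eps, and no
    simultaneous collisions occur. Returns (positions, velocities, n_collisions).
    """
    xs, vs = list(positions), list(velocities)
    remaining, n_collisions = duration, 0
    while True:
        step, pair = remaining, None
        for i in range(len(xs) - 1):
            closing_speed = vs[i] - vs[i + 1]
            if closing_speed > 0.0:                    # neighbours are approaching
                hit = max((xs[i + 1] - xs[i] - eps) / closing_speed, 0.0)
                if hit < step:
                    step, pair = hit, i
        xs = [x + v * step for x, v in zip(xs, vs)]    # free transport up to the event
        remaining -= step
        if pair is None:
            return xs, vs, n_collisions
        vs[pair], vs[pair + 1] = vs[pair + 1], vs[pair]  # elastic collision
        n_collisions += 1


def loschmidt_round_trip(positions, velocities, eps, duration):
    """Evolve, reverse all velocities, evolve again, reverse once more."""
    xs, vs, n = evolve_hard_rods(positions, velocities, eps, duration)
    xs_back, vs_back, _ = evolve_hard_rods(xs, [-v for v in vs], eps, duration)
    return xs_back, [-v for v in vs_back], n


if __name__ == "__main__":
    x0, v0 = [0.0, 1.0, 2.5, 4.0], [1.0, -0.5, 0.3, -1.2]
    print(loschmidt_round_trip(x0, v0, eps=0.1, duration=5.0))
```

For a single deterministic trajectory the round trip recovers the initial state up to rounding errors. The obstruction discussed above is of a different nature: it concerns the convergence of the marginals on the singular sets reached by the reversed dynamics, not the reversibility of individual trajectories.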

Time-concatenation
Another important feature of the limiting equation is that one can iterate in the sense of the following proposition. This property is a simple consequence of the fact that the Boltzmann equation is a local in time partial differential equation, with no memory effect. It is a kind of Markov property of the underlying process.
Let τ < τ' denote intermediate times, positive but strictly smaller than Lanford's time t*. As previously, we denote by $f_N$ the solution to the Liouville equation with chaotic initial data in the sense of Theorem 1.1. If we want to iterate Lanford's convergence proof on [τ, τ'], what we need (in addition to the uniform $L^\infty$ a priori estimate) is the convergence of f And from the refined version of Lanford's theorem (Theorem 2.9), we have that Combining both properties, we deduce that we can iterate the convergence as long as the growth condition (2.19) is satisfied.
Remark 3.5. - Note that the main limiting factor to extend the convergence time is the loss with respect to β in the estimate (2.7). The previous iteration argument therefore fails to improve the time of convergence in Lanford's theorem for initial data of the form (1.2). For initial data close to equilibrium, it is proved in [7,8] that one can actually reach times of the order O(log log log N). The proof relies on global a priori bounds: it consists in designing a subtle pruning procedure to get rid of the contribution of super-exponential collision trees, and then in expressing the contribution of all other dynamics in terms of the initial data.

Chaotic initial data leading to different dynamics
At large scales, the propagation of chaos (1.10) holds and the measure factorizes, but the memory of the Hamiltonian dynamics remains encoded in f N (t) on very specific configuration sets of size vanishing with ε. We are not yet able to describe the refined structure of the correlations in the density f N (t), but we are going to introduce an example which illustrates how constraints on very small sets may change the nature of the dissipative dynamics. Unlike the one obtained by reversing velocities (see Remark 3.3), this example will be totally explicit.
Using the notation (2.10) of the bad sets, we consider the initial data $\hat f_{N,0}$ given by (4.1), obtained from $f_{N,0}$ by excluding the set of configurations such that some collision occurs between the N particles within a time T. Contrary to the definition (2.10), we choose T as a short time and set T = δ > 0. By construction the measure $\hat f_{N,0}$ will evolve according to free transport on the time interval [0, δ], as there are no interactions between the particles. In particular, the first marginal evolves by free transport on this time interval. In the following, we are even going to argue that, at a macroscopic scale, the structure of the measure (4.1) behaves essentially as the one of the initial data $f_{N,0}$ given in (1.2), for which Lanford's Theorem holds. In particular, we deduce that a chaos property (1.10) holds for the measure $\hat f_{N,0}$. The key point is that the two measures differ on very singular sets which are exactly the relevant sets for the microscopic evolution.
To prove this, it is convenient to rephrase the measure (4.1), which has a fixed number of particles, in a slightly different setting where N is varying. The terminology "canonical" and "grand canonical" ensemble (inherited from statistical physics) is used, respectively, for the two pictures. The canonical ensemble is the setting introduced in Section 1 of this paper, where N is fixed, while the grand canonical ensemble is defined in Section 4.1 below. In this new setting, one introduces "rescaled correlation functions" $f^{(j)}_{\varepsilon,0}$ describing the same macroscopic behaviour as the marginals $\hat f^{(j)}_{N,0}$. The functions $f^{(j)}_{\varepsilon,0}$ have a remarkable advantage, as they can be dealt with by using standard methods of expansion developed in different contexts [20,25] (for the problem of adapting cluster expansion techniques to a canonical setting, we refer to [26]).

The grand canonical formalism
The grand canonical phase space is $D_\varepsilon = \bigcup_{n \ge 0} D^n_\varepsilon$ (actually $D^n_\varepsilon = \emptyset$ for n large, due to the exclusion). Given $(f_{n,0})_{n \ge 0}$ we assign the collection of probability densities for the configuration $Z_n \in D^n_\varepsilon$, where $\mu_\varepsilon = \varepsilon^{-d+1}$ and $f_{n,0}$ is yet to be specified. The normalization constant $Z_\varepsilon$ is given by $\{W^n_{\varepsilon,0}\}_{n \ge 0}$ defines the grand canonical state on $D_\varepsilon$, normalized as The total number of particles N is random and distributed according to a Poisson law The choice $\mu_\varepsilon = \varepsilon^{-d+1}$ ensures that the average number of particles grows as $\varepsilon^{-d+1}$, hence the inverse mean free path remains of order 1 (Boltzmann-Grad scaling). We postpone this check to the end of the section.
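The role of this scaling can be illustrated by a heuristic count, given here as a sketch and not as the check announced above. It assumes spheres of diameter ε and a spatial domain of unit volume, so that the number density is of order $\mu_\varepsilon$; the symbols $\lambda$ (mean free path) and $\kappa_{d-1}$ (volume of the unit ball of $\mathbb{R}^{d-1}$) are notation introduced only for this illustration. A tagged particle meets every particle whose center lies in the tube it sweeps, whose cross-section is a (d−1)-dimensional ball of radius ε:

```latex
% Heuristic inverse mean free path (assumptions: unit volume, spheres of diameter \varepsilon).
\[
  \frac{1}{\lambda}
  \;\approx\;
  \underbrace{\mu_\varepsilon}_{\text{number density}}
  \times
  \underbrace{\kappa_{d-1}\,\varepsilon^{d-1}}_{\text{cross-section}}
  \;=\; \varepsilon^{-(d-1)}\,\kappa_{d-1}\,\varepsilon^{d-1}
  \;=\; \kappa_{d-1} \;=\; O(1).
\]
```

Any other power of ε in $\mu_\varepsilon$ would make this quantity vanish or blow up as ε → 0.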
Let us define the j-particle correlation function, j = 1, 2, . . . . The idea is to count how many groups of j particles fall, on average, in a given configuration $Z_j = (z_1, \dots, z_j)$: where we are labelling the particles and indicating their (random) configuration by $\zeta_1, \dots, \zeta_n$, and the brackets denote the average with respect to the grand canonical state. Expressing the correlation function in terms of the densities and using the symmetry in the particle labels we get In the case with minimal correlations, i.e. when where the last inequality follows by removing the constraint between the j particles and the rest of the system. Note that the rescaled correlation functions $f^{(j)}_{\varepsilon,0}$ are quantities of order 1 in ε. The Boltzmann equation can be derived for both ensembles [5,32,29].

A counterexample
A natural reformulation of (4.1) with varying number of particles is obtained as follows. Define $\frac{1}{n!} W^n_{\varepsilon,0}(Z_n)$ := where $\mu_\varepsilon = \varepsilon^{-d+1}$ and $\zeta_{ij} = \zeta(z_i, z_j) = -\mathbf{1}_C(z_i, z_j)$ with C the set leading to a collision The normalization constant $Z_\varepsilon$ is given as above by By construction, the grand canonical density (4.5) evolves according to the free transport dynamics in the time interval [0, δ], The rescaled correlation functions $f^{(j)}_{\varepsilon,0}$ obey some of the assumptions required to apply Lanford's theorem; in particular the key $L^\infty$ bound holds thanks to (4.4). Moreover, we will see in Proposition 4.2 below that a chaos property holds in a sense stronger than (1.10). Nevertheless the correlation functions are irregular at the microscopic scale on the sets $B^{j+}_\varepsilon$, so that Lanford's proof cannot apply and there is no contradiction with (4.6). Note that the constraints are imposed only in the forward direction, thus we get the reverse Boltzmann equation for negative times.
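The pairwise constraint can be made concrete with a small numerical sketch. It is only an illustration under stated assumptions: particles move freely in the whole space (no periodic boundary), spheres have diameter ε, and "leading to a collision" is read as "the two centers come within distance ε at some time in [0, δ] under free flight". The function names `collides_within` and `collision_free` are hypothetical and not taken from the paper.

```python
import numpy as np

def collides_within(x1, v1, x2, v2, eps, delta):
    """True if two free-flying spheres of diameter eps touch at some time in [0, delta].

    Assumption: free transport in the whole space, no boundary.
    """
    dx = np.asarray(x1, dtype=float) - np.asarray(x2, dtype=float)
    dv = np.asarray(v1, dtype=float) - np.asarray(v2, dtype=float)
    a = dv @ dv
    if a == 0.0:
        # Same velocity: the relative position never changes.
        return bool(dx @ dx <= eps**2)
    # |dx + t dv|^2 is a convex quadratic in t; its minimum over [0, delta]
    # is reached at the time of closest approach clamped to the interval.
    t = min(max(-(dx @ dv) / a, 0.0), delta)
    d = dx + t * dv
    return bool(d @ d <= eps**2)

def collision_free(X, V, eps, delta):
    """True if no pair of particles collides within time delta (free flight)."""
    n = len(X)
    return not any(
        collides_within(X[i], V[i], X[j], V[j], eps, delta)
        for i in range(n)
        for j in range(i + 1, n)
    )
```

A configuration accepted by `collision_free` evolves by free transport on [0, δ] by construction, which is the mechanism used in the counterexample; the test only inspects pairs, mirroring the pairwise form of the constraint.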
To conclude this example, we will show that the state is chaotic.
for all $j \ge 2$.
The result for j = 2 will follow by applying Theorem 2.3 of [25] (recalled below) where the decay of correlations has been estimated by means of cluster expansion.   We first treat the term m = 1 and show that for some constant C > 0 Given z 1 , z 2 , we distinguish two cases to evaluate the measure of the overlap R(z 1 , v 1 ) ∩ R(z 2 , v 1 ). Let α be the angle between the axis of both cylinders, i.e. the angle between v 1 − v 1 and v 1 − v 2 .
Suppose first that the lines {x 1 +λ(v 1 −v 1 ) , λ ∈ R} and {x 2 + µ(v 1 − v 2 ) , µ ∈ R} intersect at some point u (see Figure 4.1). Then the length = min{|u − x 1 |, |u − x 2 |} satisfies For the intersection to occur one needs that If the two lines in the picture do not intersect (as will happen in general for d > 2), the above inequality can be proved by a similar argument. Define u, v as the points in the first and second lines where the distance 2ε between both lines is reached. Then we can project all vectors orthogonally to u − v, and we get exactly the same picture.
As a conclusion, we get that θ should belong to a solid angle of order Integrating over x 1 and v 1 − v 1 , we deduce that Combined with (4.11), this completes (4.10).
We now show that the contribution of the term m is bounded by C m δ m−1 |log ε| 1/2 (4.12) for some constant C. Summing over m this will complete the derivation of (4.9) for δ small enough.
To estimate the case m = 1, we simply used the fact that |x 1 − x 2 | ε|log ε|. Suppose that x 2 is such that |x 1 − x 2 | ε|log ε|. Then integrating with respect to z 1 leads to where we applied an estimate as (4.10) using part of the exponential factor, and removed the constraint 1 in the upper bound. Finally, we can integrate term by term as the constraint on z i depends only on z i+1 . This leads to a contribution of the form Cε d−1 δ(|v i | + |v i+1 |) for each constraint. After integrating the velocities, we obtain an upper bound which implies an estimate as in (4.12).
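The size of each such factor can be traced back to an elementary volume computation, sketched here under the assumptions that the constraint linking $z_i$ to $z_{i+1}$ is a collision within time δ under free flight, and that the motion takes place in the whole space. For fixed velocities and fixed $x_{i+1}$, the admissible $x_i$ lie in the region swept by a ball of radius ε along a segment of length $\delta |v_i - v_{i+1}|$, minus the initial ball removed by the exclusion condition. The constant $\kappa_{d-1}$ (volume of the unit ball of $\mathbb{R}^{d-1}$) is notation introduced for this illustration.

```latex
% Volume of the set of positions x_i colliding with particle i+1 within time \delta.
\[
  \Bigl|\bigl\{x_i \;:\; |x_i - x_{i+1}| > \varepsilon,\ 
  \exists\, t \in [0,\delta],\ 
  |(x_i - x_{i+1}) + t\,(v_i - v_{i+1})| \le \varepsilon \bigr\}\Bigr|
  \;=\; \kappa_{d-1}\,\varepsilon^{d-1}\,\delta\,|v_i - v_{i+1}|
  \;\le\; \kappa_{d-1}\,\varepsilon^{d-1}\,\delta\,\bigl(|v_i| + |v_{i+1}|\bigr).
\]
```

The equality is the volume of a segment thickened by ε (cylinder plus one ball) minus the ball excluded at time 0; the last step is the triangle inequality.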
It remains to consider the set {|x 1 − x 2 | ε|log ε|}. We first integrate over z 2 This breaks the cluster into two independent parts which can be estimated separately by the product of the volume of the cylinders, leading to a higher order contribution ε|log ε| d C m δ m−1 . This completes the derivation of (4.12) and the proof of (4.7) for j = 2.
The statement for j = 1 is also similar and follows from the cluster expansion of [25]. In fact Theorem 2.2 and Proposition 6.1 in [25] imply that f (1) ε,0 is uniformly bounded by a geometric series for δ small.
The case j > 2 can be treated similarly; however, the expressions are lengthier, and we refer to [34] for details.

Some wrong ideas about irreversibility
The previous analysis brings a more precise understanding of Loschmidt's paradox: it indicates where the irreversibility of the Boltzmann description appears in the limiting process.
We would like first to comment upon some of the possible explanations which can be found in the literature.
• The direction of time in the Boltzmann dynamics is not related to an arbitrary choice in writing the collision operator. Once the initial data is prescribed, one has no choice in expressing the collision operator in terms of pre-collisional configurations for positive times, and in terms of post-collisional configurations for negative times. As explained in Remark 2.1, this is the only way to define properly the traces by using the transport operator. This is also related to the fact that only the distribution of ingoing configurations has to be prescribed for the transport equation (see Remark 1.2).
• Irreversibility is not a direct consequence of chaos. One can indeed start from non-chaotic initial data, in which case the Boltzmann hierarchy does not decouple. However, even in this case, we have seen in Section 3.1 that the limiting evolution is irreversible. We indeed have a Lyapunov functional, obtained by linear superposition of the entropy functionals with the Hewitt-Savage measure, which is non-increasing.
• Irreversibility is not due to neglecting the interaction length in the collision process. In the limit, we forget indeed about the relative (microscopic) positions of the particles at the time of collisions, but this information could be kept by introducing an intermediate description, i.e. a simple modification of the Boltzmann equation referred to as the Enskog hierarchy [29]. In this equation the collision operator is still of type (2.3). However, Arkeryd and Cercignani [2] (see also [6]) prove that the Enskog equation (and thus the Enskog hierarchy using the previous superposition principle) is irreversible.

A very singular averaging process
Neglecting spatial micro-translations in the limit induces a first loss of information. The second loss of information, which is actually responsible for the loss of reversibility, consists in neglecting pathological configurations, i.e. configurations leading to pseudo-trajectories involving recollisions. These sets B ± ε0 defined in (2.10) have a simple geometric definition, and their measure converges to 0 in the limit. So apparently it seems rather natural not to care about them.
The point is that the marginals at time t can be computed as weighted averages of the initial marginals on very singular sets, which have exactly the same structure and the same measure. Recollisions of the backward dynamics are indeed exactly collisions of the forward pseudo-dynamics. We have therefore identified very precisely why time-concatenation is possible while reversing the arrow of time is not. This can be summarized as in Figure 5. Note that, for a better understanding of the Boltzmann dynamics, it is not enough to look at the specific initial data (1.2), as its particular form is not stable under the dynamics. We would need a more systematic classification of the limiting dynamics depending on the microscopic structure of the n-particle distribution.