In the paper "Matrix concentration for products" it is stated that the following is easy to show.
Let $X_1,\dots,X_n$ be independent, bounded, random square matrices which commute almost surely. Define $Y_i=I+\frac{X_i}{n}$. Then
$$ \log \mathbb{E}\|Y_n\dots Y_1\|\leq \frac 1 n \left\|\sum_{i=1}^n\mathbb{E}X_i\right\| +O\left(\sqrt{\frac{\log d}{n}}\right) $$
Here $\|\cdot\|$ is the spectral norm and $d$ is the dimension of our matrices.
I know the weaker inequality for matrices which do not necessarily commute almost surely, with $\sum_{i=1}^n\|\mathbb{E}X_i\|$ in place of $\left\|\sum_{i=1}^n\mathbb{E}X_i\right\|$.
Is this really easy to show? How do I prove it?
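Before asking, I ran a quick numerical sanity check of the claimed bound in the commuting case, realized as random *diagonal* matrices. The distribution, the dimensions $d$ and $n$, and the constant $2$ standing in for the $O(\cdot)$ are my own ad-hoc choices, not from the paper:

```python
import numpy as np

# Monte Carlo check of  log E||Y_n...Y_1|| <= (1/n)||sum_i E X_i|| + O(sqrt(log(d)/n))
# for commuting matrices, realized here as random diagonal matrices.
rng = np.random.default_rng(0)
d, n, trials = 5, 400, 2000

means = np.linspace(-0.5, 0.5, d)          # E X_i = diag(means), same for every i

norms = np.empty(trials)
for t in range(trials):
    # diagonal entries of X_1,...,X_n: shape (n, d), bounded in [means-1, means+1]
    X = means + rng.uniform(-1.0, 1.0, size=(n, d))
    # spectral norm of the product of the diagonal matrices Y_i = I + X_i/n
    norms[t] = np.max(np.abs(np.prod(1.0 + X / n, axis=0)))

lhs = np.log(norms.mean())                  # log E ||Y_n ... Y_1||
main = np.abs(means).max()                  # (1/n) ||sum_i E X_i|| = ||E X_1||
rhs = main + 2.0 * np.sqrt(np.log(d) / n)   # constant 2 is an ad-hoc stand-in for the O(.)

print(f"lhs  = {lhs:.4f}")
print(f"main = {main:.4f}")
print(f"rhs  = {rhs:.4f}")
```

In this toy setup the left-hand side comes out very close to $\frac1n\left\|\sum_i\mathbb{E}X_i\right\|$, comfortably within the $\sqrt{\log d/n}$ margin, which is consistent with the claim but of course proves nothing.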

$\begingroup$ I am having a hard time understanding what the $\log d$ term in the error means. Is this saying that there is a universal constant, valid for any distribution of $X$, so the estimate applies? This seems spectacularly unlikely, especially if the $X$'s are of mean zero. If the constant depends on $X$, then so does $d$, so why write it? $\endgroup$ – Anthony Quas, Jan 10 at 15:53

$\begingroup$ I can give you the exact form of the mentioned weaker inequality. We call the bound for each $\|X_i\|$ $L$. Then we get $$\left\|X_i/n+\mathbb{E}X_i/n\right\| \leq L/n+\|\mathbb{E}X_i\|/n=:\sigma_i.$$ With $\upsilon =\sum_{i=1}^n\sigma_i^2$ we get $$\log \mathbb{E} \|Y_n\dots Y_1\|\leq \frac1n \sum_{i=1}^n \|\mathbb{E}X_i\|+\sqrt{2\upsilon(2\upsilon \lor \log d)}. $$ The observation $\upsilon\leq (2L)^2/n$ leads to the weaker result. $\endgroup$ – Florian Ente, Jan 10 at 16:37
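For the record, here is the arithmetic behind that last observation as I read it (using that $\|X_i\|\leq L$ also forces $\|\mathbb{E}X_i\|\leq L$): $$\sigma_i\leq \frac{L}{n}+\frac{\|\mathbb{E}X_i\|}{n}\leq\frac{2L}{n},\qquad \upsilon=\sum_{i=1}^n\sigma_i^2\leq n\left(\frac{2L}{n}\right)^2=\frac{(2L)^2}{n},$$ so once $\log d\geq 2\upsilon$ the error term satisfies $$\sqrt{2\upsilon(2\upsilon\lor\log d)}\leq\sqrt{\frac{8L^2}{n}\log d}=2\sqrt 2\,L\sqrt{\frac{\log d}{n}}=O\left(\sqrt{\frac{\log d}{n}}\right).$$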
It's true if the $X_i$'s are Hermitian. Then with probability 1 the matrices are simultaneously diagonalizable, so we may as well write everything in a basis where they are diagonal.
Then $$\mathbb E \|Y_n \dots Y_1\| = \mathbb E \max_{j \in [d]} \left|(Y_n)_{jj} \dots (Y_1)_{jj}\right| = \mathbb E \max_{j \in [d]} \left|\prod_{i=1}^n \left(1 + \frac{(X_i)_{jj}}{n}\right) \right|.$$
Since the $X_i$ are bounded, say $|(X_i)_{jj}| \leq C$ almost surely, each factor $1 + (X_i)_{jj}/n$ is positive once $n > C$, and $1+x \leq e^x$ shows it suffices to bound $\log \mathbb E \max_{j \in [d]} e^{\sum_{i=1}^n (X_i)_{jj}/n}$. Now, this quantity is at most
$$ \log \mathbb E \max_{j \in [d]} e^{\sum_{i=1}^n ((X_i)_{jj} - \mathbb E (X_i)_{jj})/n} + \max_{j \in [d]} \frac 1n \mathbb E \sum_{i=1}^n (X_i)_{jj}.$$
The second term is bounded by $\frac 1n \left\|\mathbb E \sum_{i=1}^n X_i \right\|$. The first term is $O(\sqrt{\log d/n})$ by standard scalar concentration bounds for bounded random variables.
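Concerning that last sentence, one standard route (my constants; the only input is Hoeffding's lemma and a union bound over the $d$ coordinates) goes as follows. Write $Z_j=\frac1n\sum_{i=1}^n\left((X_i)_{jj}-\mathbb E (X_i)_{jj}\right)$; each centered summand lies in $[-2C,2C]$, so by Hoeffding's lemma $Z_j$ is sub-Gaussian with $\mathbb E e^{tZ_j}\leq e^{t^2\sigma^2/2}$, where $\sigma^2 = O(C^2/n)$. Then for any $\lambda\geq 1$, since $x\mapsto x^{1/\lambda}$ is concave, Jensen gives $$\mathbb E\max_{j\in[d]}e^{Z_j}\leq\left(\mathbb E\max_{j\in[d]}e^{\lambda Z_j}\right)^{1/\lambda}\leq\left(\sum_{j=1}^d\mathbb E e^{\lambda Z_j}\right)^{1/\lambda}\leq\left(d\,e^{\lambda^2\sigma^2/2}\right)^{1/\lambda}=\exp\left(\frac{\log d}{\lambda}+\frac{\lambda\sigma^2}{2}\right).$$ Choosing $\lambda=\sqrt{2\log d}/\sigma$ (which is $\geq 1$ for $n$ large) yields $$\log\mathbb E\max_{j\in[d]}e^{Z_j}\leq\sigma\sqrt{2\log d}=O\left(\sqrt{\frac{\log d}{n}}\right).$$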

$\begingroup$ Thank you for the help! Could you very briefly explain the last sentence? Which inequality do I need to see this? $\endgroup$ – Jan 11 at 9:58