This is an MCMC algorithm for uniform sampling over singular $n$ by $n$ Bernoulli matrices.

Let $H$ (for "hypercube") be the set of all 0/1 vectors of length $n$.

One step of the MCMC algorithm is as follows:

Generate an $(n-1)$ by $n$ matrix $A$, filled with 0/1 iid Bernoulli
samples. This will be the first $n-1$ rows of our proposal matrix.

Find $K$, the kernel of $A$. Let $r$ be the rank of $K$ (and note
that $r>0$).

Consider extending $A$ to an $n$ by $n$ matrix by adding a final 0/1
row. Let $L$ be the number of possible completions.

If $r>1$, then $L=2^n$. Generate a proposal matrix $M$ with the
same first $n-1$ rows as $A$, and the last row selected uniformly at
random from $H$ (transposed).

If $r=1$, let $L=|K\cap H|$. Generate a proposal matrix $M$ with
the same first $n-1$ rows as $A$, and the last row selected uniformly at random from $K\cap H$. (We will discuss how to enumerate $K\cap H$ below.)

Let $L'$ be the corresponding value for $L$ for the previous accepted
proposal. Let $p=\min(1, L/L')$. Accept the new proposal matrix with
probability $p$.

(To generate the initial proposal, we can just follow steps 1-3.)

This appears to provide a straightforward MCMC algorithm for uniform
sampling from singular Bernoulli matrices. All of the steps are
efficient and straightforward except for one: enumerating $K \cap H$.
How can we compute that in a (somewhat) practical way?

Observe that the probability that $r>1$ is less than the probability
that a uniform Bernoulli matrix is singular. My earlier post on this
question addresses suggests that this probability is on the order
of $2^{-n}$ for large $n$ (e.g. $n=100$). Therefore, we will
essentially always be in the case $r=1$.

We can find an element in $K\cap H$ by using a mixed (binary) integer linear program.
Specifically, we try to find a binary integer vector that is in the kernel $K$ (which is a linear constraint). We only need a feasible solution (i.e. "phase 1" of the MILP); the objective function doesn't matter. This produces a single element of $K\cap A$ (or shows that no such element exists).

Note that, for any $z\in H$, we can construct a linear function $f_z$ on $R^n$ that coincides with the Hamming distance from any on vector $H$ to $z$, to wit: $f_z(x)=\sum_i^{n}(z_i+(-1)^z_i x_i)$. That is, for any $x,z\in H$, $f_z(x)$ is equal to the Hamming distance between the vectors.

This provides us with a method to enumerate all elements of $K\cap A$. First, we use the MILP to enumerate one solution, $z\in H$. Then we add a linear constraint to the MILP forcing the Hamming distance from $z$ to be $\geq 1$. This removes $z$ from the feasible set but keeps all other integer solutions. We keep repeating this until the MILP finds no feasible binary solutions. This procedure will enumerate $K\cap H$.

Note that the all-zero vector provides us with an initial feasible solution (so $|K\cap H|\geq 1$). In the typical case that the first $n-1$ rows are distinct and non-zero, there are at least $n$ solutions: the all-zeroes vector, and each of the other $n-1$ rows. Therefore, we can usually skip the first $n$ steps and just add those constraints to the MILP directly.

For a particularly bad $A$, this solution may take exponentially much time. For example, if $A$ is the first $n-1$ rows of an identity matrix, then $r=1$ but $|K\cap H|=2^{n-1}$, so the outer loop may take exponentially many steps before terminating. Moreover, the MILP might take exponentially long to identify a single feasible solution.

In practice (i.e. for a random $A$), I imagine that the $n$ solutions mentioned above are almost certainly the only solutions, so we probably only need to solve a single MILP (which will prove that there are no more feasible solutions). The practical question then becomes how long this MILP takes. I think we would just need to try it to find out. In my experience, CPLEX tends to be much faster than GLPK at solving MILPs, but I don't have a license handy to try it out.

There are a few optimizations that might be worth mentioning.

- We could modify the MCMC algorithm so that with probability 1/2 it proceeds as above, but with probability 1/2 it reuses the same $A$ and just resamples the last row from $K\cap H$. Since $K\cap H$ has already been enumerated, and since these states are all equally likely, this second possibility takes essentially no compute time and MCMC always accepts. The net effect is much faster "local" mixing but longer scale dependence on $A$. Whether that is a good idea or not depends on what the samples are being used for.
- Similarly, with a certain probability we can permute the rows, columns, or transpose the matrix (or some combination of all three).

Finally, it might be worth mentioning an alternate approach. We might call the previous suggestion the "kernel/MILP" approach, and the following suggestion the "cokernel/dynamic program" approach. (I'm not sure if I should break this off as a separate post, but the thought of providing 3 separate "answers" to this question made me feel sheepish.)

Note that if the matrix $M$ is singular, then there exists some row that is a linear combination of the other rows; therefore, there is some row vector $v\in R^n$ such that $vA=0$. I believe that in the preponderance of cases, $v$ is extremely sparse and has small entries. For example, if a row is all zeros, then $v$ has weight 1 and coefficient 1; if two rows are co-linear, then $v$ has weight 2 and coefficients $1$ and $-1$. (By "weight", I mean the number of non-zero entries in $V$.)

Suppose that we make a proposal for $v\in Z^n$, where we (strongly) bias our distribution to favor sparse vectors with small coefficients. For all the zero entries in $v$, we choose the entries of the corresponding rows of $M$ as uniformly Bernoulli samples. For the remaining rows, we can compute all valid sets of 0/1 entries such that the dot product with $v$ is zero. This involves solving the subset sum problem. This is NP-complete, but it is pseudo-polynomial-time solvable (with a dynamic program). We have "stacked the deck" in our favor by sampling from vectors with small entries, so this step will be efficient (that is, we can achieve polynomial expected work).

If $v$ has weight $k$, the solution to the dynamic program tells us the set $S$ of valid $k$-tuples for each column; the number of consistent solutions is then $S^n$. We can then perform an MCMC random walk with acceptance probability proportional to $|S|$ (eliding details). I imagine that the dynamic program would be vastly faster than the MILP in practice (in addition to being more theoretically tractable), so an approach along these lines would probably be much faster.

By the way, it is possible to bound the maximum possible value for an entry of $v$. This is essentially equivalent to Hadamard's maximum determinant problem, but unfortunately the bound is $(n+1)^{(n+1)/2}2^{-n}$ (so the corresponding dynamic program would be exponentially large.)

That said, I have gotten stuck with some technical issues when trying to construct the MCMC, so I presented the "kernel/MILP" approach above.

4more comments