Unfortunately, this is cutting-edge stuff; it goes a bit deeper than what is in the public domain.
But technically, you can understand perceived information density visually in a simplified way. Imagine that the frequency stream of each sound source in a mix is reduced to an average frequency locked into a single state (the time dimension is removed from the equation), and that you can plot this average frequency state as a fixed set of pixels, each with a range of colors, within the total number of pixels available on a canvas. Now imagine placing additional sound sources on this canvas: they either fill out the remaining free pixels or start to override, overlap, or share pixels already in use. Furthermore, imagine that each sound source has room on the canvas without compromise when no other sound sources are present, but that each pixel is limited in the colors it can express.
At its root, perceived information density is the combination of three things: the total number of pixels available on the canvas, the total number of colors each pixel can express, and how efficiently the pixels are used to express the combination of sound sources on the canvas. Reducing the bit depth is like reducing the total number of pixels available on the canvas, and dithering is an attempt to use the fewer pixels and colors that remain as efficiently as possible. Perceived information density therefore also depends on how efficiently this pixel redistribution and color translation delivers the information to the listener's brain, so that as much as possible of the information about the original sound sources in the mix is perceived.
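To make the bit depth and dithering part concrete, here is a minimal Python sketch. It is not the model described above, just the textbook version of the idea: a value is rounded onto a coarser grid, with and without triangular (TPDF) dither noise added first. The function names, the 8-bit depth, and the test level of 0.3 of a quantization step are all assumptions picked for illustration.

```python
import random

def quantize(sample, bits, dither=False, rng=random):
    """Round a sample in [-1.0, 1.0] onto a signed grid of the given bit depth."""
    steps = 2 ** (bits - 1) - 1          # largest positive code, e.g. 127 for 8 bits
    scaled = sample * steps
    if dither:
        # TPDF dither: the sum of two uniform values, spanning +/- 1 grid step
        scaled += rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
    code = max(-steps, min(steps, round(scaled)))
    return code / steps

def mean_output(sample, bits, dither, n=20000, seed=0):
    """Average many quantized copies of the same input value."""
    rng = random.Random(seed)            # fixed seed so the run is repeatable
    return sum(quantize(sample, bits, dither, rng) for _ in range(n)) / n

BITS = 8                                  # assumed bit depth for the demo
STEP = 1 / (2 ** (BITS - 1) - 1)          # size of one grid step
quiet = 0.3 * STEP                        # a level smaller than half a step

plain = mean_output(quiet, BITS, dither=False)
dithered = mean_output(quiet, BITS, dither=True)

print(f"input level      : {quiet / STEP:.3f} steps")
print(f"without dither   : {plain / STEP:.3f} steps")
print(f"with TPDF dither : {dithered / STEP:.3f} steps (averaged)")
```

Without dither, the quiet value always rounds to zero and is simply gone. With dither, each individual sample is noisier, but the average lands close to the original level, so the information survives in a different form.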
This is oversimplified, since music is not a single state, but it gives you a rough idea of what we are discussing. Information density is essentially the total number of pixels available combined with the total number of colors each pixel can express. In audio terms, it is essentially how precisely the audio is expressed in its combination of frequency and amplitude, e.g., 346 Hz vs 346.382947893284789327483274983274892378374932 Hz at 3.4 volts vs 3.3948029840298349029384932 volts. In principle the information density is infinite, but in practice it is limited by the density of the listener/engineer. Density is a kind of filter sitting on top of the absolute true information: as the information density increases, more of the 100% true information about the source shines through the filter.
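As a rough illustration of the precision point, the sketch below rounds the voltage from the example onto uniform grids of different bit depths and reports how far the stored value ends up from the true one. The 5-volt full-scale range and the helper name `quantization_error` are assumptions made for the example, and Python's floating point keeps only about 16 significant digits of the value.

```python
def quantization_error(value, full_scale, bits):
    """Distance from value to the nearest point on a uniform grid of 2**bits steps."""
    step = full_scale / 2 ** bits
    return abs(value - round(value / step) * step)

FULL_SCALE_VOLTS = 5.0             # assumed range, not from the post
TRUE_VOLTS = 3.3948029840298349    # the voltage from the example, as a float

errors = {
    bits: quantization_error(TRUE_VOLTS, FULL_SCALE_VOLTS, bits)
    for bits in (8, 16, 24)
}

for bits, err in errors.items():
    print(f"{bits:2d} bits: stored value is off by {err:.3e} V")
```

Each extra bit halves the grid spacing, so the worst-case error halves too. In the terms used above, that is the filter letting more of the true value through.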