I think the stereo image basically comes down to the delay between two sources. When you duplicate a sound, delay one copy by less than roughly 30 ms and pan the two copies left and right, the ear can't really detect them as two separate events, so it just sounds like one sound spread out to the sides (this is usually called the Haas or precedence effect).
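A minimal sketch of the delay-and-pan trick, assuming a mono signal held in a plain Python list of float samples. The function name `haas_widen` and the 15 ms default are illustrative choices, not taken from any particular plugin; the 30 ms guard just mirrors the rough limit above.

```python
def haas_widen(mono, sample_rate, delay_ms=15.0):
    """Return (left, right): the dry copy hard left, a delayed copy hard right.

    Hypothetical helper. Keeps delay_ms under ~30 ms so the two copies
    fuse into one perceived event instead of an audible echo.
    """
    if not 0 <= delay_ms < 30:
        raise ValueError("delay_ms should be between 0 and 30 ms")
    delay = round(sample_rate * delay_ms / 1000.0)  # delay in samples
    # Pad both channels so they end up the same length.
    left = list(mono) + [0.0] * delay
    right = [0.0] * delay + list(mono)
    return left, right


# A single click at the start of a short mono buffer, 2 ms delay at 1 kHz.
left, right = haas_widen([1.0, 0.0, 0.0, 0.0], sample_rate=1000, delay_ms=2.0)
print(left)   # [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(right)  # [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
```

The click hits the left side first and the right side two samples (2 ms) later; that small arrival-time difference is what reads as width rather than as a second event.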
If you had a perfectly centered sound, you'd essentially get the same result from one speaker right in front of your nose as from two evenly spaced speakers on either side. Without panning, and without some difference in the time it takes the sound to reach each ear, you won't get a sense of stereo: the sound reaches both ears at the same time and at the same volume, so there is no difference between them for you to detect. This is essentially how everything that sounds "wide" works, be it a chorus or a stereo imaging plugin.
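To make the "same time, same volume" point concrete, here is a rough sketch that estimates the time offset between two channels by brute-force cross-correlation. The helper `interchannel_lag` is hypothetical and plain lists stand in for audio buffers; a practical tool would more likely use an FFT-based correlation, but the idea is the same: identical channels line up at lag 0, while a delayed copy peaks at its delay.

```python
def interchannel_lag(left, right, max_lag):
    """Return the lag (in samples) at which right best matches left.

    0 means the channels line up (an identical, dead-centered signal);
    a positive value means right arrives later than left.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Cross-correlation at this lag: sum of left[n] * right[n + lag].
        score = sum(
            left[n] * right[n + lag]
            for n in range(len(left))
            if 0 <= n + lag < len(right)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


burst = [0.0, 1.0, -0.5, 0.25, 0.0, 0.0, 0.0, 0.0]
delayed = [0.0] * 3 + burst[:-3]  # same burst, 3 samples later

print(interchannel_lag(burst, burst, max_lag=5))    # 0
print(interchannel_lag(burst, delayed, max_lag=5))  # 3
```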
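What mid/side processing does is split the signal into the mid (the sum of left and right, which holds the dead-centered content) and the side (the difference between them, which holds everything that isn't dead centered), so you can process them individually. Depending on the plugin, the exact split may be defined slightly differently.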
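A minimal mid/side sketch, assuming the common M = (L + R) / 2, S = (L - R) / 2 convention (some plugins scale by 1/√2 instead, which changes the levels but not the idea). The function names are illustrative. A dead-centered signal ends up entirely in the mid, and turning the side down before decoding narrows the image.

```python
def ms_encode(left, right):
    """Split L/R into mid (what both channels share) and side (what differs)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Rebuild L/R from mid/side: L = M + S, R = M - S."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


# A dead-centered sound (identical in both channels) has no side content.
mid, side = ms_encode([0.5, -0.25], [0.5, -0.25])
print(side)  # [0.0, 0.0]

# Example processing: halve the side to narrow a hard-panned signal.
mid, side = ms_encode([1.0, 0.0], [0.0, 1.0])
left, right = ms_decode(mid, [0.5 * s for s in side])
print(left, right)  # [0.75, 0.25] [0.25, 0.75]
```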
Panning just lowers the level sent to one channel while raising the level sent to the other by some amount; exactly how much depends on the pan law your DAW uses.
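As an illustration of how pan laws differ, here is a sketch of two common ones: a constant-power (sine/cosine) law, which leaves each side about 3 dB down at center, and a linear law, which leaves each side about 6 dB down. `pan_gains` and its `law` names are hypothetical; some DAWs also offer in-between settings.

```python
import math


def pan_gains(pan, law="constant_power"):
    """Return (left_gain, right_gain) for pan in [-1.0 (hard left), 1.0 (hard right)].

    "constant_power": cos/sin law, left^2 + right^2 == 1 at every position.
    "linear": the two gains always sum to 1.
    """
    if not -1.0 <= pan <= 1.0:
        raise ValueError("pan must be between -1 and 1")
    if law == "constant_power":
        angle = (pan + 1.0) * math.pi / 4  # maps pan to 0 .. pi/2
        return math.cos(angle), math.sin(angle)
    if law == "linear":
        return (1.0 - pan) / 2, (1.0 + pan) / 2
    raise ValueError(f"unknown pan law: {law}")


# Gain per side with the pan knob dead center, and that gain in dB.
for law in ("constant_power", "linear"):
    l, r = pan_gains(0.0, law)
    print(law, round(l, 4), round(r, 4), round(20 * math.log10(l), 2), "dB")
# constant_power 0.7071 0.7071 -3.01 dB
# linear 0.5 0.5 -6.02 dB
```

The constant-power law keeps the perceived loudness roughly steady as a sound moves across the field, while the linear law makes centered sounds a bit quieter than hard-panned ones; that difference is why the same mix can sound slightly different between DAWs.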
I may be off with some of this, as I'm not as confident in my technical knowledge as some people here, so if I'm wrong someone will surely come along and correct me.
Here's a neat article on the subject:
http://www.soundonsound.com/sos/apr12/articles/reaper-0412.htm