Good question.
It depends on what your intention is,
but for the most part, if you are lip-synching, you would shoot the video with the music playing so that the performer's words stay in time with the recorded version.
You might even record the music playback on camera so that you can later synchronise the vision to the soundtrack when editing.