This is essentially unchanged from the 1997 Formats. The only difference is that the identifier (Picture, Cartoon, etc.) is now enclosed in TN symbols. Look at Example 6-3 in Formats, which shows a picture with a caption and an accompanying description. Cover up the caption and pretend there is just the picture, which is the situation you describe. You would simply have a TN in 7-5 that says something like Picture shows a referee separating ... whatever. The purpose of the TN is to inform the reader that these words are NOT in the print but were provided by the transcriber.
When there is no caption, incorporate the identifier into the body of the TN by saying something like (TN)Picture shows the girl with the balloon(TN). That tells the reader there is no caption because ALL of the words are in the TN.
When there is a caption, ONLY the identifier goes in the TN. That tells the reader that the identifier is not in the print but was provided by the transcriber, while the caption itself is print text.
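If it helps to see the two cases side by side, here is a rough sketch in Python. It is only an illustration of the rule described above, not anything from Formats itself: the function name, the "(TN)" text markers standing in for the braille TN symbols, and the sample caption are all made up for the example. It also ignores margins and where an accompanying description would go when there is a caption.

```python
# Placeholder text markers standing in for the braille TN symbols
# (an assumption for illustration; real braille uses the TN indicators).
TN_OPEN = "(TN)"
TN_CLOSE = "(TN)"


def picture_text(identifier, caption=None, description=None):
    """Return the text for a picture, following the rule sketched above.

    identifier  -- word supplied by the transcriber, e.g. "Picture" or "Cartoon"
    caption     -- caption printed in the book, if any
    description -- transcriber's description, used when there is no caption
    """
    if caption:
        # Caption exists: only the identifier is enclosed in the TN,
        # because the caption words are in the print.
        return f"{TN_OPEN}{identifier}{TN_CLOSE} {caption}"
    if not description:
        raise ValueError("With no caption, a description is needed for the TN.")
    # No caption: identifier and description all go inside one TN,
    # because none of these words are in the print.
    return f"{TN_OPEN}{identifier} {description}{TN_CLOSE}"


# No caption: everything is inside the TN.
print(picture_text("Picture", description="shows the girl with the balloon"))

# Hypothetical caption: only the identifier is inside the TN.
print(picture_text("Picture", caption="A girl holds a balloon."))
```

The first call gives the same wording as the example above; the second shows the identifier alone inside the TN with the print caption following it.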