I see what you mean about not being a translator. The sighted student learns from the picture, not from a dictionary, so my job is to produce an alternate format of the picture rather than a translation of the captions.
I am a little unclear about where to put the labels and the TNs. For the first picture, would the following be acceptable?
,'Picture,' poltrona ,'label points to an armchair.,' cappotto ,'label points to the overcoat of a girl sitting in the armchair.,'