Minute-long gravitational-wave transients are a class of events with high potential for new science. In contrast to the recently detected gravitational waves from compact binary systems, minute-long transients are expected to originate from a wide range of poorly understood astrophysical phenomena, such as magnetars and accretion disk instabilities. The lack of readily available, accurate gravitational-wave emission models for these sources prevents the use of matched-filtering methods. Such events are therefore searched for as an excess of power in time-frequency space that is correlated across the network of detectors. The problem can be viewed as a search for clusters of high-value pixels within an image, a task commonly tackled with deep learning algorithms such as convolutional neural networks (CNNs). In this work, we use a CNN as an anomaly detection tool for minute-long gravitational-wave transients. We show that it can reach pixel-to-pixel detection despite being trained with minimal assumptions, while retrieving both astrophysical signals and noise transients originating from instrumental coupling within the detectors. We also note that the neural network can extrapolate and connect partially disjoint signal tracks in the time-frequency plane.