
A method for training an object detection system includes estimating a location of a first object in an environment based on a density cluster map generated from a plurality of images of the environment. The method also includes generating one or more negative training samples of the first object in the environment based on the plurality of images, each of the one or more negative training samples corresponding to a second object at a location in the environment that is different from the estimated location of the first object. The method further includes generating positive training samples from a set of images of the first object. The method also includes training the object detection system to detect the first object based on the positive training samples and the one or more negative training samples.
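
The steps above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the method itself: candidate detections are reduced to 2D environment coordinates, the density cluster map is approximated by a coarse grid whose densest cell gives the estimated location, and a second object counts as a negative sample when it lies at least `min_distance` from that estimate. The names `estimate_location`, `select_negatives`, `cell_size`, `min_distance`, and the image identifiers are hypothetical.

```python
from collections import defaultdict
from math import hypot


def estimate_location(points, cell_size=1.0):
    """Approximate a density cluster map with a grid (an assumption) and
    return the centroid of the points in the densest cell."""
    cells = defaultdict(list)
    for x, y in points:
        cells[(int(x // cell_size), int(y // cell_size))].append((x, y))
    densest = max(cells.values(), key=len)
    cx = sum(p[0] for p in densest) / len(densest)
    cy = sum(p[1] for p in densest) / len(densest)
    return cx, cy


def select_negatives(detections, target_location, min_distance=2.0):
    """Keep detections located away from the estimated location of the
    first object; these stand in for the negative training samples."""
    tx, ty = target_location
    return [
        d for d in detections
        if hypot(d["xy"][0] - tx, d["xy"][1] - ty) >= min_distance
    ]


# Hypothetical detections gathered from several images of the environment.
detections = [
    {"image": "img_0", "xy": (4.1, 4.2)},
    {"image": "img_1", "xy": (4.3, 4.4)},
    {"image": "img_2", "xy": (4.2, 4.6)},
    {"image": "img_3", "xy": (9.0, 1.0)},
    {"image": "img_4", "xy": (0.5, 7.5)},
]

location = estimate_location([d["xy"] for d in detections])
negatives = select_negatives(detections, location)

# Hypothetical set of images known to show the first object.
positive_images = ["pos_0", "pos_1"]

# Labeled samples (1 = first object, 0 = other object) that a detector
# could then be trained on; the training step itself is not shown.
training_set = [(name, 1) for name in positive_images]
training_set += [(d["image"], 0) for d in negatives]
```

In this sketch the three nearby detections form the densest cell, so the two detections far from it become the negative samples. A real system would replace the grid with whatever density clustering it actually uses and would pass image crops, not identifiers, to the training step.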