Some photographers who contributed photos to the Flickr photo-sharing site were surprised to learn that IBM had used those same photos in a million-image collection to train AI face-recognition systems -- but perhaps they shouldn't have been.
The photos had been shared under a Creative Commons license, a framework under which people can loosen restrictions on photos, text, video or other material that otherwise would be protected by copyright. CC licenses can bar commercial use or require others using the photos to attribute them to their source, but the general idea is to make the work available for others to use.
"None of the people I photographed had any idea their images were being used in this way...It seems a little sketchy that IBM can use these pictures without saying anything to anybody," Greg Peverill-Conti, an executive at public relations firm SharpOrange, told NBC News.
IBM used only photos licensed under Creative Commons, and IBM's legal team approved the program, a company representative said. The data set, which is offered only to academic researchers, is called Diversity in Faces. The faces are annotated with human observations about factors like sex and age and with geometric measurements, and they are intended to help researchers counter bias that can undermine AI fairness.
"We take the privacy of individuals very seriously and have taken great care to comply with privacy principles, including limiting the Diversity in Faces dataset to publicly available image annotations and limiting the access of the dataset to verified researchers. Individuals can opt-out of this dataset," spokesman Saswato Das said in a statement. "IBM has been committed to building responsible, fair and trusted technologies for more than a century and believes it is critical to strive for fairness and accuracy in facial recognition."
One lesson here: if you don't want your imagery used to train artificial intelligence systems -- or to appear in books, Wikipedia articles, art projects, and corporate PowerPoint presentations -- choose your Creative Commons licenses carefully or don't use them at all.
More than 700 of Peverill-Conti's photos are in the collection, and some photographers had trouble getting IBM to remove their photos from the data set, the news network said. Peverill-Conti didn't immediately respond to a request for comment.
The Creative Commons organization, a nonprofit that oversees the licenses, didn't comment on IBM's specific usage. But Chief Executive Ryan Merkley said the matter of faces used to train AI systems is broader than just a licensing issue.
"Our tools were built to solve for copyright, and they do that well," Merkley said. "But copyright isn't a good tool to address privacy, or research ethics, or surveillance AI."
Flickr defends IBM's usage
Flickr's leader, SmugMug Chief Executive Don MacAskill, tweeted on Tuesday that IBM retrieved the photos before SmugMug acquired the photo-sharing site, and he defended IBM's type of usage as adhering to the principles of Creative Commons.
"We love & support photographers and their right to choose their own licenses for their work. By default, they reserve all of their rights, and have the option to loosen them if they'd like," MacAskill tweeted.
"People didn't have to opt-in to the dataset because they had already opted into the Creative Commons license. They took action. This is the way licensing works. It's also the magic that enables artists & scientists all over the world to create & invent using CC-licensed works," he added.
Flickr has more than 400 million photos shared under Creative Commons licenses. Although Flickr eliminated a Yahoo-era plan that offered photographers a free terabyte of photo storage, it exempts Creative Commons shots from the limit it now places on free accounts.
First published March 12, 7:10 p.m. PT.
Update, 8:21 p.m. PT: Adds further comment from IBM.