June 26, New York (Reuters) – Seven vendors that license images, videos, music, and other datasets for use in training AI systems have formed the industry's first trade association, they announced on Wednesday.
According to a statement from the companies, the Dataset Providers Alliance (DPA) will support “ethical data sourcing” in AI system training, which includes protecting the intellectual property rights of content owners and upholding the rights of people portrayed in datasets.
Developers have trained models on large amounts of content, most of which was scraped from the internet for free without the owners’ permission or consent.
Tech companies, which maintain that the use is lawful, are also quietly paying for access to private content collections to meet specific data requirements and protect themselves from legal and regulatory repercussions.
The expectation that demand for licensed data will increase if copyright owners win their legal battles has given rise to a new industry of businesses that package content and sell access to it for use by AI systems.
Organizations have in turn emerged to set ethical guidelines for that industry. One such organization is Fairly Trained, a nonprofit established this year that certifies models that have not used copyrighted materials without a license.