Summary of the project

In the last few years, the spread of fake content and disinformation worldwide and across Europe has increased dramatically, especially on social media. Despite a large body of research, some countries still lag behind (e.g. they have few or no fact-checkers, tools, or resources). This is particularly challenging when it comes to detecting deep fakes. Although there is some research on fake news detection for low-resourced languages (e.g. Romanian, Bangla, Tagalog), there are no guidelines on how to solve this problem for a new language. Moreover, even though disinformation is defined as the intentional spread of fake information, it is currently addressed by considering only its fakeness and harm, not its intent. This is technically wrong and can prevent distinguishing between misinformation and disinformation.

TRACES addresses these problems by finding solutions and developing new methods for disinformation detection in low-resourced languages. The innovativeness of TRACES lies in detecting both human-written and deep fake disinformation, recognizing disinformation by its intent, its interdisciplinary mix of solutions, and the creation of a package of methods, datasets, and guidelines for building such methods and resources for other low-resourced languages. The use case of TRACES is Bulgarian, the national language of a European Union (EU) country with a very low level of media literacy, problematic media freedom, and a geopolitically strategic position at the border of the EU. Bulgaria has a high number of self-taught advanced computer hackers, which makes the existence of deep fakes in Bulgarian social media highly plausible (though not yet researched). Detecting and signaling fake content, and especially disinformation, is thus a critical need for Bulgaria, yet there is only one independent fact-checker and very little NLP research on the topic. A further challenge is that Bulgarian is a low-resourced language, with very few NLP tools and datasets, and almost none for processing social media texts.

The proposed research is well aligned with AI4media Open Call 1, Use Case 1 and Challenge C4-Rt: 1) It fills a significant gap in the EU’s existing AI research and technologies on disinformation detection; 2) It applies AI methods and tools to support journalists and fact-checking experts in digital content verification and disinformation detection; 3) Its results can be integrated into AI4Media, Truly Media and TruthNest; and 4) It is in line with the EU’s fundamental rights and all relevant EU regulations (the EU Charter of Fundamental Rights, the GDPR, the AI Act, the Digital Services Act, and the EU Code of Practice on Disinformation) – no personal information will be collected, nor any content removed. The outcomes of the project will provide crucial insights for other EU countries and will consist of language resources, annotated disinformation datasets, machine learning methods, a tool for potential integration into Truly Media to support the discovery of disinformation by journalists, and new language adaptation guidelines.