Reliable sources
From MLDonkey
The reliable_sources feature can (somewhat) help you download files from a mix of good sources and bad sources (sources that send corrupted data).
The problem is that a chunk can only be validated as a whole, so if its data came from several sources, you can't tell which of them sent the broken data.
Each source gets a score (reliable, neutral or suspicious of level x):
- sources start out as neutral.
- sources that were used to successfully download a chunk become reliable.
- when a corrupted chunk is detected, the non-reliable sources that were used for it become suspicious of level n/2, where n is the number of non-reliable sources involved (special case: if only reliable sources were used, they are no longer trusted to be reliable, and all of them become suspicious of level n/2, n then being the number of sources involved).
- no more than m non-reliable sources can be used to download a chunk if one of them is a suspicious source of level m (the corollary is that a suspicious source of level 0 is effectively banned: it cannot be used anymore).
- while not strictly necessary for the scoring algorithm, MLDonkey tries to avoid mixing sources with different kinds of score (reliable, neutral or suspicious) on the same chunk, so that bad sources are identified faster and downloads from reliable sources are not spoiled.
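The scoring rules above can be sketched as a small Python model. This is a hypothetical illustration, not MLDonkey's own code: the names ReliabilityTable, chunk_verified, chunk_corrupted and can_use_together are invented here, sources are keyed by IP address, n/2 is assumed to be rounded down (which is what lets a source caught alone on a corrupted chunk reach level 0), and the preference for not mixing score kinds is not modelled.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    NEUTRAL = "neutral"
    RELIABLE = "reliable"
    SUSPICIOUS = "suspicious"


@dataclass
class Score:
    kind: Kind = Kind.NEUTRAL
    level: int = 0  # only meaningful for SUSPICIOUS sources


class ReliabilityTable:
    """Hypothetical model of the reliable_sources rules, keyed by IP address."""

    def __init__(self) -> None:
        self.scores: dict[str, Score] = {}

    def score(self, ip: str) -> Score:
        # Sources start out as neutral.
        return self.scores.get(ip, Score())

    def chunk_verified(self, sources: list[str]) -> None:
        # Every source that helped download a valid chunk becomes reliable.
        for ip in sources:
            self.scores[ip] = Score(Kind.RELIABLE)

    def chunk_corrupted(self, sources: list[str]) -> None:
        suspects = [ip for ip in sources if self.score(ip).kind != Kind.RELIABLE]
        if not suspects:
            # Special case: only reliable sources were used, so none of them
            # can be trusted any more and all of them become suspects.
            suspects = list(sources)
        level = len(suspects) // 2  # assumption: n/2 is rounded down
        for ip in suspects:
            self.scores[ip] = Score(Kind.SUSPICIOUS, level)

    def can_use_together(self, sources: list[str]) -> bool:
        non_reliable = [self.score(ip) for ip in sources
                        if self.score(ip).kind != Kind.RELIABLE]
        levels = [s.level for s in non_reliable if s.kind == Kind.SUSPICIOUS]
        # A suspect of level m allows at most m non-reliable sources on the
        # chunk, so a level-0 suspect can never be used (effectively banned).
        return not levels or len(non_reliable) <= min(levels)


table = ReliabilityTable()
table.chunk_corrupted(["a", "b", "c", "d"])     # 4 neutral sources: level 2
print(table.can_use_together(["a", "b"]))       # True
print(table.can_use_together(["a", "b", "c"]))  # False
table.chunk_corrupted(["a"])                    # alone on a bad chunk: level 0
print(table.can_use_together(["a"]))            # False
```

Note how the limit shrinks with each corrupted chunk: a suspect can only share later chunks with fewer and fewer non-reliable sources, until it ends up alone and, if it corrupts that chunk too, reaches level 0.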
You can check the scores of sources (currently identified by their IP addresses) with dump_reliability.
This algorithm should work, isolating bad sources by dichotomy (each corrupted chunk roughly halves the group of suspects), but it may not be fast enough when there are many corrupting sources, since many users keep the same IP address for only 24 hours or less.
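The cost of the dichotomy can be estimated with a toy simulation. It rests on assumptions that are not MLDonkey behaviour: bad sources corrupt every chunk they touch, each group of suspects downloads exactly one chunk, and suspects of level m are split into groups of the largest allowed size m. The function name corrupted_chunks_until_banned is hypothetical.

```python
from __future__ import annotations


def corrupted_chunks_until_banned(sources: list[str], bad: set[str]) -> tuple[int, set[str]]:
    """Toy model: count corrupted chunks until every bad source is at level 0.

    Assumptions (not MLDonkey behaviour): bad sources corrupt every chunk they
    touch, each group downloads exactly one chunk, and suspects of level m are
    split into groups of the largest allowed size m.
    """
    level: dict[str, int] = {}  # suspicion level; absent means neutral/reliable
    corrupted = 0
    pending = [list(sources)]   # the first chunk mixes all (neutral) sources
    while pending:
        group = pending.pop()
        if not bad.intersection(group):
            # Valid chunk: the whole group becomes reliable.
            for ip in group:
                level.pop(ip, None)
            continue
        corrupted += 1
        m = len(group) // 2     # every member becomes suspicious of level n/2
        for ip in group:
            level[ip] = m
        if m > 0:
            # At most m non-reliable sources may share each following chunk.
            pending.extend(group[i:i + m] for i in range(0, len(group), m))
    banned = {ip for ip, lvl in level.items() if lvl == 0}
    return corrupted, banned


sources = [f"10.0.0.{i}" for i in range(8)]
print(corrupted_chunks_until_banned(sources, {"10.0.0.3"}))     # (4, {'10.0.0.3'})
print(corrupted_chunks_until_banned(sources, set(sources))[0])  # 15
```

Under these assumptions, a single bad source among 8 is banned after log2(8) + 1 = 4 corrupted chunks, whereas 8 bad sources cost 2 × 8 − 1 = 15. Every corrupted chunk has to be downloaded again, and a source that changes its IP address presumably starts over as neutral, which is why many corrupting sources with short-lived IPs are the hard case.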
Feel free to suggest improvements.