Quote by Shimazaki:
I don't know if the system can automatically look at scans I try to upload and tell whether they were already deleted
in the past after other members uploaded them (I doubt that MT has an AI like that), so I would like to know what I am
dealing with.
If you have ever been to a site like Danbooru, you might have noticed the weird file names, which look like this:
"13167de4acac50e12872eab5fbbc7bfc.jpg".
The long 32-digit number (yes, A-F are digits here, because it is written in hexadecimal) is called the checksum of
the file.
This number is calculated from the content of the file and is, for practical purposes, unique to that file. This is how
websites can keep track of which files have already been uploaded and deleted without keeping a copy of every file and
doing pixel-by-pixel comparisons of the images. It is also the starting point for forensic investigators looking for
known illegal files or malware on a computer.
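To make the bookkeeping concrete, here is a minimal Python sketch of how a site might check an upload against a list of checksums of deleted files. The function names and the idea of a plain in-memory set are my own assumptions for illustration; I have no idea how MT or Danbooru actually store this.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the MD5 checksum of a file as 32 hexadecimal characters."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large scans do not have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def was_deleted_before(path, deleted_checksums):
    """Check an upload against a set of checksums of deleted files.

    `deleted_checksums` is a hypothetical set of hex strings; a real site
    would presumably keep this in a database.
    """
    return md5_of_file(path) in deleted_checksums
```

The site only has to remember the 32-character strings, for example `{"13167de4acac50e12872eab5fbbc7bfc"}`, not the deleted files themselves.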
A major weakness of traditional checksum algorithms such as MD5 (what Danbooru and friends use) for this purpose is that
the slightest change in the file produces a completely different checksum.
Notice I said file, not image. Simply appending a null byte to the end of the file will let you re-upload a banned
file on websites that rely only on such a checksum, and in the case of images at least, it will not change the content
(you won't see anything different about the picture).
Quote by shell:
$ md5sum a.png
703b52ddba099df2806406309d931184 a.png
$ truncate -s +1 a.png
$ md5sum a.png
bd5d2a80efe4c31bff5e6ba507a72606 a.png
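The same effect can be reproduced without touching a real file. This is a small Python sketch using made-up bytes in place of an image (the byte string and helper names are placeholders of mine), so the checksums it prints will not match the ones above.

```python
import hashlib


def md5_hex(data):
    """MD5 checksum of a bytes object as a hex string."""
    return hashlib.md5(data).hexdigest()


def differing_digits(a, b):
    """Count the positions where two equal-length hex strings differ."""
    return sum(x != y for x, y in zip(a, b))


# Placeholder bytes standing in for the content of an image file.
original = b"pretend these bytes are an image file"
modified = original + b"\x00"  # same as `truncate -s +1`: one extra null byte

print(md5_hex(original))
print(md5_hex(modified))
print(differing_digits(md5_hex(original), md5_hex(modified)), "of 32 digits differ")
```

One extra byte at the end is enough to give a checksum that has nothing visibly in common with the first one.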
There are also fuzzy hashing algorithms, whose output follows the patterns inside a file more closely, so a small
change like appending a null byte gives a result very similar to the original. That makes them much better at
detecting files with only slight variations. In reality most websites stick with algorithms like MD5 because they are
much faster, which makes the servers cheaper to run.
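To give a feel for the idea, here is a toy Python sketch that hashes a file in fixed-size blocks and reports what fraction of the blocks match. This is a deliberately simplified stand-in, not how a real fuzzy hashing tool such as ssdeep works; the block size and function names are my own choices.

```python
import hashlib


def block_hashes(data, block_size=64):
    """Hash each fixed-size block of the data separately."""
    return [
        hashlib.md5(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def similarity(a, b, block_size=64):
    """Fraction of block positions whose hashes match (1.0 means identical)."""
    ha, hb = block_hashes(a, block_size), block_hashes(b, block_size)
    longest = max(len(ha), len(hb))
    if longest == 0:
        return 1.0  # two empty inputs are trivially identical
    matches = sum(x == y for x, y in zip(ha, hb))
    return matches / longest


original = bytes(range(256)) * 4  # 1024 placeholder bytes, i.e. 16 blocks
modified = original + b"\x00"     # the null-byte trick adds a 17th block

print(similarity(original, modified))  # 16 of 17 blocks match, about 0.94
```

With fixed blocks, a byte inserted at the start of the file would shift every block and the score would collapse. Real fuzzy hashes avoid that by picking block boundaries from the content itself, which is part of why they cost more to compute.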
An interesting side note is that checksums also have an application in the security space, since they can be used to
help prove that a file came from a trusted source and has not been tampered with. This relies on it being practically
impossible to generate two files with the same checksum. You may remember the Flame malware that hit the news a few
years ago. Part of the reason this was such a big deal was that its authors used an MD5 collision to forge a Microsoft
signature, which let them hijack Windows Update and infect victims. MD5 collisions had been demonstrated by researchers
years earlier, but as far as was publicly known nobody had used one in a real attack before, so it was a pretty badass
attack.
Don't worry, Microsoft has upgraded the Windows Update mechanism since then.
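For the tamper-checking use, a minimal Python sketch might look like the following. It uses SHA-256 rather than MD5 because of the collision problem just described; the function names are hypothetical, and it assumes the published checksum itself reached you through a channel you trust.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 checksum of a file as a hex string."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path, published_hex):
    """Compare a downloaded file against the checksum its publisher lists."""
    actual = sha256_of_file(path)
    expected = published_hex.strip().lower()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(actual, expected)
```

If the check fails, the file is not the one the publisher hashed, whether through corruption or tampering.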