Download button for series tag?

Well if you want to crawl the site I don't think it's that hard.

Let's take the tag `Fire Emblem`. The URL looks like this: http://browse.minitokyo.net/gallery?tid=994&index=1
tid seems to be the tag id; 994 is Fire Emblem.

Change it to 995 and see what happens.
http://browse.minitokyo.net/gallery?tid=995&index=1
Xenosaga

The index=1 looks like Wallpapers. Change it to 2? Indy Art. 3? Scans. OK, let's stick with index=1, Wallpapers.

What happens when we go to the next page? http://browse.minitokyo.net/gallery?tid=994&index=1&page=2
Does http://browse.minitokyo.net/gallery?tid=994&index=1&page=1 work? Yes. Good.
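
To sum up the URL structure worked out so far, here is a tiny sketch of a helper that builds a listing URL. The name gallery_url is made up, and the meaning of tid, index and page is only what the experiments above suggest, not anything documented.

```shell
#!/bin/sh
# gallery_url is a hypothetical helper; the parameter meanings are guesses
# based on poking at the URLs above.
#   $1 = tid   (tag id, e.g. 994 for Fire Emblem)
#   $2 = index (1 = Wallpapers, 2 = Indy Art, 3 = Scans), defaults to 1
#   $3 = page  (page number), defaults to 1
gallery_url() {
    tid=$1
    index=${2:-1}
    page=${3:-1}
    printf 'http://browse.minitokyo.net/gallery?tid=%s&index=%s&page=%s\n' \
        "$tid" "$index" "$page"
}

# Second page of Fire Emblem wallpapers:
gallery_url 994 1 2
```

Called with only a tag id, it falls back to the first page of Wallpapers.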

So the first step is simple: we need to download each page in a loop. Let's use standard Unix shell scripting, since it comes preinstalled on basically any computer except Windows, where you need to install it explicitly. Blame Microsoft.

Code:

	for i in $(seq 1 2); do wget -O - "http://browse.minitokyo.net/gallery?tid=994&index=1&page=$i" > "page$i" 2> /dev/null; done

Open up your newly downloaded files in a text editor. Or look at the page source in a web browser.

The links that take you to the larger version of the image, the comments, etc. look like this: http://gallery.minitokyo.net/view/677688. We can grep them out of our newly downloaded files quite easily.

Code:
$ grep -oP "http://gallery.minitokyo.net/view/\d+" page1
Һttp://gallery.minitokyo.net/view/224620
Һttp://gallery.minitokyo.net/view/224620
Һttp://gallery.minitokyo.net/view/136814
Һttp://gallery.minitokyo.net/view/136814
Һttp://gallery.minitokyo.net/view/93129
...

(Cyrillic Shha used instead of H to avoid Minitokyo's funky hyperlinking functionality.)

You may be forgiven for thinking that the next step is to simply plug these new URLs into wget, look at the HTML for these new pages and repeat the process above. If you actually do that, though, you might spot a nifty shortcut.

Let's look at the URL for downloading the full sized image: http://gallery.minitokyo.net/download/224620
Aha, that number on the end sure looks familiar.

It looks like instead of downloading Һttp://gallery.minitokyo.net/view/224620 and then http://gallery.minitokyo.net/download/224620, we can simply go directly to http://gallery.minitokyo.net/download/224620

To do this we change the grep command above to give us only the number, not the full URL. I don't know if grep supports matching groups, so let's just be lazy and run grep twice. You may notice that each number is printed twice. Piping into uniq will take care of that (uniq only drops adjacent repeats, which is all we need here since the duplicates come in pairs).

Code:

	$ grep -oP "http://gallery.minitokyo.net/view/(\d+)" page1 | grep -oP "\d+" | uniq
	224620
	136814
	93129
	525806
	194413
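
As it happens, GNU grep in -P (Perl regex) mode can print just part of a match: \K throws away everything matched up to that point, so a single grep is enough. The sketch below assumes GNU grep built with PCRE support. extract_ids is a made-up name and the HTML fed to it is invented sample data. It also swaps uniq for a common awk idiom that drops repeats even when they are not adjacent, while keeping the original order.

```shell
#!/bin/sh
# extract_ids (hypothetical helper): read HTML on stdin, print each image id once.
extract_ids() {
    # \K discards the URL prefix from the reported match, leaving only the digits.
    # The awk filter prints a line only the first time it is seen.
    grep -oP 'http://gallery\.minitokyo\.net/view/\K\d+' | awk '!seen[$0]++'
}

# Demonstration on invented sample HTML (not a real Minitokyo page):
printf '%s\n' \
    '<a href="http://gallery.minitokyo.net/view/224620"><img></a>' \
    '<a href="http://gallery.minitokyo.net/view/224620">title</a>' \
    '<a href="http://gallery.minitokyo.net/view/136814">title</a>' | extract_ids
# prints:
# 224620
# 136814
```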

Now we can reuse the loop from earlier to fetch the download URL for every image on each page.

Code:


	for i in $(seq 1 1); do for j in $(wget -O - "http://browse.minitokyo.net/gallery?tid=994&index=1&page=$i" 2>/dev/null | grep -oP "http://gallery.minitokyo.net/view/(\d+)" | grep -oP "\d+" | uniq) ; do wget http://gallery.minitokyo.net/download/$j ; done ; done

This is getting a bit unwieldy for a one-liner. It might be time to rewrite this as a nicely formatted shell script.
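
Here is a rough sketch of what that could look like. The file name crawl.sh, the function names and the two-argument interface are all made up for illustration; the URLs and the grep pipeline are the ones from the one-liner. Like the one-liner, it only fetches the download/ URLs and goes no further.

```shell
#!/bin/sh
# crawl.sh (hypothetical name): the one-liner above, spread over several lines.
# Usage: ./crawl.sh TAG_ID LAST_PAGE      e.g. ./crawl.sh 994 1

BASE="http://browse.minitokyo.net/gallery"
DOWNLOAD="http://gallery.minitokyo.net/download"

# Read listing-page HTML on stdin and print the image ids it links to.
image_ids() {
    grep -oP "http://gallery.minitokyo.net/view/(\d+)" | grep -oP "\d+" | uniq
}

# Walk listing pages 1..LAST_PAGE of one tag (Wallpapers only, index=1)
# and fetch the download URL of every image found.
crawl() {
    tid=$1
    last_page=$2
    for page in $(seq 1 "$last_page"); do
        for id in $(wget -q -O - "$BASE?tid=$tid&index=1&page=$page" | image_ids); do
            wget "$DOWNLOAD/$id"
        done
    done
}

# Only crawl when both arguments are given, so the file can also be
# sourced for its functions without touching the network.
if [ "$#" -ge 2 ]; then
    crawl "$1" "$2"
fi
```

Keeping the id extraction in its own function means it can be tried on a saved page (image_ids < page1) before any downloading starts.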

I was planning to leave things here on the assumption that authentication would be required to download the large size images, i.e. you either need to log in with wget (or curl might be better in this case), or you need to extract your current cookies from your web browser and give them to wget/curl.

However, it turns out that the download pages in the last step are also HTML. The actual images themselves are one more link away. I won't bother going through that here since it is the exact same steps as above.

Also, I don't want to get banned. If you decide to go down this road, I suggest you at least add a delay between downloads (sleep 30; wget ...) and leave it running overnight.
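
A minimal sketch of such a delay loop, assuming the image ids arrive one per line on stdin. fetch_all and its two parameters are invented for this example; passing echo as the command gives a dry run that just prints the URLs instead of fetching them.

```shell
#!/bin/sh
# fetch_all (hypothetical helper): run a fetch command for each id on stdin,
# pausing between requests.
#   $1 = delay in seconds between requests (default 30)
#   $2 = command used to fetch a URL (default wget)
fetch_all() {
    delay=${1:-30}
    fetch=${2:-wget}
    first=1
    while read -r id; do
        # Sleep before every request except the first one.
        if [ "$first" -eq 0 ]; then
            sleep "$delay"
        fi
        first=0
        $fetch "http://gallery.minitokyo.net/download/$id"
    done
}

# Dry run with no delay: print the URLs that would be requested.
printf '224620\n136814\n' | fetch_all 0 echo
# prints:
# http://gallery.minitokyo.net/download/224620
# http://gallery.minitokyo.net/download/136814
```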

I hope this is helpful to anyone who is genuinely interested in learning and isn't just looking for a ready-made script (although this almost is that anyway).

@Usagi-san: your post reminded me of this thread ----> http://forum.minitokyo.net/t73299 xD
You engineers do know a lot about this stuff :D

Anyway, I'd still suggest browsing each and every item manually, because you can add a tag if you see one missing on an image, or report wrong tags, which helps the tagging staff. We artists write looong artist comments so that others can read them; if someone just downloads walls/indy art without even reading about the process, it would be disrespectful to the artists, imo.
Also, by browsing manually you can report any existing duplicate scans and things like that. So imo, for an active member it shouldn't be a bother to browse manually. And we also have the multiple tag search feature for finding something by character and/or theme tags.

