yatzr scraper util v1.2 - way faster!!

Update v1.2
Fixed a few bugs
Bowl game status now shows up
Separated out wins/losses into separate columns for the coach scraper

Update v1.1
So I had an epiphany earlier today and figured out how to make this thing run way faster.  Multi-threading!  There is now a dropdown to select the number of threads you want the program to use.  More threads means more simultaneous page requests, but it will also take up more of your computer's resources.  You'll have to test a few values to see what number of threads works well for you.  My computer (dual core) seems to fly when I set it to 20, but it actually gets a little slower when I set it to 24.  If anyone is rocking a quad core, they could probably handle 24 just fine.  I can now grab the entire coach pull for a world in about a minute.  I haven't done time trials for comparison, but I'm pretty sure that's over 10 times faster!
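
For anyone curious how a thread-count setting like this works in principle, here is a minimal sketch. It is not the tool's actual code (the tool ships as a .jar, and this is Python). The names `fetch_page`, `scrape_all`, and `timed_scrape` are made up for illustration, and a short sleep stands in for a real page request. It only shows the general idea: a pool of worker threads shares a list of pages, so several requests are in flight at once.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def fetch_page(team_id):
    """Hypothetical stand-in for a real page request; the sleep mimics network latency."""
    time.sleep(0.05)
    return f"page for team {team_id}"


def scrape_all(team_ids, num_threads):
    """Fetch every team page with a pool of num_threads workers.

    pool.map returns results in the same order as team_ids.
    """
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(fetch_page, team_ids))


def timed_scrape(team_ids, num_threads):
    """Return the fetched pages plus the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    pages = scrape_all(team_ids, num_threads)
    return pages, time.perf_counter() - start


if __name__ == "__main__":
    teams = list(range(40))
    # Try a few pool sizes, the same way you would try a few dropdown values.
    for n in (1, 4, 20):
        pages, elapsed = timed_scrape(teams, n)
        print(f"{n:>2} threads: {len(pages)} pages in {elapsed:.2f}s")
```

With simulated latency, more threads keeps helping until the pool is as large as the page list. On a real network the sweet spot depends on your machine and connection, which is why trying a few values is the only reliable way to pick one.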

I've updated the link in the post.  Grab the new version and experience the speed.  And yes, I will be working on using multi-threading in my other tools now.


Original Post:
I made a few command line scraping tools a while back for bhazlewood.  There's been some recent interest in them, but the fact that they're command line only has been a barrier for some.  So I packaged them up into a GUI.  You can open this one by double clicking on it (just like all my other tools).  In order for it to work, you HAVE to have the permTeams.dat file in the same folder as it.  This is the same permTeams.dat file that's used for the recruiting tool.

Right now, the scraper utility only includes my roster scraper (gets team rosters, form IQs, GPAs, and season stats) and my coach scraper (gets pretty much everything off the history page for each school).  There's a tab for a game scraper which I'll be including later on.

Program:
http://www.gdreports.com/tools/yatzr/yatzr_scraper_util_1.2.jar
Necessary file that goes with it (you may already have this file if you use the recruiting tool):
http://www.gdreports.com/tools/yatzr/permTeams.zip (you'll have to unzip this to get the permTeams.dat file)

As always, please let me know if you find any issues with it.
2/19/2012 3:38 PM (edited)
It opens up and it's blank. Not sure what the issue is, help please lol
2/17/2012 1:46 PM
Are you saying the program is blank or the output file is blank?
The program should say "yatzr's Scraper Util v1.0" at the top.  Then it should have 3 tabs below that say "Player Scraper", "Coach Scraper", and "Game Scraper".  The player scraper and coach scraper tabs should have several items on them.  The game scraper tab should be blank.  What are you seeing?

If your output file is blank, then that means the program wasn't able to read the permTeams.dat file.
2/17/2012 1:57 PM
Thanks, it's updating now. I see what went wrong now lol
2/17/2012 2:06 PM
What exactly does this do?  I feel dumb!
2/17/2012 3:18 PM
yatzr is the Jeremy Lin of WIS.
2/18/2012 12:20 AM
I know I just released this last night, but I think this update is worth it.  This thing flies now.
2/18/2012 1:24 AM
Is it possible to put this technique into use for the recruit tool?  This is crazy fast.

edit: I spoke too soon.  It stopped at 408/612 on 24 threads.  I wonder if there is a limit to how big a csv file can be on a Mac.  Next attempt on 20 threads stopped at 482 teams and the attempt after that stopped at 398.  Now it blew right through it in just a few minutes on 4 threads.

I have the newest version MacBook pro and all your other tools work perfectly.  

2/18/2012 2:31 PM (edited)
Hey, that's cool......I'm thinking about 10 times faster (at minimum).
2/18/2012 3:49 PM
I used 24 threads for players and it took about 4 minutes to do all 612 teams....smoking fast yatz
2/18/2012 4:01 PM
coach took about 2 1/2 minutes set at 24 threads
2/18/2012 4:02 PM

However, the Season W-L and Home/Away columns are coming up in date format in Excel 2010.

2/18/2012 4:03 PM
Posted by zharkins on 2/18/2012 2:31:00 PM (view original):
Is it possible to put this technique into use for the recruit tool?  This is crazy fast.

edit: I spoke too soon.  It stopped at 408/612 on 24 threads.  I wonder if there is a limit to how big a csv file can be on a Mac.  Next attempt on 20 threads stopped at 482 teams and the attempt after that stopped at 398.  Now it blew right through it in just a few minutes on 4 threads.

I have the newest version MacBook pro and all your other tools work perfectly.  

Yes, I plan on incorporating this technique into the recruiting tool.

The most likely explanation is that one of the threads timed out during a page request.  I didn't take the time to handle that case very well, so when one thread times out, the program stops all the threads, throws away what it's done, and you have to start over.  The more threads you use, the more likely it is that one of them hits a timeout exception.  I would go ahead and try 8 and 12 threads to see if those work out for you, if you haven't already done so.
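
For what it's worth, one common way to handle this case is to retry the page that timed out and keep everything the other threads already fetched. Below is a rough sketch of that idea in Python. It is not how the tool currently behaves and not its actual code. `fetch_with_retry`, `scrape_all`, and `make_flaky_fetch` are hypothetical names, and the flaky fetcher only simulates timeouts so the example runs without a network.

```python
from concurrent.futures import ThreadPoolExecutor
import threading


def fetch_with_retry(fetch, team_id, max_attempts=3):
    """Call fetch(team_id), retrying on TimeoutError.

    Returns None if every attempt times out, instead of raising.
    """
    for _ in range(max_attempts):
        try:
            return fetch(team_id)
        except TimeoutError:
            continue
    return None


def scrape_all(fetch, team_ids, num_threads, max_attempts=3):
    """Scrape every team; a timeout on one page does not discard the others."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        pages = list(pool.map(
            lambda t: fetch_with_retry(fetch, t, max_attempts), team_ids))
    results = {t: p for t, p in zip(team_ids, pages) if p is not None}
    failed = [t for t, p in zip(team_ids, pages) if p is None]
    return results, failed


def make_flaky_fetch(times_out_once, always_times_out):
    """Build a simulated fetcher (assumption: stands in for a real page request)."""
    seen = set()
    lock = threading.Lock()

    def fetch(team_id):
        if team_id in always_times_out:
            raise TimeoutError(f"team {team_id} timed out")
        with lock:
            first_visit = team_id not in seen
            seen.add(team_id)
        if first_visit and team_id in times_out_once:
            raise TimeoutError(f"team {team_id} timed out")
        return f"page for team {team_id}"

    return fetch


if __name__ == "__main__":
    fetch = make_flaky_fetch(times_out_once={2, 5}, always_times_out={7})
    results, failed = scrape_all(fetch, list(range(10)), num_threads=4)
    print(f"fetched {len(results)} pages, gave up on teams {failed}")
```

The point of returning a `failed` list is that a run with a few bad pages still produces output, and only the failed teams need another pass.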
2/18/2012 9:41 PM
Posted by dukelegend on 2/18/2012 4:03:00 PM (view original):

However, the Season W-L and Home/Away columns are coming up in date format in Excel 2010.

When you bring the file into Excel through the Text Import Wizard (Data > From Text), it lets you define what type each column is.  Excel is probably guessing that those are date columns, but you should be able to set them to plain text.  I think I might just change it to include separate win and loss columns for each of those.
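
To show why separate columns sidestep the problem: a value like "10-2" looks like a date to Excel (October 2), while two plain integers do not. Here is a small sketch of the split in Python. It is only an illustration, not the tool's code, and the column names and the `split_record` / `write_rows` helpers are made up for the example.

```python
import csv
import io


def split_record(record):
    """Split a 'W-L' string such as '10-2' into integer wins and losses."""
    wins, losses = record.strip().split("-", 1)
    return int(wins), int(losses)


def write_rows(rows, out):
    """Write (coach, 'W-L') pairs as CSV with separate Wins and Losses columns."""
    writer = csv.writer(out)
    writer.writerow(["Coach", "Wins", "Losses"])  # hypothetical column names
    for coach, record in rows:
        wins, losses = split_record(record)
        writer.writerow([coach, wins, losses])


if __name__ == "__main__":
    buf = io.StringIO()
    write_rows([("coach_a", "10-2"), ("coach_b", "3-9")], buf)
    print(buf.getvalue())
```

Plain integer cells leave Excel nothing to misread as a date, and they are also easier to sort and sum.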
2/18/2012 9:42 PM