Overnight stuff Topic

Portents of doom and destruction!
9/9/2010 2:08 PM
Posted by _hannibal_ on 9/9/2010 1:31:00 PM (view original):
Posted by ryrun on 9/9/2010 1:05:00 PM (view original):
Posted by davis on 9/9/2010 11:59:00 AM (view original):
I would assume that it is difficult to figure out, because that is true of most software/hardware problems that "randomly" pop up.  If it happened every night, it'd be easy to figure out.  The random ones are always the hardest to pin down, in my experience.

I understand the frustration, and in a way it is good - it shows how emotionally invested people are in their teams.  I just think that these problems are harder to solve than is generally acknowledged, and that people can be pretty unreasonable in terms of what they think they are entitled to in exchange for $10 or $12.
Nothing is "random" when you're talking about programming.  If you write the code, it'll execute the code... there isn't a lot of gray area.  Especially when you start talking about an error that is causing the entire process to either hang or terminate altogether - there should be plenty of error handling in place to either stop that from happening (and just skipping over the problematic game(s), then flagging them to be simmed later) or to send a detailed log of what in the world happened to bring the entire process to a halt.
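
To make the "skip it and flag it" idea concrete, here is a minimal sketch. Everything in it is hypothetical (the game names, sim_game, flaky_sim); nobody outside the site knows what the real sim code looks like. It only shows the pattern of catching a failure per game, logging the stack trace, and carrying on with the rest.

```python
import logging
import traceback

logger = logging.getLogger("overnight_sim")

def run_overnight(games, sim_game):
    """Sim every game; a failure in one game never stops the rest.

    `games` and `sim_game` are hypothetical stand-ins for the real
    schedule and the real simulation routine.
    """
    completed, flagged = [], []
    for game in games:
        try:
            sim_game(game)
            completed.append(game)
        except Exception:
            # Record the full stack trace, then move on to the next game.
            logger.error("game %r failed:\n%s", game, traceback.format_exc())
            flagged.append(game)  # to be simmed later
    return completed, flagged

def flaky_sim(game):
    # Hypothetical simulator that blows up on one particular game.
    if game == "game-3":
        raise ValueError("bad roster data")

completed, flagged = run_overnight(
    ["game-1", "game-2", "game-3", "game-4"], flaky_sim
)
```

Note that this only covers errors that raise an exception. A true hang never reaches the except clause, so it needs a timeout of some kind on top of this.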

Now if they are just overworking their servers or something along those lines, then they can just check their error/performance logs and come up with a plan for reducing the stress. 

Regardless, it really isn't that difficult to find and correct an issue that has been happening "for a while", especially one that happens two nights in a row... that should give you all of the info you need to figure out the cause and then you can get to work on the solution immediately.
If the problem is something causing the process to hang, how do you propose skipping over the problematic items?  You don't know they are problems until you run them and at that point, the process is hung.  Sure, you could start writing code to monitor the process from an external process and try to detect when it hangs and kill it; but that is not an insignificant piece of code in itself.  So is your time better spent tracking down the issue or writing complex work-arounds?  Is it cost effective to do either as opposed to having the process monitored manually?

Keep in mind that the process likely involves at least two machines, the application and the database server. If more machines are involved, the complexity is upped even further. The issue could be with the code, or it could be environmental. Maybe the temp space on the machines is filling up, causing a reboot. Maybe the transaction is open too long, causing the database to run low on memory or rollback log space. If the problem is in the code, tracking it down is more difficult because of the heavy use of random numbers in the program. Run it with one set of random numbers, everything's fine. Try another, it goes boom.

You should never just let a process hang - there are many options in place to prevent that and just cause an exit due to timeout.  You can also write a process that would monitor memory space and clear it out when it hits a certain threshold, cut down on the number of commands in each transaction (or maybe they just need to write their queries in a more effective manner or have a better structure to their DB), etc. 
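
As a rough illustration of the "exit due to timeout" option, assuming the nightly job can be launched as a child process (an assumption, the real setup is unknown), Python's standard subprocess.run accepts a timeout and kills the child when it is exceeded. The two commands below are hypothetical stand-ins for a healthy job and a hung one.

```python
import subprocess
import sys

def run_with_timeout(cmd, timeout_seconds):
    """Run `cmd` as a child process; kill it if it exceeds the time limit.

    Returns "ok", "failed", or "timed out".
    """
    try:
        result = subprocess.run(cmd, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this exception.
        return "timed out"
    return "ok" if result.returncode == 0 else "failed"

# Hypothetical stand-ins for a healthy sim job and a hung one.
quick_job = [sys.executable, "-c", "print('sim done')"]
hung_job = [sys.executable, "-c", "import time; time.sleep(60)"]

quick_status = run_with_timeout(quick_job, timeout_seconds=10)
hung_status = run_with_timeout(hung_job, timeout_seconds=1)
```

The caller gets a status back either way, so the rest of the overnight schedule can continue and the timed-out job can be flagged for a rerun.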

Whatever it is, it is fixable and it shouldn't take weeks to track down.  I would hope they would be able to duplicate an exact run from a certain date with database backups, so if they have two days' worth of issues, it should be fairly simple from a troubleshooting standpoint to see A) where the process hung and B) why it hung.  If they don't have that mechanism in place, then I'd say they need to get on it - because as you said, everything is random, so they need something in place that will allow them to control the numbers and, if necessary, repeat a run in their test environment.
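
One way to "control the numbers" is to draw every random value from a single seeded generator and log the seed with each run. This is only a toy sketch: the teams, the score range, and the seed value are made up, and it assumes the sim takes all of its randomness from the generator passed in.

```python
import random

def sim_game(home, away, rng):
    """Toy game sim: every random draw comes from the rng passed in."""
    home_score = rng.randint(50, 100)
    away_score = rng.randint(50, 100)
    return home, home_score, away, away_score

def run_night(games, seed):
    # One seeded generator per run; log the seed alongside the results.
    rng = random.Random(seed)
    return [sim_game(home, away, rng) for home, away in games]

# Hypothetical schedule and seed; in production the seed would be
# generated fresh each night and written to the run log.
games = [("Team A", "Team B"), ("Team C", "Team D")]
seed = 20100909
first_run = run_night(games, seed)
replay = run_night(games, seed)  # same seed, same inputs, same results
```

Combined with a database backup from the same night, replaying with the logged seed should hit the same code path that failed, which is what makes a "random" failure repeatable in a test environment.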

If this has been happening for a while and they still haven't tracked it down, then I'd love for them to instead spend their time on workarounds and then go back to trying to fix the original problem.  Anything to stop the entire process from just stopping dead in its tracks.
9/9/2010 2:08 PM
Posted by tkimble on 9/9/2010 9:02:00 AM (view original):
In Knight we are recruiting and...2am cycle ran successfully, but then (Edit: supposed to be NO, not now) FSS at 3.  Then it appears no 5 am cycle ran and now stuff I just put in 15 minutes ago for the 8AM cycle (now 8:07) still shows up as pending.  Oh, and Tark still hasn't simmed so that's great too.
when seble first announced that games would run before recruiting, and everything would be on a set schedule, i thought that was a terrible idea, to let recruiting times be dependent on the games' sim time. even pushing to 2:30 with a 3 hour window is too long, IMO. but most people seemed to disagree, saying it only mattered twice a day and wanted the game results quicker.

well, this is the cost of removing the simplicity of running everything on its own. this didn't used to happen nearly as often. and when games didn't sim, recruiting didn't break too. skipping recruiting cycles is really bad. i firmly believe we need to go back to the old way, keep it simple, and try to minimize the collateral damage when one thing breaks. 

to me, more important than fixing the logic problems, is fixing the reliability of the product. not long ago, worlds ran off season improvement several times AND COULD NOT RECOVER IT! the data needs to be backed up, and problem recovery needs to improve. got to build a solid base before doing the fancy stuff, i think! 
9/9/2010 2:25 PM
Posted by seble on 9/9/2010 9:35:00 AM (view original):
Hey guys, sorry again.  There is a random issue that has been popping up for a while now and it's hit the last 2 nights.  If I knew what the problem was I'd fix it, but all we can do is try to figure out what's going on. 
i disagree that all you can do is try to figure out what's going on.

from my experience, with problems like this, there are always a few angles you can attack them from -
1) try to figure out what's going on and fix it
2) if you don't know what is going on, try to minimize the impact
3) if you don't know what is going on, try to improve detection of and response to the problem

in particular, i am in favor of #2. recruiting times absolutely should not rely on games being simulated. that is far too costly a mistake. having everything running in order is fine as long as step N failing doesn't cause steps N+1 on to be skipped. it makes mistakes and problems have a much larger footprint than they should. *some* things have to rely on others, i understand that, but recruiting and game sims absolutely do not - right? considering the two are mutually exclusive in 1 world...
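
here is a small sketch of what i mean by step N failing not taking down steps N+1 on. the step names and the dependency lists are hypothetical, i have no idea how the real job is wired. the point is only that a step gets skipped when something it actually depends on did not finish, and for no other reason.

```python
def run_cycle(steps):
    """Run steps in order; skip a step only if one of its dependencies failed.

    `steps` is a list of (name, func, depends_on) tuples in run order.
    """
    status = {}
    for name, func, depends_on in steps:
        if any(status.get(dep) != "ok" for dep in depends_on):
            status[name] = "skipped"
            continue
        try:
            func()
            status[name] = "ok"
        except Exception:
            status[name] = "failed"
    return status

def game_sim():
    # Hypothetical failure standing in for the overnight sim problem.
    raise RuntimeError("sim blew up")

def box_scores():
    pass  # hypothetical step that genuinely needs the sim results

def recruiting():
    pass  # hypothetical step with no dependency on the sim

status = run_cycle([
    ("game_sim", game_sim, []),
    ("box_scores", box_scores, ["game_sim"]),
    ("recruiting", recruiting, []),
])
```

in this run the sim fails, box scores get skipped because they need the sim, and recruiting still runs.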

also, #3. for example, if you have a world in the post season, or maybe any world, you need to seriously consider bumping back the next sim time. if you sim at 8am, the 2pm should be pushed back and announcements should be made of the upcoming schedule, so people can plan.
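
a rough sketch of the bump-back rule, assuming a minimum gap between the end of one sim and the start of the next. the 8 hour gap is a number i made up for illustration, not anything the site has stated.

```python
from datetime import datetime, timedelta

def next_sim_time(scheduled, last_finished, min_gap):
    """Return the later of the scheduled time and last_finished + min_gap."""
    return max(scheduled, last_finished + min_gap)

gap = timedelta(hours=8)  # hypothetical minimum gap between sims
scheduled_2pm = datetime(2010, 9, 9, 14, 0)

# Late night: the overnight sim did not finish until 8am, so 2pm moves to 4pm.
pushed = next_sim_time(scheduled_2pm, datetime(2010, 9, 9, 8, 0), gap)

# Normal night: the sim finished at 3am, so the 2pm sim stays where it is.
on_time = next_sim_time(scheduled_2pm, datetime(2010, 9, 9, 3, 0), gap)
```

whatever the rule ends up being, the new time is known as soon as the late sim finishes, so it can be announced right then.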

i hope you can see how the problem here is not so much the problem itself. it's everything else - allowing the effect of that problem to snowball into something much greater, and not making adjustments to ease the impact on the users. i'm not sure how you could react to the recruiting botches, but i would seriously think about it, and be soliciting feedback *before* i kicked it back off.
9/9/2010 2:38 PM
Coach billy - i thought seble changed it back so that recruiting is run first.  I thought that is where we left it.  Am I mistaken?
9/9/2010 2:41 PM
oh really? i guess i missed that. i hope you are right, that would be a lot better. i take back half of what i said then :)

edit: well, maybe not. i guess it depends on the specifics. if the game sim breaks and that breaks the next recruiting cycle, then most of what i said still applies.

9/9/2010 2:48 PM (edited)
I know I argued for recruiting to happen first.  Like you said there is a much smaller window to recruit than there is for games.
9/9/2010 2:48 PM
I thought he changed it back, too.
9/9/2010 3:18 PM
That's what I am saying, that in this double switch of sim and recruiting, something went amiss.
9/9/2010 3:31 PM
Posted by davis on 9/9/2010 11:25:00 AM (view original):
Posted by addyyy on 9/9/2010 9:02:00 AM (view original):
Would love to see an organizational chart for this site so we know exactly what's going on.  How many people work there?  Are there backups for each position?
A list of contacts and job functions would be nice....
Do you request an organizational chart for McDonald's if your burger takes extra time?  Do you request an organizational chart from Delta Airlines when your flight runs late?  How about an organizational chart for Kroger's when you get a stale loaf of bread?

I'm not sure why the customers of this site view it so differently than they view other businesses.
No, I don't request an OC from McDonald's, because I can speak to someone in person if I have a complaint at the store, or I can contact corporate through a number of avenues. The same holds true for your other "examples". It seems to me that the frustration most people have is with the lack of communication or of a place to see a status update. Updates should be issued as problems occur, not held back until people complain enough. In the AM when checking the overnight scores and results etc., if the program didn't run, the status area should say that there was a problem with the overnight sim, that we are aware of it, and that it should be corrected by x time, etc.
And yes it is the way most of us view other businesses.
9/9/2010 4:36 PM
Posted by davis on 9/9/2010 11:25:00 AM (view original):
Posted by addyyy on 9/9/2010 9:02:00 AM (view original):
Would love to see an organizational chart for this site so we know exactly what's going on.  How many people work there?  Are there backups for each position?
A list of contacts and job functions would be nice....
Do you request an organizational chart for McDonald's if your burger takes extra time?  Do you request an organizational chart from Delta Airlines when your flight runs late?  How about an organizational chart for Kroger's when you get a stale loaf of bread?

I'm not sure why the customers of this site view it so differently than they view other businesses.
yes to all. i have a pile of org charts in my basement of most major companies.
9/9/2010 9:01 PM
Posted by vandydave on 9/9/2010 7:49:00 AM (view original):
they need to get more people flipping the coins...
I actually stole the RNG in the cloak of darkness....teach them to use an in house RNG lol.
9/9/2010 9:09 PM