RESCOMP Archives

December 2009

RESCOMP@LISTSERV.MIAMIOH.EDU

Subject:
From:
"Robin, Robin" <[log in to unmask]>
Reply To:
Research Computing Support <[log in to unmask]>, Robin, Robin
Date:
Sun, 20 Dec 2009 01:22:27 -0500
Content-Type:
text/plain
Parts/Attachments:
text/plain (88 lines)
I'm almost tempted to say:
* We go with the fixed file system, hope it doesn't panic, and use Tivoli to restore the files it is currently missing as a result of the fix.
* That way, the effects of fixing a damaged file system are minimized, though some files' contents might still be off.
* We live with the bad directory (which cannot be fixed via fsck).
* If it panics again, we schedule a downtime and go with the restore from tape.

If time is not an issue, we can test a restore from the tape to make sure the tape is good, then rebuild that segment fresh and pull the data from the tape.

Steve, what name was the backup node renamed to, in case we end up deciding to pull from tape?

Thanks,
Robin


________________________________________
From: Robin [[log in to unmask]]
Sent: Sunday, December 20, 2009 12:22 AM
To: Robin, Robin
Cc: Research Computing Support
Subject: Re: Update on the file system

Well... an update on saving a current copy of the salvaged data from the damaged/fixed segment.
The tool I use to identify which files belong to the damaged/fixed segment (lv04) does not handle spaces in names.

So I can correctly back up anything identified with lv04 whose name has no spaces.
For files/directories with spaces in their names, I would be forced to either back them all up as well or leave them out.

I'll start the backup for those I can correctly identify with the damaged/fixed segment.
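
For what it's worth, here is a minimal sketch of how the list could be split, assuming the file names are available as plain text with one path per line (that input format is an assumption, and the paths in the example are invented). It only separates names that contain whitespace from names that do not; it does not replace the segment-identification tool.

```python
def read_path_list(text):
    """Parse one path per line, keeping any spaces inside a path intact."""
    return [line for line in text.split("\n") if line.strip()]


def split_by_whitespace(paths):
    """Split paths into (safe, needs_review).

    'safe' holds paths with no whitespace at all; 'needs_review' holds
    paths that a space-unaware tool is likely to mishandle.
    """
    safe, needs_review = [], []
    for path in paths:
        if any(ch.isspace() for ch in path):
            needs_review.append(path)
        else:
            safe.append(path)
    return safe, needs_review


# Hypothetical listing, not real output from any tool.
listing = "/ibrixfs1/home/a/file1\n/ibrixfs1/home/a/my file\n\n"
safe, needs_review = split_by_whitespace(read_path_list(listing))
```

The "safe" list is what can go into the backup right away; the other list is the part that has to be either backed up wholesale or left out.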

Robin



On Dec 19, 2009, at 11:50 PM, Robin wrote:

> A couple of things are hard to decide. The file system has sustained some damage (the majority of users should not notice it).
> Some files were moved to /ibrixfs1/lost+found. We can look through those files and see who owns them. I think most of them are owned by software.
>
> Here are the not-so-easy choices:
> 1) We have the file system fully back up, with some damage. That is the nature of file system checks.
>     It also has a bad directory that should not be touched by anyone.
>     We can go with this and see how it goes. Some files were lost in typical fsck fixes or got moved to /ibrixfs1/lost+found.
>     I suspect most will not notice anything.
>
> 2) We can tear it down and restore it from backup, since we know the backup data is more pristine.
>     The data will be cleaner. To be safe, we can copy the existing live data somewhere first.
>
> This will not be an easy call.
> (1) gets us back up faster, but with less clean data (as a result of heavy file system fixes).
>       That single directory issue is definitely annoying.
>       Do not run ls -l, du -skh, or anything else that can touch this directory [ /ibrixfs1/home/karroje/research.d/jenna.d/EAST_Test/test_pipeline ].
>       The permissions will be 000, so that users cannot do anything with it.
> (2) requires us to go to backup and restore about 800 GB of data to ensure pristine data (not data resulting from fsck fixes).
>      We'll copy the current fixed file system data first, just in case the tape is bad.
> (3) run with (1) and use the backup tape to fill in missing files (not sure how long it will take -- but definitely shorter).
>
> One thing is certain: the last good backup should be saved and not touched anymore, because the file system checks report damage.
> We definitely have data on the tape that is not on the file system that is currently up.
>
> (1) is not as clean, and that unfixed directory is annoying. They do not have an engineer on site right now who could potentially fix it at the level of the underlying file system representation.
>      If users suspect their files/directories have issues, they need to let us know and we'll go to the pristine tape.
>      I suspect most will not notice a thing. But the lingering unease about a fixed file system is definitely annoying.
> (2) is cleaner but will take longer.
> (3) we keep (1) up and get from tape what we don't have.
>
> We have (1) in place. We can decide if we want to do (2).
>
> If we decide to go with (2), we will back up the (1) data to another server (partially to mulnx32) just in case the tape is bad -- we have restored things in the past. That segment has 800 GB of data; if we have a bad tape, things could be pretty bad. So copying the current copy of (1) before going with (2) is fairly desirable.
>
> (2) seems the cleanest solution (assuming the backup tape is fully intact: 800 GB) -- because the file system will contain less damage. If we go that way, it will be Monday/Tuesday/Wednesday before it is done.
>
> The only downside of (2) is time -- but it will be cleaner. (1) has an upside that we are up sooner.
>
> If we are being conservative, before choosing (2) we copy the current salvaged data to mulnx32, which happens to have some space.
>
> We need to make a call and decide. I'm making plans to copy (1) just in case we go with (2).
>
> Look at the files: ibfsck-seg04.log (attached).
>
> (2) is fairly attractive for its pristine data (especially if we save the salvaged data, so that a partially bad tape will not hurt us) -- it's just that the timing is not great.
>
> The vendor is a bit ambivalent about it as well. On one hand, we can have clean data. On the other, we need to restore from tape.
>
> Arghh.. decision.
>
> Robin
> <ibfsck-seg04.tar.gz>
