Compressed HDD image seems to be messed up
#1
Here's what I did:
  • Created a single image file of an old HDD with multiple NTFS partitions using ddrescue. The HDD had 20 GB of storage, of which about 15 GB was used.
  • Compressed the image file using 7z. The 20 GB image file was compressed to about 200 MB (high compression, now that I think about it). 7z returned no errors. (It also renamed the archive from *.7z.tmp to *.7z, which makes me think it finished successfully.)
  • Stored the 7z archive containing the HDD image in a safe place.
I usually back up my data by making an image and then compressing it. This method has never caused me any trouble.

Later, when I wanted to use the HDD image again, I actually did run into trouble:
  • Decompressed the 7z archive. It returned no errors and re-created my 20 GB image file.
  • I could write the image file to a USB stick without problems.
  • The USB stick now shows a single unknown partition that is impossible to mount. That's where I'm stuck. The same problem occurs when writing the image to an SD card or external HDD, or when trying to mount it as a loop device.
I already tried recovering the data but wasn't really successful:
  • TestDisk isn't able to find/recover any partitions on the newly created USB device.
  • PhotoRec actually recovered a few of the files that were originally stored on the HDD, but that's only a small fraction of the whole storage.
Even though the original HDD was a Windows drive, I only use Debian-based Linux for handling my backups.
It would be cool if anyone knew a solution to get the HDD data back, but my first priority in this post is to find out what I could have done wrong.

Btw: EzeeLinux is cool.
Reply
#2
How did you create the original image (using dd or something else)? Maybe something updated and changed so that this process no longer works properly.
I am not too experienced with the various imaging tools, but one method that has never failed me is dd piped through gzip. Search for dd in the Arch Wiki and then look for gzip on that page (I'm writing this from my phone while at a shisha bar with friends from my class). I used that method to clone 12 computers at school.

I hope this has helped you and have a nice day!
Name: Sandy Vujaković
Laptop: Dell Inspiron 3793 (17", i5)
OS: Ubuntu Groovy Gorilla
Reply
#3
(10-04-2018, 01:09 AM)wheel Wrote: Here's what I did:
  • Created a single image file of an old HDD with multiple NTFS partitions using ddrescue. The HDD had 20 GB of storage, of which about 15 GB was used.
  • Compressed the image file using 7z. The 20 GB image file was compressed to about 200 MB (high compression, now that I think about it). 7z returned no errors. (It also renamed the archive from *.7z.tmp to *.7z, which makes me think it finished successfully.)
  • Stored the 7z archive containing the HDD image in a safe place.
I usually back up my data by making an image and then compressing it. This method has never caused me any trouble.

Later, when I wanted to use the HDD image again, I actually did run into trouble:
  • Decompressed the 7z archive. It returned no errors and re-created my 20 GB image file.
  • I could write the image file to a USB stick without problems.
  • The USB stick now shows a single unknown partition that is impossible to mount. That's where I'm stuck. The same problem occurs when writing the image to an SD card or external HDD, or when trying to mount it as a loop device.
I already tried recovering the data but wasn't really successful:
  • TestDisk isn't able to find/recover any partitions on the newly created USB device.
  • PhotoRec actually recovered a few of the files that were originally stored on the HDD, but that's only a small fraction of the whole storage.
Even though the original HDD was a Windows drive, I only use Debian-based Linux for handling my backups.
It would be cool if anyone knew a solution to get the HDD data back, but my first priority in this post is to find out what I could have done wrong.

Btw: EzeeLinux is cool.

Maybe the compression and later decompression somehow corrupted the disk image; you said you used high compression, which makes data damage more likely.
In the future, I recommend also storing a hash value of your disk images, so you know whether the compression and later decompression corrupted any data, and I'd suggest not using high compression.
I recommend the .tar.gz format for that:
Code:
tar czf disk-image.tar.gz disk-image.iso
And I also believe ddrescue is not the right tool for your purpose:
It is designed to get data off failing drives, not to make backup images of healthy drives; use regular 'dd' instead.
It is entirely possible that the disk image was already unusable before compression:
ddrescue skips over read errors (it is a rescue tool, after all), while regular dd aborts by default when it hits one (unless you pass conv=noerror).
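Roughly, the difference between the two looks like this (just a sketch; /dev/sdX stands for the source drive):
Code:
# Plain dd: stops at the first read error unless told otherwise (e.g. conv=noerror)
dd if=/dev/sdX of=disk-image.img bs=4M status=progress

# ddrescue: keeps going past bad sectors and records them in a map file
ddrescue /dev/sdX disk-image.img disk-image.map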
Reply
#4
Thank you for your replies. I think I already learned something.

Quote:I am not too experienced with the various imaging tools, but one method that has never failed me is dd piped through gzip.

Even though I never thought of that, it does indeed seem very easy to do with something like:
Code:
dd if=/dev/sda1 | gzip > ~/image-compress_sda1.img.gz
I'll definitely give this a shot next time I create an HDD image. (But this time I'll decompress and test it before carrying on.)
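And if I understand it correctly, restoring such an image would then be something like this (just a sketch; /dev/sdX stands for the actual target device):
Code:
gunzip -c ~/image-compress_sda1.img.gz | dd of=/dev/sdX bs=4M status=progress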


Quote:Maybe the compression and later decompression somehow corrupted the disk image; you said you used high compression, which makes data damage more likely.

Yes, that has probably been the reason. I wasn't aware that compressing could actually cause serious damage. Does this happen frequently? (Might sound like a stupid question, but I'm serious  Blush )

Quote:In the future, I recommend also storing a hash value of your disk images, so you know whether the compression and later decompression corrupted any data, and I'd suggest not using high compression.

How would I do this in practice? Is it okay to use
Code:
sha256sum image.img > image.sha256
before compressing, and then compare that file against the sha256 hash of the later decompressed image? Is it possible to get the hash of a file without decompressing it first? Is sha256 the right thing to use here?

Quote:And I also believe ddrescue is not the right tool for your purpose:
It is designed to get data off failing drives, not to make backup images of healthy drives; use regular 'dd' instead.
It is entirely possible that the disk image was already unusable before compression:
ddrescue skips over read errors (it is a rescue tool, after all), while regular dd aborts by default when it hits one (unless you pass conv=noerror).
Thanks for letting me know  Rolleyes. In the future, I will only use ddrescue if dd fails.
Reply
#5
Another issue is bit rot.

A true backup is redundant copies on multiple sources, while a one-off "backup" is just a copy and not a backup at all.

The more compressed you make a file, the bigger the risk: just one bit flip and you are out of luck. That is, unless you keep multiple copies on multiple media. Also, hardware RAID doesn't protect against the most common data failures. It is honestly false data security.

The safest filesystem in the world is ZFS, period, as it protects against so many kinds of data failure.

To generate a hash just do the following:


Code:
shell> sha256sum filename

Example:

Code:
shell> sha256sum backup.img

Then note the hash in a file. When you extract the archive later, you can check the hash again. That still doesn't tell you whether the image was made correctly in the first place, but it will tell you whether bit rot occurred.
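As a sketch of that workflow (the file names here are just examples):
Code:
# Before compressing: record the hash next to the image
sha256sum backup.img > backup.img.sha256

# After decompressing later: verify the image against the recorded hash
sha256sum -c backup.img.sha256

# 7z can also send the extracted data to stdout, so you can hash the archive's
# contents without writing them to disk first (assuming a single file in the archive)
7z x -so backup.7z | sha256sum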
Jeremy (Mr. Server)

* Desktop: Ubuntu MATE
* Windows are for your walls, Apple is for your health, Linux is for your computer
Reply
#6
(10-04-2018, 07:28 PM)cleverwise Wrote: Another issue is bit rot.

A true backup is redundant copies on multiple sources, while a one-off "backup" is just a copy and not a backup at all.

The more compressed you make a file, the bigger the risk: just one bit flip and you are out of luck. That is, unless you keep multiple copies on multiple media. Also, hardware RAID doesn't protect against the most common data failures. It is honestly false data security.

I see.
Indeed, I happen to have an external drive with a backup of my (RAID 1  Big Grin ) backup HDD. I tried to recover my compressed image from that drive and it still failed. So I guess it's time to give this one up.

Anyway, my questions were answered and I've learned something. In the future I'll hopefully act a little smarter. Thank you very much.
Reply
#7
(10-05-2018, 12:55 PM)wheel Wrote:
(10-04-2018, 07:28 PM)cleverwise Wrote: Another issue is bit rot.

A true backup is redundant copies on multiple sources, while a one-off "backup" is just a copy and not a backup at all.

The more compressed you make a file, the bigger the risk: just one bit flip and you are out of luck. That is, unless you keep multiple copies on multiple media. Also, hardware RAID doesn't protect against the most common data failures. It is honestly false data security.

I see.
Indeed, I happen to have an external drive with a backup of my (RAID 1  Big Grin ) backup HDD. I tried to recover my compressed image from that drive and it still failed. So I guess it's time to give this one up.

Anyway, my questions were answered and I've learned something. In the future I'll hopefully act a little smarter. Thank you very much.

I understand.

Yeah, RAID (software or hardware) will happily serve bad data. It doesn't checksum anything, nor will it compare file contents. So you can have two copies of data in RAID 1 (or RAID 5) where one copy is good and one is bad; RAID doesn't know the difference and can serve either the good or the bad copy.

RAID is actually very dumb in the sense that it has no intelligence. It doesn't understand the filesystem structure. It doesn't understand the files. It doesn't protect against bit rot. It doesn't protect against accidental changes. Heck, it often doesn't even help with hardware failure; I have seen many RAID systems report that all is well when in fact it isn't okay at all. Plus, RAID assumes everything was written to disk correctly.

I have little faith in RAID and find that people feel their data is far safer than it really is. RAID gives a false sense of data integrity, period.

The only true way to protect your data is with a system like ZFS. That filesystem checksums everything. It knows whether a file is good or bad based on its checksum. If it has two copies of a file, it will automatically detect a bad checksum and return the good copy. ZFS will then repair the bad copy from the good one. ZFS verifies that a file was written to disk correctly. ZFS is copy-on-write, which is far safer. RAID doesn't do any of this at all.

ZFS does even more, but that is the basic rundown. RAID is overhyped.
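For illustration, checking and repairing a pool by hand looks something like this ("tank" is just an example pool name):
Code:
# Read every block in the pool, verify the checksums, and repair what it can from redundancy
zpool scrub tank

# Show pool health, including any checksum errors that were found and fixed
zpool status -v tank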
Jeremy (Mr. Server)

* Desktop: Ubuntu MATE
* Windows are for your walls, Apple is for your health, Linux is for your computer
Reply
#8
Quote:Yeah, RAID (software or hardware) will happily serve bad data. It doesn't checksum anything, nor will it compare file contents. So you can have two copies of data in RAID 1 (or RAID 5) where one copy is good and one is bad; RAID doesn't know the difference and can serve either the good or the bad copy.

RAID is actually very dumb in the sense that it has no intelligence. It doesn't understand the filesystem structure. It doesn't understand the files. It doesn't protect against bit rot. It doesn't protect against accidental changes. Heck, it often doesn't even help with hardware failure; I have seen many RAID systems report that all is well when in fact it isn't okay at all. Plus, RAID assumes everything was written to disk correctly.

I have little faith in RAID and find that people feel their data is far safer than it really is. RAID gives a false sense of data integrity, period.

The only true way to protect your data is with a system like ZFS. That filesystem checksums everything. It knows whether a file is good or bad based on its checksum. If it has two copies of a file, it will automatically detect a bad checksum and return the good copy. ZFS will then repair the bad copy from the good one. ZFS verifies that a file was written to disk correctly. ZFS is copy-on-write, which is far safer. RAID doesn't do any of this at all.

ZFS does even more, but that is the basic rundown. RAID is overhyped.

Okay, I'm in. That's something interesting to talk about. Since I'm not really familiar with ZFS, I'm just telling you what's on my mind right now:

What I need is a NAS for home and office use. Speed is not the highest priority here, but I want my data to be encrypted and stored redundantly.

Here's something about my current setup: I use a small HP MicroServer with a (hardware) RAID 1 array (for personal data) and another HDD (for public data that's not too hard to recover, and for temporary work). The server OS is on a fourth drive, an SSD. On a regular basis I take a backup of my RAID 1 array to an external HDD. Whenever this external HDD is not connected to my server, it is stored in another physical location.
I started using RAID 1 about ten years ago after experiencing some trouble with an HDD that physically failed; it just became totally unreadable all of a sudden. I never expected anything from RAID 1 other than protection against physical failure. It's really nice to hear that more is actually possible.

Here's what I was planning anyway for the long term: in order to reduce energy costs (living in Europe, electricity prices are high), I was planning to use a Raspberry Pi or a comparable single-board computer and connect multiple SSDs or even USB sticks to it externally. (This plan will not be realized as long as a 4 TB SSD costs almost 1000 bucks and USB sticks can't store multiple terabytes of data.)

I started reading about ZFS and found that when using it on Linux (I was reading the Ubuntu wiki), it would use about 1 GB of RAM per TB of storage, plus some CPU power. Since an encrypted filesystem would already eat up some resources, I'd probably need a decent amount of RAM and a strong CPU in my future server. That doesn't really help my plan of reducing energy consumption.
Where is a good starting point if I want to get familiar with ZFS (or something similar)? Is it true that it is so resource-hungry?
Is there anything else (affordable  Blush ) I could use to increase my data safety?

Side note regarding energy consumption: no, I'm not buying a new server just to save a few bucks on the electricity bill. I'd rather plan for my next server (which has to be bought anyway) to be more efficient. (Just saying, to avoid a possible discussion.)
Reply
#9
That 1 GB per TB figure has been blown way out of proportion and is nonsense. While it never hurts to add more RAM, 1 GB per TB really only becomes necessary if you use deduplication, and I don't recommend using deduplication.

If you really want to save power down the road, then something like a FreeNAS Mini would be one of the better options. Those are ultra low power and well built.

Also, if you want appliance-like management, look into FreeNAS. That open-source OS is designed to make running a NAS easy and is based on ZFS. However, it doesn't run on a Pi.

You really don't want a multi-TB filesystem running on only 1 GB of RAM, even if it is Linux. Sorry, but a Pi is a poor choice for a NAS.

If you want to use Linux, then the easier way to get started is to use Ubuntu. Install it on the standard ext4 and then add data drives formatted with ZFS to store your data. Why not run the whole system on ZFS?

Well, it isn't hard to get ZFS on root for Linux, but it is more effort. You mainly need ZFS on the storage drives, as the OS is easily reinstallable. Still, running ZFS on the whole stack is nice; it just takes more work. So it is up to you.

One way to learn ZFS is to simply install Ubuntu in VirtualBox, VMware, etc., and then add additional virtual drives. They can be small, like 500 MB each, if you just want to learn.
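A rough sketch of that first experiment, assuming the zfsutils-linux package is installed and the extra virtual drives show up as /dev/sdb and /dev/sdc:
Code:
# Create a mirrored test pool from the two spare virtual drives
sudo zpool create testpool mirror /dev/sdb /dev/sdc

# Create a dataset on it and check the pool's health
sudo zfs create testpool/data
sudo zpool status testpool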

ZFS has a lot of advanced features.

1) You may put the ZIL (write log) and the L2ARC (read cache) on separate devices. This is something RAID can't do, and neither can most filesystems.

2) ZFS makes snapshots painless and nearly free in disk space. RAID can't do this.

3) ZFS pools are easily movable. RAID writes controller-specific metadata onto drives, so moving disks between RAID systems isn't easy. ZFS doesn't care: ZFS on system A will easily read a pool from system B, even if the OS is different, so a data drive from BSD is easily read on Linux.

4) You may set special properties in ZFS (including defining your own). RAID can't do this.

5) ZFS is script-friendly. In fact, it is ultra script-friendly. RAID can't match this.

6) ZFS checksums everything, including checksumming the checksums. It stores multiple copies of that metadata, so even if bit rot hits a checksum, ZFS can automatically fix itself. RAID doesn't do this.

7) ZFS can automatically store multiple copies of each file within a filesystem. So if you set up a dataset to keep multiple copies, any file saved gets two copies automatically, placed as far apart on the physical media as possible to limit the impact of physical damage. RAID can't do this.

8) ZFS will autoexpand. So if you set up a mirrored pair of 1 TB drives, all you have to do is upgrade the drives; once both are, say, 4 TB, the pool will grow to the new size on its own, without any human interaction. RAID can't do this.

9) You may easily set quotas. RAID can't do this.

10) ZFS can easily share hot spares between pools. Say you have three ZFS pools and one hot spare, and a drive dies in pool B. That pool automatically rebuilds the data from the good drive onto the spare and goes into degraded status. Once you replace the failed drive, ZFS copies the data back from the hot spare, restores the pool to healthy, and releases the spare for use by any ZFS pool you have allowed to use it (you retain control).

11) ZFS is filesystem, file, and hardware aware. RAID can't do that.

12) The list goes on; however, I am not trying to write out the complete guide here.

RAID is a weak and far inferior technology. RAID is old thinking. A few of the features above are sketched below.
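A minimal sketch of a few of those features, using the standard zfs/zpool commands (the pool and dataset names are just examples):
Code:
# (2) Painless snapshot of a dataset
sudo zfs snapshot tank/data@before-upgrade

# (7) Keep two copies of every file in a dataset
sudo zfs set copies=2 tank/important

# (8) Let the pool grow on its own once every drive in it has been replaced with a bigger one
sudo zpool set autoexpand=on tank

# (9) Quota per dataset
sudo zfs set quota=100G tank/home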
Jeremy (Mr. Server)

* Desktop: Ubuntu MATE
* Windows are for your walls, Apple is for your health, Linux is for your computer
Reply
#10
(10-05-2018, 08:50 PM)cleverwise Wrote: If you want to use Linux, then the easier way to get started is to use Ubuntu. Install it on the standard ext4 and then add data drives formatted with ZFS to store your data. Why not run the whole system on ZFS?

Well, it isn't hard to get ZFS on root for Linux, but it is more effort. You mainly need ZFS on the storage drives, as the OS is easily reinstallable. Still, running ZFS on the whole stack is nice; it just takes more work. So it is up to you.

One way to learn ZFS is to simply install Ubuntu in VirtualBox, VMware, etc., and then add additional virtual drives. They can be small, like 500 MB each, if you just want to learn.

I will do exactly this and will probably post about it as soon as I've gained some first experience with ZFS.
Thank you for getting me interested.
Reply

