[UPDATE START]
We found a nasty performance bug in the code below. The DeCompress
method copies a string for each turn in the loop. That is a classic problem that creates a new copy of the string for each row. That became a major problem for a 3.8 MB string…
I have now updated the code to use the System.Text.StringBuilder
object instead. That took down the speed to about a tenth. Sorry that I didn’t catch that…
[UPDATE STOP]
We had a quite special need the other day; we wanted to compress a part of our request, namely an XML string that was sent to us.
Most of the examples I found on the net showed how to compress the content of a file. But here is the code that compresses a string. The code uses ICSharpCode.SharpZipLib. Here you go:
/// <summary>
/// Compress strings with ICSharpCode.SharpZipLib
/// </summary>
public class StringZipper
{
public static string Compress(string uncompressedString)
{
byte[] bytData = System.Text.Encoding.UTF8.GetBytes(uncompressedString);
using (MemoryStream ms = new MemoryStream())
{
using (Stream s = new DeflaterOutputStream(ms))
{
s.Write(bytData, 0, bytData.Length);
}
byte[] compressedData = ms.ToArray();
return System.Convert.ToBase64String(compressedData, 0, compressedData.Length);
}
}
public static string DeCompress(string compressedString)
{
StringBuilder uncompressedString = new StringBuilder();
byte[] bytInput = System.Convert.FromBase64String(compressedString);
byte[] writeData = new byte[4096];
using (Stream s2 = new InflaterInputStream(new MemoryStream(bytInput)))
{
int size;
while ((size = s2.Read(writeData, 0, writeData.Length)) > 0)
{
uncompressedString.Append(System.Text.Encoding.UTF8.GetString(writeData, 0, size));
}
}
return uncompressedString.ToString();
}
}
And here is how to call it:
// Compress the string
string orgString = "Hello compression with åäö.";
// orgString = File.ReadAllText(args[0]);
Stopwatch sw = new Stopwatch();
sw.Start();
string compressedString = StringZipper.Compress(orgString);
sw.Stop();
string compressTime = sw.ElapsedMilliseconds.ToString();
// Decompress the string
sw.Reset(); sw.Start();
string decompressedString = StringZipper.DeCompress(compressedString);
sw.Stop();
string decompressTime = sw.ElapsedMilliseconds.ToString();
if (orgString == decompressedString)
Console.WriteLine("SAME");
else
Console.WriteLine("DIFFER");
decimal ratio = (decimal)compressedString.Length / (decimal)orgString.Length;
Console.WriteLine($"Original string was: {orgString.Length} chars");
Console.WriteLine($"Compressed string was: {compressedString.Length} chars");
Console.WriteLine($"Decompressed string was: {decompressedString.Length} chars");
Console.WriteLine($"Compression ratio {ratio}");
Console.WriteLine($"The compression took {compressTime} ms, Decompression took {decompressTime} ms");
I should point out two things:
- For short strings, the compressed string might actually be longer than the original.
- It’s not very good to store the result of the compression as a string (as done by
System.Convert.ToBase64String
), but it’s quite nice to have for serialization.