Studies in Computer Sciences and Practices in Software Engineering

Bitmap.FromFile Locks the Image File

Creating a Bitmap object directly from an image file, either with the static FromFile method or by passing the file name to the constructor, keeps the file locked until the Bitmap object is disposed. Similarly, a bitmap created directly from a stream, using the static FromStream method or by passing the stream to the Bitmap constructor, requires the stream to be kept open for the lifetime of the Bitmap object. Many people have discussed this in forums or blogged about it, but few offer good insight into the reason behind the behavior.

It is actually not difficult to guess the reason for the file locking: the entire content of the image file is not loaded into the computer's main memory when a Bitmap object is created from the file. Instead, the object encapsulates the image file and fetches the image data on demand. The advantage of doing this is clear-cut on a Windows system. In 32-bit Windows, main memory is quickly becoming the scarcest resource, now that terabytes of hard disk and gigabytes of video RAM are becoming mainstream. It makes sense for a program to play only the role of transferring the image between the hard drive and the video RAM. This is exactly what an image object does when it is created directly from an image file.

To convince myself that the above hypothesis is indeed true, I ran the following test:

I took a screenshot of the entire 1920x1200 display and saved it in bitmap (BMP) format to use as the test image. The color depth of the display is 24 bits. Without compression, this creates an image file of 6,912,054 bytes. When this image is fully loaded into main memory by a program, we expect the program to consume a similar amount of memory.
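The file size quoted above can be reproduced with a little arithmetic. The following is a minimal sketch, assuming the common BMP layout of a 54-byte header (14-byte file header plus 40-byte BITMAPINFOHEADER), no palette, and rows padded to a multiple of 4 bytes; the function name UncompressedBmpSize is made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Size in bytes of an uncompressed BMP file.
// Assumption: 54-byte header (14-byte file header + 40-byte BITMAPINFOHEADER)
// and no color palette, which is the usual layout for 24-bit images.
std::uint64_t UncompressedBmpSize(std::uint32_t width, std::uint32_t height,
                                  std::uint32_t bitsPerPixel)
{
    assert(bitsPerPixel > 0);
    // Each row of pixels is padded to a multiple of 4 bytes.
    const std::uint64_t rowBytes =
        ((static_cast<std::uint64_t>(width) * bitsPerPixel + 31) / 32) * 4;
    return 54 + rowBytes * height;
}
```

For the test image, UncompressedBmpSize(1920, 1200, 24) gives 54 + 5,760 x 1,200 = 6,912,054 bytes, of which 6,912,000 bytes are pixel data.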

Since the .NET Bitmap class is essentially a wrapper of the corresponding GDI+ class, we expect the file-locking behavior to be inherited from GDI+, although this is not spelled out explicitly in the GDI+ documentation. So instead of using the managed environment, I opted to do the testing in native C++, creating a simple Windows application whose window procedure looks like the following:


LRESULT CALLBACK WndProc(HWND hWnd, UINT message, WPARAM wParam, LPARAM lParam)
{
    int wmId, wmEvent;
    PAINTSTRUCT ps;
    HDC hdc;
    static Bitmap* p = 0;
    static Bitmap* q=0, *r=0, *s=0;
    Bitmap* tmp;

    switch (message)
    {
    case WM_COMMAND:
        wmId    = LOWORD(wParam);
        wmEvent = HIWORD(wParam);
        // Parse the menu selections:
        switch (wmId)
        {
        case ID_FILE_OPEN:
            {
                p = new Bitmap(L"C:\\Users\\x\\Pictures\\testbitmapfromfile.bmp");
                RECT rect;
                GetClientRect(hWnd, &rect);
                InvalidateRect(hWnd, &rect, FALSE);
            }
            break;
        case ID_ACTION_DUPLICATE:
            if (p!=0) {
                // Allocate an in-memory bitmap with the same dimensions and format.
                int w = p->GetWidth();
                int h = p->GetHeight();
                PixelFormat f = p->GetPixelFormat();
                tmp = new Bitmap(w, h, f);
                if (q!=0)
                    if (r!=0)
                        if (s!=0)
                            delete tmp;
                        else
                            s = tmp;
                    else
                        r = tmp;
                else
                    q = tmp;
            }
            break;
        case IDM_ABOUT:
            DialogBox(hInst, MAKEINTRESOURCE(IDD_ABOUTBOX), hWnd, About);
            break;
        case IDM_EXIT:
            DestroyWindow(hWnd);
            break;
        default:
            return DefWindowProc(hWnd, message, wParam, lParam);
        }
        break;
    case WM_PAINT:
        hdc = BeginPaint(hWnd, &ps);
        if (p != 0) {
            RECT rect;
            GetClientRect(hWnd, &rect);
            Graphics g(hdc);
            g.DrawImage(p, 0, 0, rect.right-rect.left, rect.bottom-rect.top);
        }
        EndPaint(hWnd, &ps);
        break;
    case WM_DESTROY:
        delete p;
        delete q;
        delete r;
        delete s;
        PostQuitMessage(0);
        break;
    default:
        return DefWindowProc(hWnd, message, wParam, lParam);
    }
    return 0;
}

By creating the application in the native environment, we reduce unnecessary overhead that might interfere with the interpretation of the experimental data. The application has a "File->Open" menu item which, when selected, causes the application to create a GDI+ Bitmap object from the test image file and then display the image in the application's client area. The application also has an "Action->Duplicate" item that creates, in the program's heap memory, up to three GDI+ Bitmap objects of the same dimensions as the test image. By monitoring the memory consumption of the program as these actions are taken, we can better understand how the image data is stored. The results are:

 Application State       Memory usage (KB)
------------------------------------------
 1. at program start                928
 2. open and display image file   1,148
 3. duplicate once                7,920
 4. duplicate twice              14,692
 5. duplicate three times        21,460
 6. duplicate four times         21,468
------------------------------------------

From the above data, we can clearly see that opening and displaying the image file through a Bitmap object consumes far less memory than the size of the image's pixel data. On the other hand, when images of comparable size are created in heap memory, the program allocates as much memory as is required to store the pixel data. (The fourth duplicate adds almost nothing because the test program keeps at most three copies and deletes any further one.) These numbers show that creating a Bitmap "FromFile" does not cause the pixel data to be loaded into main memory. That is why the file stays locked until the program releases the image.
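To make the comparison concrete, the growth between consecutive rows of the table can be computed and set against the size of the pixel data. This is only a sketch of the arithmetic; MemoryDeltas is a hypothetical helper, not part of the test program.

```cpp
#include <cstddef>
#include <vector>

// Growth in memory usage between consecutive measurements
// (result is in the same unit as the input).
std::vector<long> MemoryDeltas(const std::vector<long>& usage)
{
    std::vector<long> deltas;
    for (std::size_t i = 1; i < usage.size(); ++i)
        deltas.push_back(usage[i] - usage[i - 1]);
    return deltas;
}
```

Fed with the table values {928, 1148, 7920, 14692, 21460, 21468}, this yields {220, 6772, 6772, 6768, 8}. The pixel data of the test image is 6,912,000 bytes, or 6,750 KB, so each of the three in-memory duplicates costs about one full image while opening the file costs only 220.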

16:22:25 on 12/26/08 by x mar - Programming - comments

Assembly Load Tracking

To log assembly binding (Fusion), set HKLM\SOFTWARE\Microsoft\Fusion!EnableLog (DWORD) to 1.

09:19:31 on 12/19/08 by x mar - Programming - comments

Integer Division is Very Expensive

I have always known that integer division takes more clock cycles than integer multiplication, and I take care to avoid division whenever possible.

But I would be lying if I said I was not at least somewhat surprised to read that Knuth's MMIX fascicle assigns division a weight six times that of multiplication. So I decided to run a test. The test program contains a nested loop around a single statement, as shown below:

#include <cstdlib>
#include <ctime>
#include <iostream>
#include <tchar.h>

int _tmain(int argc, _TCHAR* argv[])
{
    time_t t0, t1;
    long s = 0;

    time(&t0);
    for (long i=0; i<2000; i+=2)
        for (long j=0; j<2000; j+=2)
            for (long k=0; k<2000; k+=2)
                // to compare the cost of * and /, change the following
                // statement to a form like: [ s = k*j^_^_*2^_ ]
                // substituting with either * or / in the place of ^
                // notice that k and j are even numbers so no
                // rounding will take place if _ is 2.
                s = k*j*2*2*2*2;

    time(&t1);

    double elapsed = difftime(t1, t0);
    std::cout << elapsed << " " << s;
    _sleep(10000);  // keep the console window open long enough to read the result
    return 0;
}

In the experiment, the multiplications in the statement s = k*j... are selectively replaced by divisions and the execution time of the loop is recorded. Multiple trials are conducted to average out the fluctuations. The results are tabulated below:

# of | Execution Time
Divs | of Trials (sec)          | Ave (sec)
-----------------------------------------------
  0  |  9,  9,  9,  9,  9,  9   | 8.7(*)
  1  | 13, 12, 12, 13, 13, 13   | 12.6
  2  | 17, 16, 17, 17, 17, 17   | 16.8
  3  | 20, 21, 21, 21           | 20.8

I've thrown a little speculation into the average value in the first row (marked with *), based on the overall trend. (Readers are entitled to their own opinion as to the validity of the speculation and may draw their own conclusions from the raw data.) It can be seen that each substitution of a multiplication by a division increases the execution time by about four seconds. However, this tells us only the difference in execution time. To find the ratio, the execution time of a multiplication (or a division) itself must be obtained.

To obtain the absolute execution time of multiplication, the statement s = k*j ... was replaced by:

    s = k*j*2*2*2*2*2*2*2*2;

and the execution times of the loop over five trials are 15, 16, 16, 16, 15 seconds, respectively, which average to 15.6 seconds. Comparing this value to the average execution time in the first row of the table presented earlier, we conclude that the execution time for the four extra multiplications is about 7 seconds, or 1.75 seconds per multiplication. Given that the difference in execution time between multiplication and division is 4 seconds, we conclude that the cost of division is 5.75 seconds, so the ratio between division and multiplication is 5.75/1.75 = 3.3.
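The arithmetic in the paragraph above boils down to one formula: the cost of a division is the cost of a multiplication plus the extra time observed per substitution. A minimal sketch follows; DivToMulRatio is a name invented here for illustration.

```cpp
#include <cassert>

// Ratio of division cost to multiplication cost.
//   multCost:  time attributed to one multiplication
//   extraCost: additional time observed when one multiplication
//              is replaced by one division
double DivToMulRatio(double multCost, double extraCost)
{
    assert(multCost > 0.0);
    return (multCost + extraCost) / multCost;
}
```

With the figures above, DivToMulRatio(7.0 / 4, 4.0) evaluates (1.75 + 4) / 1.75, which is roughly 3.3.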

We can see that on the 32-bit Intel Pentium processor, division by 2 takes more than 3 times as many clock cycles as multiplication by 2. But the disparity is still much smaller than that of the MMIX processor. We know that each division operation in MMIX computes the quotient as well as the remainder. I'm not aware whether that's the case on the Intel Pentium.

Notice that the number 2 may not be the best choice of multiplier or divisor. For one thing, multiplying or dividing an integer by two may be performed faster using a shift operator. Also, we've been careful to construct the formula so that no rounding occurs in any division.

It would be interesting to see whether things are different for other divisors. To make it easier to test different scenarios, let's first modify the program a little:

#define N   2
#define INC 2
#define OP1 *
#define OP2 *
#define OP3 *
#define LONG 0
// Two-level expansion so that INC is replaced by its value before 000 is pasted on.
#define UPPER(LEAD) UPPER_HELPER(LEAD)
#define UPPER_HELPER(LEAD) LEAD##000

void Trial()
{
    static double total = 0.0;
    static int trials = 0;
    time_t t0, t1;
    long s = 0;
    time(&t0);

    for (long i=0; i< UPPER(INC); i+=INC)
        for (long j=0; j< UPPER(INC); j+=INC)
            for (long k=0; k< UPPER(INC); k+=INC)
#if (LONG)
                s = k*j OP1 N OP2 N * N OP3 N;
#else
                s = k*j;
#endif // (LONG)

    time(&t1);
    double time = difftime(t1, t0);
    total += time;
    std::cout << time << " " << s << " "<< total/(++trials) << std::endl;
}

int _tmain(int argc, _TCHAR* argv[])
{
    while (true) {
        Trial();
    }
    return 0;
}

In this way we can test different scenarios by changing the macro definitions.

My first test sets N=3 and INC=3. There's still no rounding, but the possibility of a shift operation is out of the picture. The results are as follows (LONG=1):

# of | Execution Time
Divs | of Trials (sec)          | Ave (sec)
-----------------------------------------------
  0  | 19, 18, 18, 18, 18, 18   | 18.2
  1  | 43, 43, 43, 43, 43, 43   | 43.0
  2  | 64, 64, 64, 64           | 64.0
  3  | 87, 88, 87               | 87.3

And the execution time of the loop when LONG is 0 is 2.85 seconds, averaged over 20 trials.

We notice right away that the execution time is much longer for 3 than for 2. One possibility is that, as mentioned earlier, division or multiplication by 2 might have been replaced by shift operations underneath, although there seems to be no reason for the shift right (divide) to be much slower than the shift left (multiply). We'll try to address this later, but for now, let's find the cost differential between multiplication and division by the constant 3 and compare that to Professor Knuth's assumption.

Comparing the first row of the above table, where 4 multiplications are performed per iteration, to the execution time when LONG is 0, where no such multiplication is performed inside the loop, we get the time per multiplication to be (18.2-2.85)/4 = 3.84 sec. To find the time difference between multiplication and division, we use Microsoft Excel to compute the linear regression, with the x-axis being the number of divisions and the y-axis being the execution time, and get y = 22.74x + 18.94. From these numbers we calculate that the time required by division relative to that of multiplication by 3 is (22.74+3.84)/3.84 = 6.9, which is a lot closer to the 6x value that Professor Knuth gives to his MMIX processor.
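For readers without Excel at hand, the same kind of fit can be sketched in a few lines. This is an ordinary least-squares line fit; the names Line and FitLine are made up for illustration, and fitting the four row averages is an assumption about the input, since the exact data points given to Excel are not stated.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Line { double slope; double intercept; };

// Ordinary least-squares fit of y = slope * x + intercept.
// The x values must not all be equal, otherwise the denominator is zero.
Line FitLine(const std::vector<double>& x, const std::vector<double>& y)
{
    assert(x.size() == y.size() && x.size() >= 2);
    const double n = static_cast<double>(x.size());
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sx  += x[i];
        sy  += y[i];
        sxx += x[i] * x[i];
        sxy += x[i] * y[i];
    }
    const double slope = (n * sxy - sx * sy) / (n * sxx - sx * sx);
    const double intercept = (sy - slope * sx) / n;
    return { slope, intercept };
}
```

Calling FitLine({0, 1, 2, 3}, {18.2, 43.0, 64.0, 87.3}) on the row averages gives a slope of about 22.8 and an intercept of about 18.9. That is close to, though not identical with, the 22.74 and 18.94 quoted above; the small gap presumably reflects which data points went into the Excel fit.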

I also looked at a few other things that might affect the computation cycles, including the occurrence of rounding and/or overflow, as well as a few other operands such as 4 and 5. The tests show that rounding and overflow have no impact on the computation time. On the other hand, while the cost of using 5 as the second operand is identical to the cost of using 3, the cost of using 4 is comparable to that of using 2.

Now let's return to the issue of the number 2 as the operand. We observed that both operations are much faster when the operand is 2 and that, between them, multiplying by two is faster than dividing by two. We already mentioned that multiplying or dividing by two can be performed using shift operations. This can happen in the compiler or in the processor, although we turned off compiler optimization in our tests. So I tested the running time of the loop after substituting shift operators for the arithmetic operators. The results are, in summary:

  • <<1 is 12% faster than multiplying by 2.
  • <<1 and >>1 require the same amount of time.
  • the amount of shift does not affect the computation time.

From the above, we speculate that multiplication by a power of 2 may be done by a shift-left operation. The small performance hit might be the result of finding the shift amount. On the other hand, division by 2 might not be done by a shift operation on the Intel Pentium processor.
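One plausible reason why a signed division by 2 cannot be a single shift (an assumption, not something verified in these tests) is rounding: for a negative odd value, x / 2 truncates toward zero while x >> 1 rounds toward negative infinity, so -3 / 2 is -1 but -3 >> 1 is -2 on mainstream compilers. The sketch below shows the kind of fix-up that makes a shift agree with '/'; HalfViaShift is a hypothetical name.

```cpp
#include <climits>

// x / 2 computed with shifts and one addition.
// A bare x >> 1 rounds toward negative infinity, so the sign bit (1 for
// negative x, 0 otherwise) is added first to round toward zero like '/'.
// Note: right-shifting a negative int is implementation-defined before
// C++20; this sketch assumes the usual arithmetic shift.
int HalfViaShift(int x)
{
    const int signBit = static_cast<int>(
        static_cast<unsigned>(x) >> (sizeof(int) * CHAR_BIT - 1));
    return (x + signBit) >> 1;
}
```

The loops in this post only use non-negative values, where x / 2 and x >> 1 agree, but code generated for a signed long still has to cover the negative case, which may account for part of the extra cost of dividing by 2.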

The above tests were performed on a computer with a 2.40GHz Pentium(R) 4 CPU, with Hyper-Threading turned on. The results should by no means be generalized to other processors. In particular, notice that MMIX is a RISC processor while the x86 instruction set is traditionally CISC by nature. The test program was compiled using Microsoft Visual C++ 2005 Professional Edition. Although compiler optimization was turned off, without inspecting the binary I have no documented evidence that the compiler made no under-the-hood modification of the operators.

A great source for a comprehensive picture of operation costs on Intel and AMD processors, and of ways of performance tuning, is Agner Fog's website.

10:41:42 on 12/10/08 by x mar - Programming - 2 comments

Animating the Popping up of a Popup Window

In a medical device software I'm currently developing, I needed a popup window that animates like the windows popping up from the task bar. After some experimenting, I arrived at the following, which works satisfactorily.

    Private Sub Form1_MouseClick(ByVal sender As System.Object, ByVal e As System.Windows.Forms.MouseEventArgs) Handles MyBase.MouseClick
        Dim screen = PointToScreen(e.Location)
        Dim dlg As Dialog1 = New Dialog1
        dlg.Opacity = 0.4
        Dim size As Size = dlg.Size
        Dim top As Integer
        Dim left As Integer
        dlg.Enabled = False
        For s As Single = 0.1 To 1.0 Step 0.1
            top = screen.y - size.Height * s
            left = screen.x + 10 * s
            If top < 0 Then
                top = 0
            End If
            'If left > Parent().Right - 10 Then
            '    left = Parent().Right - 10
            'End If
            dlg.Bounds = New Rectangle(left, top, s * size.Width, 20)
            If dlg.Enabled = False Then
                dlg.Enabled = True
                dlg.Show(Me)
            Else
                dlg.Refresh()
            End If
            System.Threading.Thread.Sleep(16)
        Next s
        top = screen.y - size.Height
        left = screen.x + 10
        If top < 0 Then
            top = 0
        End If
        'If left > Parent().Right - 10 Then
        '    left = Parent().Right - 10
        'End If
        dlg.Bounds = New Rectangle(left, top, size.Width, size.Height)
        dlg.Opacity = 1.0
        dlg.Refresh()
    End Sub

12:36:20 on 11/17/08 by x mar - Programming - comments

Creating AVI From Combining a Set of Bitmaps

class CaptureCine
{
public:
    CaptureCine() : ___ {}

    void StartCapture(LPTSTR lpFilename) {
        HRESULT hr = AVIFileOpen(&pAVIFile, lpFilename, OF_WRITE|OF_CREATE, NULL);

        BITMAPINFOHEADER bi;
        bi.biSize = sizeof(BITMAPINFOHEADER);
        AVISTREAMINFO si;
        memset((void*)&si, 0, sizeof(AVISTREAMINFO));
        si.fccType = streamtypeVIDEO;
        si.dwRate = 30000;
        si.dwScale = 1001;
        si.dwQuality = -1;
        SetRect(&si.rcFrame, 0, 0, (int)640, (int)480);
        hr = AVIFileCreateStream(pAVIFile, &pAVIStream, &si);
        hr = AVIStreamSetFormat(pAVIStream, 0, (LPVOID)&MyBitmapInfo.bmiHeader, sizeof(BITMAPINFOHEADER));
        lStreamSize = 0L;
    }

    void CloseAVI() {
        if (pAVIStream)
            AVIStreamRelease(pAVIStream);
        if (pAVIFile)
            AVIFileRelease(pAVIFile);
    }

    void AddFrame()
    {

    }

private:
    // variables for video capture
    PAVIFILE pAVIFile;
    PAVISTREAM pAVIStream;
    LONG lStreamSize;
};

09:16:44 on 06/29/08 by x mar - Programming - comments

Write Your Unit Test, the Best Reason of All

After an update of the source code from the repository, I was disgusted to see that my program developed all kinds of problems: crashing, producing incorrect results, and so on.

Evidently, some people changed the semantics, but not the syntax, of some classes' interfaces. All my programs still compiled in the automated build, but since I hadn't written unit tests for some of the significant functional components of the application, the logic that was broken by the semantic change of the other classes was not revealed in the automated build.

Had I written unit tests for the functional components of my application, I would not be the person fixing my code to fit the other people's changes.

09:36:50 on 06/03/08 by x mar - Programming - comments

Return a Sensible Object

I was writing a function to perform linear interpolation on a list of spectra:

List LinearInterpolation(List source)

I was tempted to return null if the source is empty. Then I realized that returning an empty list is better. Not only does the user not have to check for a null return, it is also more elegantly in agreement with the input.
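A minimal C++ sketch of the idea, assuming a std::vector<double> in place of the List type and a made-up interpolation rule (inserting the midpoint between each pair of neighbours) standing in for the real one:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the real function: inserts the midpoint
// between each pair of neighbouring values.
std::vector<double> LinearInterpolation(const std::vector<double>& source)
{
    std::vector<double> result;
    if (source.empty())
        return result;  // empty in, empty out: nothing for the caller to null-check

    result.push_back(source[0]);
    for (std::size_t i = 1; i < source.size(); ++i) {
        result.push_back((source[i - 1] + source[i]) / 2.0);
        result.push_back(source[i]);
    }
    return result;
}
```

A caller can loop over the result unconditionally. For an empty input the loop body simply never runs, which is the behavior one would expect from "interpolating nothing".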

11:58:50 on 12/13/07 by x mar - Programming - comments

Which is More Frustrating, Unable to Build a Project, or a Project Keeps Building

Well, I added a parameter to a method of a class in a C++ project. I hit the build key, expecting several errors to pop up. I was mystified that the project built without a single error.

Short of rebooting the computer, I did everything people would do to detect a nonexistent error. But the project kept building without error. Well, I did suspect for a second or two that maybe a parameter with a default value existed, so that the build was not broken by the addition of the new parameter. But I quickly brushed the idea aside, considering that the new parameter was of type bool, added before a parameter of type std::ostream&.

Not until much frustration later did I realize that there was no bug in the compiler. It was indeed a parameter with a default value that allowed the method with the additional bool parameter to remain compatible with all the existing calls, because std::ostream& can be converted to bool implicitly.
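The trap can be reproduced in a few lines. The sketch below uses a stand-in class, LegacyStream, with an implicit conversion to bool, because since C++11 the conversion on the standard streams is explicit and the same call would no longer compile with std::ostream. All names here (LegacyStream, g_defaultStream, Dump) are invented for illustration.

```cpp
#include <string>

// Stand-in for a pre-C++11 stream type: it converts implicitly to bool,
// much as the old std::ostream could through its void* conversion.
struct LegacyStream
{
    operator bool() const { return true; }
};

LegacyStream g_defaultStream;  // plays the role of std::cout

// Before the change the signature was:  Dump(LegacyStream& out = g_defaultStream)
// After adding the bool parameter in front, the old calls still compile.
std::string Dump(bool verbose, LegacyStream& out = g_defaultStream)
{
    const bool usedDefault = (&out == &g_defaultStream);
    return std::string(verbose ? "verbose" : "quiet") +
           (usedDefault ? ", default stream" : ", caller's stream");
}
```

An existing call such as Dump(log), with log being a LegacyStream, silently binds log to the bool parameter and falls back to the default stream, returning "verbose, default stream". The build stays green while the output goes to the wrong place.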

11:07:49 on 12/02/07 by x mar - Programming - comments

Delete Files in a CVS Repository

I have a CVS repository hosted on a Windows 2000 Professional system. I installed CVSNT on the system. To use SSH for cvs, I also installed cygwin. Things appeared to be working smoothly for quite some time. Then yesterday, I imported some source files into the repository but changed my mind. I set out to delete the files on the server using Windows Explorer, but I kept getting the message that the files were in use. I stopped the CVSNT service, but the error persisted. It took a lot of digging before I realized that a copy of cvs.exe inside cygwin is being used when a cvs request comes in via SSH. Using netstat, I confirmed that there was no server listening at the cvs port. cvs via SSH works anyway.

I came to the conclusion that, with ssh, the cvs command is tunnelled to and executed on the server, and the results are then transported back to the client via the ssh connection. To delete any files created in the process, it has to be done within cygwin.

19:11:55 on 11/18/07 by x mar - Programming - comments

"Host" File

Entries in the "Windows/System32/Drivers/etc/hosts" file affect things like MySQL database access. E.g., an entry
192.168.0.101 www.myweb.com
will cause any user 'joe'@'192.168.0.101' to be denied access to the database, as the client is considered to be accessing from 'www.myweb.com'.

19:59:03 on 10/24/07 by x mar - Programming - comments