C# Addin asynchronous access to Visio Document

Started by Memnok, February 28, 2023, 05:31:54 PM

Memnok

Hello, I've run into a bit of an issue that I don't quite understand.

So, I have a pretty big document with a lot of shapes. A frequent action is to iterate over these shapes looking for shape data, which takes a long time when accessing the "Prop." cells. So I tried an asynchronous approach with Tasks. But the issue I'm having is that only the last page in the document returns the proper data when accessing the cell values. The rest just return nothing. Is it not possible to access pages in parallel through the COM object? If anyone has tried anything similar or has any tips I'd appreciate it.
I have attached the code I wrote for iterating shapes.


private void FindShapeData(string identifier)
{
        // Create an array of tasks that will look through each page in the document
        Task<List<int>>[] tasks = new Task<List<int>>[ActiveDocument.Pages.Count];
        for (int i = 0; i < tasks.Count(); i++)
        {
            // Start each task and tell it to find shapes with shapedata that match the identifier
            tasks[i] = Task.Run(() => GetShapes(ActiveDocument.Pages[i].Shapes, identifier));
        }

        // Wait for all tasks to complete
        Task.WaitAll(tasks);
       
        // Check result from each task, each task should return a list of int that are the shape IDs that matched the identifier
        for (int i = 0; i < tasks.Count(); i++)
        {
            List<int> ints = tasks[i].Result;
            foreach (int shapeID in ints)
            {
               Shape foundShape = ActiveDocument.Pages[i].Shapes.ItemFromID[shapeID];
            }
        }
}

// Check every shape in a page for shape data
private static List<int> GetShapes(Shapes shapes, string identifier)
{
    List<int> shapeIDs = new List<int>();

    foreach (Shape shape in shapes)
    {
        // Check each shape to see if it has the Prop.REF cell and compare the value in the cell to the identifier
        string value = "";
        if (ShapeWrapper.GetProperty(shape, ShapeProperties.REF, ref value) && value == identifier)
        {
            shapeIDs.Add(shape.ID);
        }
    }
    return shapeIDs;
}



Paul Herber

Your first loop is accessing pages but i = 0 on the first iteration. Page indexes start from 1.
Try pages[i+1]
and in the checking as well ...
Electronic and Electrical engineering, business and software stencils for Visio -

https://www.paulherber.co.uk/

Nikolay

Visio (like all other Office applications) is a single-threaded application (to be more precise, its COM objects live in a single-threaded apartment, STA), so your parallel calls will be automatically serialized when accessing Visio.
This means all API calls will be executed by Visio sequentially, not in parallel, regardless of what kind of parallelism you are using in your C# code. Visio puts all API calls in a queue and executes them one by one.
That, in turn, means there is no point in doing it in parallel at all; it will only be slower.

I would recommend abandoning this approach altogether.
What you could do instead is use the proper means of getting shape data.
For example, you can get all your data properties from all shapes in a single API call using the GetFormulas() method.

That aside, regarding why your code is working in an odd way, this is probably what happens:
The first for loop completes before anything inside Task.Run() even starts executing.
Pages in Visio are counted from 1, not from 0. After the loop finishes, your index (i) equals Count, and by the time the code inside Run() starts executing, (i) is always equal to Count (the loop has already finished).
This "double error" probably results in what you see: you get the shapes from the last page.

Anyway, even if you fix this, the result will be slower than the straightforward approach.
And the straightforward approach will be slower than equivalent VBA code.
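
To make the GetFormulas() suggestion a bit more concrete, here is a rough sketch of the input it expects, as far as I know: a flat array of 16-bit values with four entries per cell (shape ID, section index, row index, cell index). The helper name SidSrcStreamBuilder is made up for illustration, and the sketch assumes the property you want sits in the same Shape Data row on every shape, which may not hold in a real document. The constants 243 (visSectionProp) and 0 (visCustPropsValue) are the values I believe the Visio type library defines; check them against VisSectionIndices and VisCellIndices in your interop assembly.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: builds the flat (shapeID, section, row, cell) array
// that Page.GetFormulas / Page.GetResults take as their first argument.
public static class SidSrcStreamBuilder
{
    // Assumed values of VisSectionIndices.visSectionProp and
    // VisCellIndices.visCustPropsValue; verify against the interop assembly.
    public const short VisSectionProp = 243;
    public const short VisCustPropsValue = 0;

    // propRow is the Shape Data row index of the property to read.
    // ASSUMPTION: the row index is the same for every shape in shapeIds.
    public static short[] Build(IReadOnlyList<int> shapeIds, short propRow)
    {
        var stream = new short[shapeIds.Count * 4];
        for (int n = 0; n < shapeIds.Count; n++)
        {
            // The stream holds 16-bit values, so larger IDs throw here.
            stream[n * 4 + 0] = checked((short)shapeIds[n]);
            stream[n * 4 + 1] = VisSectionProp;
            stream[n * 4 + 2] = propRow;
            stream[n * 4 + 3] = VisCustPropsValue;
        }
        return stream;
    }

    public static void Main()
    {
        short[] stream = Build(new[] { 5, 9 }, 0);
        Console.WriteLine(string.Join(",", stream));

        // With the Visio interop assembly referenced, the call would look
        // roughly like this (not executed in this sketch):
        //   Array sidSrc = stream;
        //   page.GetFormulas(ref sidSrc, out Array formulas);
    }
}
```

The point is that one call per page replaces one COM round trip per cell, which is where most of the time goes in an add-in.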

Memnok

Quote from: Paul Herber on February 28, 2023, 06:55:06 PM

Hi Paul, I tried this but it gave me an index out of range exception. It seems like in C# the page index starts from 0, because when I use a standard for loop to iterate the pages there's no problem starting from 0. I have attached a picture from the debugger that you should be able to view.

Memnok

Quote from: Nikolay on February 28, 2023, 06:55:32 PM

Hi Nikolay, thank you very much for the explanation, I had a feeling that this could be the issue.
Though I'm not sure if you've interpreted my code correctly or if I'm still not understanding the odd behavior. The for loop is running before the task, and only the list of shapes on each page is being sent to the task. So the tasks themselves do not care about the (i) iterator; each task just has a Shapes object to iterate. And if they were all running on (i = 0), then I should only be getting values from the first page and not the last one. As you can see in my reply to Paul, in C# the page index seems to start from 0. But I assume this is due to some undefined behavior when trying to parallelize the COM calls. At least that's how it works in my brain, and I have been fooled by my brain before.
I have, in any case, scrapped this idea and I will be looking into the GetFormulas() method you suggested to speed up the process.

Nikolay

Quote from: Memnok on March 01, 2023, 06:59:37 AM

Hmm, in your first post you said it was the last page?
Anyway, to understand what I mean, try this and probably be surprised  :D

https://dotnetfiddle.net/Qlg1Jy

            var tasks = new Task[10];
            for (int i = 0; i < 10; i++)
            {
                tasks[i] = Task.Run(() => Console.WriteLine(i));
            }
            Task.WaitAll(tasks);


A better version of the above code:

            for (int i = 0; i < 10; i++)
            {
                var index = i;
                tasks[i] = Task.Run(() => Console.WriteLine(index));
            }
            Task.WaitAll(tasks);

I don't think the problem has anything to do with COM.

Paul Herber

Quote from: Memnok on March 01, 2023, 06:39:40 AM

Your loop counter is fine, needs to start from 0, but you need to access the pages using:
pages[i+1]

Memnok

Quote from: Nikolay on March 01, 2023, 07:31:58 AM

Ah, I see now. I understood it as the code sending a copy of the (i) variable when Task.Run was being called. But I guess it's a reference and that's why it won't line up properly.
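
For anyone reading along, the capture behavior can be shown without Tasks, which removes the timing element. This is only a small sketch with made-up names (ClosureCaptureDemo, CaptureLoopVariable, CaptureLocalCopy): the lambdas are stored during the loop and invoked afterwards. In a C# for loop there is a single variable i shared by all iterations, so every lambda that captures it sees its final value, while a local declared inside the loop body is a fresh variable per iteration.

```csharp
using System;
using System.Collections.Generic;

public static class ClosureCaptureDemo
{
    // Every lambda captures the same loop variable i, so after the loop
    // has finished they all return its final value (count).
    public static List<int> CaptureLoopVariable(int count)
    {
        var funcs = new List<Func<int>>();
        for (int i = 0; i < count; i++)
        {
            funcs.Add(() => i);
        }
        return InvokeAll(funcs);
    }

    // A local declared inside the loop body is a new variable on every
    // iteration, so each lambda keeps the value from its own iteration.
    public static List<int> CaptureLocalCopy(int count)
    {
        var funcs = new List<Func<int>>();
        for (int i = 0; i < count; i++)
        {
            int index = i;
            funcs.Add(() => index);
        }
        return InvokeAll(funcs);
    }

    private static List<int> InvokeAll(List<Func<int>> funcs)
    {
        var results = new List<int>();
        foreach (var func in funcs)
        {
            results.Add(func());
        }
        return results;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(",", CaptureLoopVariable(3))); // prints 3,3,3
        Console.WriteLine(string.Join(",", CaptureLocalCopy(3)));    // prints 0,1,2
    }
}
```

With Task.Run the outcome additionally depends on when each task gets scheduled, so the printed values can vary between runs, but the underlying cause is the same shared variable.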

Memnok

Quote from: Paul Herber on March 01, 2023, 08:21:11 AM

Okay, I guess I've always used foreach to iterate pages before. And the weird interaction with the task iterator meant I didn't get an exception, because (i) had already moved past 0 before any task used it. I got the exception when I tried a fully synchronous iteration. Thank you!
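
Putting both fixes from this thread together, a minimal sketch of the loop could look like the following. It does not touch Visio at all: OneBasedCollection is a hypothetical stand-in for a 1-based COM collection such as Pages, so the code can run on its own. The tasks are kept only to mirror the original code; as explained above, parallel calls into Visio get serialized anyway, so a plain loop would be the better choice in a real add-in.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for a 1-based COM collection such as Visio's Pages.
public sealed class OneBasedCollection<T>
{
    private readonly List<T> _items;

    public OneBasedCollection(IEnumerable<T> items)
    {
        _items = items.ToList();
    }

    public int Count => _items.Count;

    // Valid indexes run from 1 to Count, as in Visio collections.
    public T this[int index]
    {
        get
        {
            if (index < 1 || index > _items.Count)
                throw new ArgumentOutOfRangeException(nameof(index));
            return _items[index - 1];
        }
    }
}

public static class PageLoopDemo
{
    public static string[] ReadAll(OneBasedCollection<string> pages)
    {
        var tasks = new Task<string>[pages.Count];
        for (int i = 0; i < tasks.Length; i++)
        {
            // Fix 1: copy the loop variable so each lambda gets its own value.
            // Fix 2: the array is 0-based, the collection is 1-based.
            int pageIndex = i + 1;
            tasks[i] = Task.Run(() => pages[pageIndex]);
        }
        Task.WaitAll(tasks);
        return tasks.Select(t => t.Result).ToArray();
    }

    public static void Main()
    {
        var pages = new OneBasedCollection<string>(new[] { "Page-1", "Page-2", "Page-3" });
        Console.WriteLine(string.Join(",", ReadAll(pages))); // prints Page-1,Page-2,Page-3
    }
}
```

In the original code the captured (i) had usually reached Count by the time the lambdas ran, and Pages[Count] is a valid 1-based index for the last page, which matches the "only the last page works" symptom.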