Getting the original Picture back out of a Visio shape

Started by Visisthebest, August 23, 2023, 05:31:33 PM

Yacine

Some silly thoughts.

1) In the XML code from Nikolay, all the images are exported. This is probably not wanted, as you won't know which one is which. Identifying the right shape will require a more precise query of the XML.
And what if the image is embedded in a group? What is the relevant parent ID? Probably the one whose parent is the page ... and this only works if there is only one image in the group.

2) If you work on the open drawing, wouldn't the file be locked by the OS? You might need to create a copy of the file first.

3) A stencil could catch the images. No conversion necessary.
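
For the group question in point 1, here is a minimal sketch of one possible approach, assuming the page XML nests grouped shapes as `<Shape>` elements inside the group's `<Shape>` element. The class and method names are made up for illustration; the namespace URI is the Visio one used in Nikolay's code.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class ShapeAncestry
{
    static readonly XNamespace Visio = "http://schemas.microsoft.com/office/visio/2012/main";

    // Returns the ID of the outermost <Shape> containing the given shape element,
    // or the shape's own ID if it is not nested inside another shape.
    public static string TopLevelShapeId(XElement shape)
    {
        // AncestorsAndSelf yields the element itself first, then its parents outward,
        // so the last item is the outermost enclosing shape.
        var top = shape.AncestorsAndSelf(Visio + "Shape").Last();
        return (string)top.Attribute("ID");
    }
}
```

This only answers "which top-level shape contains this image"; a group holding several images would still need the inner shape ID to tell them apart.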

By the way, the subject is a very general one. Last year I needed to extract pictures from a database to insert in word documents. I chose the clipboard. But it is risky, as Visisthebest mentioned already.
Yacine

Visisthebest

Wapperdude what I typically do is a Page.Import of a PNG, which becomes a shape. Then quite a bit of formatting on this new shape, adding a border, connection points etc, is applied to the shape. The user can resize the shape as well and apply their own additional formatting.

Apart from the (potential) loss you get from re-exporting a shape as a picture several times, I also have to remove the formatting to be able to do this which isn't ideal (I either have to make a copy of the shape and remove all formatting, or remove then reapply for export).

Visio 2021 Professional

Visisthebest

Nikolay this works perfectly on an open Visio file, please find the VB.NET version of the code below.

I notice Visio has renamed the pictures to image1, image2, image3 etc.

I don't know if it is possible to force an internal filename, maybe Visio stores the renamed filename somewhere in the shape but I cannot easily see it in the ShapeSheet under these sections:
Foreign Image Info
Image Properties

If I can relate a shape to the specific image filename used internally, then the whole puzzle is solved!  :D :D :D


Imports System
Imports System.IO
Imports System.IO.Packaging

Module VSDXLibrary

    Friend Function ReadAllBytesFromStream(ByVal stream As Stream) As Byte()
        Using ms As New MemoryStream()
            stream.CopyTo(ms)
            Return ms.ToArray()
        End Using
    End Function

    Friend Sub ExtractMediaFromVisio(ByVal vsdxFilePath As String, ByVal destinationFolder As String)

        ' Open the file with FileShare.ReadWrite so this also works while Visio has it open
        Using stream As FileStream = File.Open(vsdxFilePath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite)
            Using visioPackage As Package = Package.Open(stream, FileMode.Open, FileAccess.Read)

                For Each part As PackagePart In visioPackage.GetParts()
                    If part.ContentType.StartsWith("image/", StringComparison.OrdinalIgnoreCase) Then
                        ' Visio stores images under internal names such as image1.png
                        Dim fileName = Path.GetFileName(part.Uri.ToString())

                        ' Dispose the part stream as soon as the bytes are read
                        Dim fileBytes As Byte()
                        Using partStream As Stream = part.GetStream()
                            fileBytes = ReadAllBytesFromStream(partStream)
                        End Using

                        ' Save the image to the destination directory
                        File.WriteAllBytes(Path.Combine(destinationFolder, fileName), fileBytes)
                    End If
                Next
            End Using
        End Using
    End Sub

End Module

Visisthebest

Nikolay are you using ChatGPT directly (the free or paid version)?

I am using Github Copilot X Chat (based on GPT-4 says Github) but I got much less useful answers than you did.

Tried different prompts but could be I used the wrong prompts for this.

Nikolay

#19
The files are actually renamed internally. You can see that in the file if you rename vsdx to zip and open it manually ("media" folder).
But binding to shapes is also possible, since Visio itself does it somehow :)
At least finding the shape ID and page ID for each image.
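
The "media" folder mentioned above can also be listed without renaming anything. This is a rough sketch, assuming the images sit under `visio/media/` inside the package (which is where the `../media/...` relationship targets in a page point). `VsdxMediaLister` and `ListMediaEntries` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;

public static class VsdxMediaLister
{
    // Returns the full names of all ZIP entries stored under visio/media/.
    public static List<string> ListMediaEntries(Stream vsdxStream)
    {
        // leaveOpen: true so the caller stays in charge of the stream
        using (var archive = new ZipArchive(vsdxStream, ZipArchiveMode.Read, leaveOpen: true))
        {
            return archive.Entries
                .Where(e => e.FullName.StartsWith("visio/media/", StringComparison.OrdinalIgnoreCase))
                .Select(e => e.FullName)
                .ToList();
        }
    }

    // Convenience overload for a file on disk. FileShare.ReadWrite mirrors the
    // approach used elsewhere in this thread for files Visio still has open.
    public static List<string> ListMediaEntries(string vsdxFilePath)
    {
        using (var stream = File.Open(vsdxFilePath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        {
            return ListMediaEntries(stream);
        }
    }
}
```

The result only gives the internal names (image1.png and so on); it says nothing yet about which shape uses which file.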

I am on the paid subscription ($20 per month), GPT4 (https://chat.openai.com/?model=gpt-4)

It turned out that the most useful feature of a large language model is text writing, surprise-surprise :D
Now I'm updating my site using its help - it can do summaries, descriptions, rewrites, cleanups, translations (much better than google or myself) etc.

So I am using it primarily not for programming, considering it may easily make mistakes in this area (especially in areas that are not widely covered, like Visio).
But I find it really useful. For example, it has built almost 90% of this web app: https://webtools.unmanagedvisio.com/pdftip

It is like you get an assistant for $20 instead of some $2K salary you would pay to somebody on fiverr, for example, for the same.
And it does not complain about low income or go on strike (yet?) :)

Visisthebest

Thank you Nikolay, I will cancel GitHub Copilot and switch to paid GPT-4 to try it for a month. I must say the output from Copilot X Chat has generally not been very useful. It could be because of VB.NET, but asking for C# code for Visio also often produces nonsensical code with non-existent properties and methods on Visio objects. Amusing but not so useful.

Visisthebest

Nikolay the challenge is after the Page.Import to relate the created shape to the imageX.png (where X is the number Visio assigns to it).

Any ideas how I can find out what Visio renamed the file to? Then I can store the filename in a User cell on the shape so I can relate them.

Nikolay

#22
For writing code, it is not that good yet, IMHO.
I.e. it may help with some areas where you have just "surface" knowledge to get started, but with questions like these (how to find shape/page for a file embedded in a vsdx file),
I am pretty sure it will not give a decent answer, so we would need to figure it out ourselves.
Maybe in a few years, when it matures and grows up.

I think the problem is 100% solved if we can identify the shape to which the image is bound, i.e. its page ID and shape ID.
Visio does not save the original file name anywhere in the vsdx.

Visisthebest

I got this output from GPT-4:

To read the binary data of an image from the `ForeignData` section of a shape's ShapeSheet in Visio, you can use the `Shape.CellsU` property to access the `ForeignData` cell. The `ForeignData` cell holds the binary data of the image as a string, which you can then convert to a byte array.

Here's a step-by-step guide:

1. **Access the shape**: This can be the shape returned by the `Page.Import` method or any other image shape in your diagram.

2. **Check if the `ForeignData` cell exists**: Before attempting to access the `ForeignData` cell, ensure that it exists in the shape.

3. **Retrieve the data from the `ForeignData` cell**: Access the binary data as a string and then convert this string to a byte array.

Here's a sample code:

```vbnet
Dim imageShape As Visio.Shape
' Assuming imageShape is the shape you're working with

' Check if the ForeignData cell exists
If imageShape.CellsU.HasRow(Visio.VisSectionIndices.visSectionObject, Visio.VisRowIndices.visRowForeign, Visio.VisCellIndices.visFrgnData) Then
    Dim cell As Visio.Cell = imageShape.CellsU("ForeignData")
    Dim dataString As String = cell.FormulaU

    ' Convert the string data to byte array
    Dim imageData As Byte() = System.Convert.FromBase64String(dataString)
   
    ' Now, you can use the imageData byte array as required, e.g., write it to a file or process it
Else
    ' The shape doesn't have a ForeignData cell or it's not an image shape
End If
```


Note: The data in the `ForeignData` cell is typically stored as a Base64 encoded string. Hence, we use `Convert.FromBase64String` to convert it back to its byte array form. You can then use this byte array to save the image, display it, or process it further as needed.

Visisthebest

Does anyone know about the ForeignData section of the shapesheet? :D

I did find this about the Visio file format:
https://learn.microsoft.com/en-us/office/client-developer/visio/foreigndata-element-shapesheet_type-complextypevisio-xml

Nikolay I did some tests it is better for sure compared to Copilot X Chat, but yes not all outputs are correct/useful.

Nikolay

#25
It is just lying; there is no "ForeignData" ShapeSheet cell, of course. If it existed, that would be a practical solution.
But it's not entirely wrong. If you open the VSDX file, there is a "ForeignData" element in the XML (see below).
And this can be used to figure out the page ID and shape ID, just not in the way it suggests.

Below, the <Rel r:id /> in the <ForeignData /> refers exactly to the media image.

Visisthebest

Nikolay, the best I could come up with is this: I read the PNG or JPG into a filestream, add something to the metadata in the file, and then import the file into a Visio page.

I thought about hashes as a checksum but Visio may change the file so the checksum then fails as well.
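
For reference, the hash idea could look roughly like this. It is only a sketch, and it only helps if Visio stores the imported bytes unchanged, which is exactly the doubt raised above and is not verified here. `ImageFingerprint` and its methods are made-up names.

```csharp
using System;
using System.Security.Cryptography;

public static class ImageFingerprint
{
    // Returns the SHA-256 digest of the data as a lowercase hex string.
    public static string Sha256Hex(byte[] data)
    {
        using (var sha = SHA256.Create())
        {
            return BitConverter.ToString(sha.ComputeHash(data))
                .Replace("-", "")
                .ToLowerInvariant();
        }
    }

    // True when both byte arrays hash to the same digest, i.e. the original
    // file and the extracted media part are byte-for-byte identical.
    public static bool SameContent(byte[] original, byte[] extracted)
    {
        return Sha256Hex(original) == Sha256Hex(extracted);
    }
}
```

If Visio re-encodes or otherwise rewrites the image on import, `SameContent` returns false even for the "same" picture, so this cannot be the only way to relate a shape to its file.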

Visisthebest

I can get this info from the XML fortunately, each time you add an image another page reference to an image is added like this:


<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image" Target="../media/image1.jpeg"/>


The <ForeignData> element in a shape refers to the reference stored in the page, like this:


<ForeignData ForeignType='Bitmap' CompressionType='JPEG' CompressionLevel='0.05'><Rel r:id='rId1'/></ForeignData>


So it is a bit indirect: via the rId1, rId2 etc. references in the shape you can find the relationship in the page, and that relationship holds the filename in the media folder of the ZIP.
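
The lookup described above can be sketched on the raw XML like this. It assumes you already have the page XML and the page's relationships XML loaded as `XDocument`s; `ForeignDataResolver` and `MapShapesToTargets` are hypothetical names, and the namespace URIs are the standard Visio and Open Packaging ones.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class ForeignDataResolver
{
    static readonly XNamespace Visio = "http://schemas.microsoft.com/office/visio/2012/main";
    static readonly XNamespace Rel = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";
    static readonly XNamespace PkgRel = "http://schemas.openxmlformats.org/package/2006/relationships";

    // Maps each shape ID that has a <ForeignData><Rel r:id="..."/> child
    // to the Target of the matching <Relationship> in the page's relationships.
    public static Dictionary<string, string> MapShapesToTargets(XDocument pageXml, XDocument pageRelsXml)
    {
        // rId -> Target (e.g. "rId1" -> "../media/image1.jpeg")
        var targets = pageRelsXml.Root
            .Elements(PkgRel + "Relationship")
            .ToDictionary(r => (string)r.Attribute("Id"), r => (string)r.Attribute("Target"));

        var result = new Dictionary<string, string>();
        foreach (var shape in pageXml.Descendants(Visio + "Shape"))
        {
            var rel = shape.Element(Visio + "ForeignData")?.Element(Visio + "Rel");
            var rid = (string)rel?.Attribute(Rel + "id");

            string target;
            if (rid != null && targets.TryGetValue(rid, out target))
            {
                result[(string)shape.Attribute("ID")] = target;
            }
        }
        return result;
    }
}
```

The relationship IDs are scoped to the page, so rId1 on one page and rId1 on another can point to different images.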

Problem solved thank you all for your kind help!

Nikolay

#29
Yes, exactly. Working for me (now with pageid and shapeid)


using System;
using System.IO;
using System.IO.Packaging;
using System.Linq;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;

namespace ConsoleApp1
{
    class Program
    {
        public static byte[] ReadAllBytesFromStream(Stream stream)
        {
            using (MemoryStream ms = new MemoryStream())
            {
                stream.CopyTo(ms);
                return ms.ToArray();
            }
        }

        public static XDocument GetXMLFromPart(PackagePart packagePart)
        {
            // Dispose the part stream once the XML has been loaded
            using (var partStream = packagePart.GetStream())
            {
                return XDocument.Load(partStream);
            }
        }

        public static void ExtractMediaFromVisio(string vsdxFilePath, string destinationFolder)
        {
            XNamespace nsRel = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

            var ns = new XmlNamespaceManager(new NameTable());
            ns.AddNamespace("v", "http://schemas.microsoft.com/office/visio/2012/main");
            ns.AddNamespace("r", "http://schemas.openxmlformats.org/officeDocument/2006/relationships");

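            // FileShare.ReadWrite lets us read the file while Visio still has it open
            using (var stream = File.Open(vsdxFilePath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
            using (Package visioPackage = Package.Open(stream, FileMode.Open, FileAccess.Read))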
            {
                var documentRel = visioPackage.GetRelationshipsByType("http://schemas.microsoft.com/visio/2010/relationships/document").First();
                Uri docUri = PackUriHelper.ResolvePartUri(new Uri("/", UriKind.Relative), documentRel.TargetUri);
                var documentPart = visioPackage.GetPart(docUri);

                var pagesRel = documentPart.GetRelationshipsByType("http://schemas.microsoft.com/visio/2010/relationships/pages").First();
                Uri pagesUri = PackUriHelper.ResolvePartUri(documentPart.Uri, pagesRel.TargetUri);
                var pagesPart = visioPackage.GetPart(pagesUri);

                var xmlPages = GetXMLFromPart(pagesPart);
                var pageRels = pagesPart.GetRelationshipsByType("http://schemas.microsoft.com/visio/2010/relationships/page").ToList();
                foreach (var pageRel in pageRels)
                {
                    Uri pageUri = PackUriHelper.ResolvePartUri(pagesPart.Uri, pageRel.TargetUri);
                    var pagePart = visioPackage.GetPart(pageUri);

                    var imageRels = pagePart.GetRelationshipsByType("http://schemas.openxmlformats.org/officeDocument/2006/relationships/image").ToList();

                    var xmlPage = GetXMLFromPart(pagePart);

                    // The page ID is the same for every shape on this page, so look it up once
                    var pageId = xmlPages.XPathSelectElement($"/v:Pages/v:Page[v:Rel/@r:id='{pageRel.Id}']", ns).Attribute("ID").Value;

                    var xmlShapes = xmlPage.XPathSelectElements("/v:PageContents//v:Shape[@Type='Foreign']", ns).ToList();
                    foreach (var xmlShape in xmlShapes)
                    {
                        var shapeId = xmlShape.Attribute("ID").Value;

                        // Skip foreign shapes that have no <Rel> element of their own
                        var xmlRel = xmlShape.XPathSelectElement("./v:ForeignData/v:Rel", ns);
                        if (xmlRel == null) continue;
                        var imageRelId = xmlRel.Attribute(nsRel + "id").Value;

                        // Skip relationships that are not image relationships
                        var imageRel = imageRels.FirstOrDefault(r => r.Id == imageRelId);
                        if (imageRel == null) continue;
                        var imagePart = visioPackage.GetPart(PackUriHelper.ResolvePartUri(pagePart.Uri, imageRel.TargetUri));

                        byte[] fileBytes;
                        using (var imageStream = imagePart.GetStream())
                        {
                            fileBytes = ReadAllBytesFromStream(imageStream);
                        }
                        var imageName = Path.GetFileName(imagePart.Uri.ToString());
                        var fileName = $"pageid_{pageId}_shapeid_{shapeId}_{imageName}";

                        // Save the image to the destination directory
                        File.WriteAllBytes(Path.Combine(destinationFolder, fileName), fileBytes);
                    }
                }
            }
        }

        static void Main(string[] args)
        {

            ExtractMediaFromVisio(
                @"C:\Users\nbelyh\Documents\111.vsdx",
                @"C:\Users\nbelyh\Documents");
        }
    }
}
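
To get from an exported file back to its shape, the file name produced above (`pageid_{pageId}_shapeid_{shapeId}_{imageName}`) can be split again. A small sketch of that, with `ExtractedImageName` as a made-up helper name:

```csharp
using System.Text.RegularExpressions;

public static class ExtractedImageName
{
    // Matches names such as "pageid_0_shapeid_5_image1.png"
    static readonly Regex Pattern = new Regex(@"^pageid_(\d+)_shapeid_(\d+)_(.+)$");

    // Splits an exported file name into page ID, shape ID and internal image name.
    // Returns false if the name does not follow the expected pattern.
    public static bool TryParse(string fileName, out int pageId, out int shapeId, out string imageName)
    {
        pageId = 0;
        shapeId = 0;
        imageName = null;

        var m = Pattern.Match(fileName);
        if (!m.Success) return false;
        if (!int.TryParse(m.Groups[1].Value, out pageId)) return false;
        if (!int.TryParse(m.Groups[2].Value, out shapeId)) return false;

        imageName = m.Groups[3].Value;
        return true;
    }
}
```

With the two IDs in hand, the shape could presumably be looked up in the open document through the Visio object model (for instance via `Pages.ItemFromID` and `Shapes.ItemFromID`); that step is not shown or tested here.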