Visio JSON Export

Started by Thomas Winkel, November 22, 2024, 03:17:29 PM

Nikolay

#15
> They do not use Visio

I see. But if they won't need to edit diagrams (as they don't use Visio), can't you do this JSON export for them (once, yourself)?

>You do not export user cells or shape properties, right?

Regarding properties, it is probably just a bad example file (none of its shapes have properties).
The other (older) example I attached has shape properties. I do not export cells, they are kind of useless for me (i.e. only "shape data" is exported).

> What does sid / cid mean?

"sid" is a target shape id.
"cid" is a connector shape id (i.e. the line). Added recently for highlighting them, here:
https://svgpublish-sb.unmanagedvisio.com/?path=/story/basic-prevnext--database-diagram

"Text" is a search index for each shape (used for shape search, includes indexed properties)
Actual shape text is "@" (shape.Characters.Text)

There is some other stuff exported as well, like minimal info about pages and layers (to be able to toggle them)

The names are short just to make the file smaller.

Thomas Winkel

Quote from: wapperdude on November 23, 2024, 03:48:05 PM
> Are they driving gnd machines?
gnd machines?
If you mean CNC, then no.
One goal could be to analyse schematics like the one above to generate software that controls the IO.
The schematics hold information about IO mappings, configurations, signal names, scalings, etc. in shape properties.
The logic can be derived by following the connections.
But please don't get stuck on this example. My goal is to create a generic export that can be used for many possible further processing steps.


Quote from: Nikolay on November 23, 2024, 04:38:44 PM
> But if they won't need to edit diagrams (as they don't use Visio), can't you do this JSON export for them (once, yourself)?
Yes, this would be the way to go.

Your export is very much in the direction I imagine. In addition, I need at least user defined cells, connection points (including D-Cell) as well as page and document properties.

Your SplitPagesService.cs is really impressive, but honestly for me it's hard to read and to understand :-[
I think a well structured JSON will be much easier to parse.

Nikolay

#17
That one parses VSDX directly (i.e. can be used to get an idea what it could take to parse VSDX file directly)
All that code does is basically just removing all pages from a file except one (passed as a number)

Using the API, that would look something like this:
for (int i = pages.Count; i > 0; --i) {
  // Visio page indexes are 1-based; Delete(0) does not renumber the remaining pages
  if (i != pageToKeep) pages[i].Delete(0);
}

The code is just using .NET SDK (System.IO.Packaging) to work with the VSDX package
https://learn.microsoft.com/en-us/dotnet/api/system.io.packaging

My point was that using the Visio COM API to extract JSON could be 10 times easier than parsing VSDX directly

Thomas Winkel

#18
Quote from: Nikolay on November 26, 2024, 02:07:13 AM
> My point was that using Visio COM API to extract JSON could be 10 times easier than parsing VSDX directly
That's exactly my point. 10 times easier using COM than parsing VSDX, 10 times easier parsing JSON than parsing VSDX.
I extract a JSON (using COM) and the customer doesn't have to bother with VSDX.
But if I do so, it should be generic enough that it can be used in many other scenarios.

I would open source it and provide a NuGet package.
Also I could provide a Python library to map this JSON object into a class for easier handling (-> Intellisense).
I'm not a Python expert, but I think this can be done with Pydantic.
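
A minimal sketch of what such a typed mapping could look like. It uses stdlib dataclasses instead of Pydantic so that it runs without extra packages, and the JSON layout (`pages`, `shapes`, `props`) and the class names are assumptions made up for illustration, not the format of any actual export.

```python
import json
from dataclasses import dataclass, field

# Hypothetical JSON layout, for illustration only.
SAMPLE = """
{
  "pages": {
    "Page-1": {
      "shapes": {
        "Relay.1": {"props": {"Signal": "K1_ON", "Channel": "3"}}
      }
    }
  }
}
"""


@dataclass
class Shape:
    name: str
    props: dict[str, str] = field(default_factory=dict)


@dataclass
class Page:
    name: str
    shapes: dict[str, Shape] = field(default_factory=dict)


def load_pages(text: str) -> dict[str, Page]:
    """Map the raw JSON dictionaries onto typed objects."""
    raw = json.loads(text)
    pages = {}
    for page_name, page_raw in raw.get("pages", {}).items():
        shapes = {
            shape_name: Shape(shape_name, dict(shape_raw.get("props", {})))
            for shape_name, shape_raw in page_raw.get("shapes", {}).items()
        }
        pages[page_name] = Page(page_name, shapes)
    return pages


pages = load_pages(SAMPLE)
# Attribute access instead of nested dictionary lookups
print(pages["Page-1"].shapes["Relay.1"].props["Signal"])
```

With classes like these, an editor can offer completion on `page.shapes` or `shape.props`. Pydantic would add validation of the input on top of that.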

wapperdude

Quote:
> gnd machines?
> If you mean CNC, then no.
Yes. Missed the autocorrect incorrection.

Other than using Visio as a source document, seems like there are many schematic capture programs out there that do this, front to back design to layout, with simulation, worst case analysis, and error support.  Visio doesn't have the capability to answer / support all of these facets of a project design and development.
Visio 2019 Pro

Nikolay

#20
@Thomas Winkel
I think the customer may just be under the impression that Visio's XML is somewhat like drawio's XML,
or like what we had before with .VDX (a plain XML format that was discontinued).

Thomas Winkel

NuGet for export with .NET:
https://www.nuget.org/packages/Geradeaus.VisioJsonExport

Geradeaus.VisioJsonExport.ExportHandler exportHandler = new Geradeaus.VisioJsonExport.ExportHandler(Globals.ThisAddIn.Application.ActiveDocument);
exportHandler.Parse();
exportHandler.Export(@"C:\Temp\VisioExport.json");


PyPi for processing in Python:
https://pypi.org/project/visio-json-export

import visio_json_export

visio = visio_json_export.load_file(r'C:\Temp\VisioExport.json')

for page in visio.document.pages.values():
    for shape in page.shapes.values():
        for row_name, user in shape.user_rows.items():
            print(page.name + ' -> ' + shape.name + ' -> ' + row_name + ' = ' + user.value)
        for prop in shape.prop_rows.values():
            print(page.name + ' -> ' + shape.name + ' -> ' + prop.label + ' = ' + prop.value)

Next I will integrate layers to support our variant management system.

Thomas Winkel

Update: Added Layers and Connects.

This also looks interesting:
https://pypi.org/project/vsdx/
Works directly on the VSDX file and can even manipulate it.
But I haven't tested it yet.

Yacine

Respekt!

Will have a try. Could be very useful.
Yacine

Nikolay

Was looking recently at the structure of the VSDX page (i.e. raw XML) because of that "cipher" thing (need to do it for some diagrams).

Looks like I was wrong about extracting JSON directly from XML, it IS manageable after all. Even the connections. Luckily, Visio XML has a "Connects" section, below (did not expect to see that, I thought it's only ShapeSheet formulas). That in fact solves the connection thing:

  <Connects>
    <Connect FromSheet='16' FromCell='EndX' FromPart='12' ToSheet='4' ToCell='Connections.X1' ToPart='100'/>
    <Connect FromSheet='16' FromCell='BeginX' FromPart='9' ToSheet='2' ToCell='Connections.X1' ToPart='100'/>
    <Connect FromSheet='15' FromCell='EndX' FromPart='12' ToSheet='11' ToCell='PinX' ToPart='3'/>
    <Connect FromSheet='3' FromCell='EndX' FromPart='12' ToSheet='2' ToCell='PinX' ToPart='3'/>
    <Connect FromSheet='3' FromCell='BeginX' FromPart='9' ToSheet='1' ToCell='PinX' ToPart='3'/>
    <Connect FromSheet='15' FromCell='BeginX' FromPart='9' ToSheet='4' ToCell='Connections.X2' ToPart='101'/>
  </Connects>
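
As a rough sketch, the Connects section quoted above can be turned into a "connector -> begin shape / end shape" table with the Python standard library. The function name is made up for illustration. In a real page file the elements carry the Visio 2012 namespace, so the sketch compares only the local tag name.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

CONNECTS_XML = """
<Connects>
  <Connect FromSheet='16' FromCell='EndX' FromPart='12' ToSheet='4' ToCell='Connections.X1' ToPart='100'/>
  <Connect FromSheet='16' FromCell='BeginX' FromPart='9' ToSheet='2' ToCell='Connections.X1' ToPart='100'/>
  <Connect FromSheet='15' FromCell='EndX' FromPart='12' ToSheet='11' ToCell='PinX' ToPart='3'/>
  <Connect FromSheet='3' FromCell='EndX' FromPart='12' ToSheet='2' ToCell='PinX' ToPart='3'/>
  <Connect FromSheet='3' FromCell='BeginX' FromPart='9' ToSheet='1' ToCell='PinX' ToPart='3'/>
  <Connect FromSheet='15' FromCell='BeginX' FromPart='9' ToSheet='4' ToCell='Connections.X2' ToPart='101'/>
</Connects>
"""


def connector_endpoints(xml_text):
    """Return {connector_id: {'BeginX': shape_id, 'EndX': shape_id}}."""
    endpoints = defaultdict(dict)
    for element in ET.fromstring(xml_text.strip()).iter():
        # Drop a namespace prefix such as '{...}Connect' if there is one.
        if element.tag.rsplit('}', 1)[-1] != 'Connect':
            continue
        connector = int(element.get('FromSheet'))
        endpoints[connector][element.get('FromCell')] = int(element.get('ToSheet'))
    return dict(endpoints)


result = connector_endpoints(CONNECTS_XML)
for connector, ends in sorted(result.items()):
    print(connector, ':', ends.get('BeginX'), '->', ends.get('EndX'))
```

Here FromSheet is the connector and ToSheet is the shape it is glued to, so each connector ends up with one entry for its begin point and one for its end point.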

Thomas Winkel

This is good news, thanks for this info.
That motivated me to take a look at the XML, too:
* Only modified properties are stored there
  i.e. if you do not change FillForegndTrans, there is no FillForegndTrans
* In addition to the formulas, the corresponding results are stored
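
A small sketch of both observations, assuming the usual Cell attributes of the VSDX page XML (N for the cell name, V for the stored result, F for the formula). The sample shape is hand-written for illustration and not taken from a real drawing, and `read_cells` is a made-up helper name.

```python
import xml.etree.ElementTree as ET

NS = '{http://schemas.microsoft.com/office/visio/2012/main}'

# Hand-written sample in the style of a page file (assumption, not real data).
SHAPE_XML = """
<Shape xmlns='http://schemas.microsoft.com/office/visio/2012/main' ID='1' Name='Relay'>
  <Cell N='PinX' V='4.25'/>
  <Cell N='Width' V='2' F='Height*2'/>
</Shape>
"""


def read_cells(shape):
    """Return {cell_name: (result, formula)} for the cells stored on the shape."""
    return {
        cell.get('N'): (cell.get('V'), cell.get('F'))
        for cell in shape.findall(NS + 'Cell')
    }


shape = ET.fromstring(SHAPE_XML.strip())
cells = read_cells(shape)
print(cells['Width'])                 # result and formula side by side
print(cells.get('FillForegndTrans'))  # unchanged cell: not present in the XML
```

A reader that needs a value for a missing cell would presumably have to fall back to the master or to a default, which is one of the things the COM API does for you.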

Since I have no idea about ZIP / XML I asked ChatGPT:
Quote:
import zipfile
import xml.etree.ElementTree as ET


def liste_shapes_aus_vsdx(vsdx_datei):
    try:
        # Open the ZIP file (.vsdx files are ZIP archives)
        with zipfile.ZipFile(vsdx_datei, 'r') as visio_zip:
            # List all files in the ZIP archive
            dateien_im_archiv = visio_zip.namelist()

            # The page descriptions of the file are stored under "visio/pages/"
            seiten_dateien = [f for f in dateien_im_archiv if f.startswith('visio/pages/') and f.endswith('.xml')]

            # Go through all pages
            for seiten_datei in seiten_dateien:
                print(f"Analysing page file: {seiten_datei}")
                # Extract the XML data of the page
                with visio_zip.open(seiten_datei) as seite:
                    # Parse the XML
                    baum = ET.parse(seite)
                    wurzel = baum.getroot()

                    # Extract all shapes; Visio shapes are stored as `<Shape>` elements
                    shapes = wurzel.findall(".//{http://schemas.microsoft.com/office/visio/2012/main}Shape")

                    # Print the shapes
                    for shape in shapes:
                        shape_id = shape.attrib.get('ID', 'Unknown')
                        name = shape.attrib.get('Name', 'No name')
                        print(f" - Shape ID: {shape_id}, Name: {name}")

    except Exception as e:
        print(f"Error while processing the file: {e}")


# Example call
datei_name = "test.vsdx"
liste_shapes_aus_vsdx(datei_name)

I only had to replace: f.startswith('pages/') -> f.startswith('visio/pages/')
And it worked, really impressive!
I agree, more difficult than COM, but doable. Not as bad as I thought.
It would be nice to have a wrapper that allows you to work on the VSDX file like you would via COM.
Easy and with IntelliSense.
I did not review the vsdx package I linked above, but that seems to be going in that direction.

Nikolay

For the other document types, Microsoft provides wrappers (Open XML SDK, something like Wordprocessing or Spreadsheets). For Visio (VSDX), there is only System.IO.Packaging, but that still helps with navigating the structure of the ZIP file.
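
To give an idea of what "navigating the structure" means, here is a sketch in Python (zipfile instead of System.IO.Packaging) that resolves page names to their XML parts through the relationship file. It assumes the usual VSDX layout with visio/pages/pages.xml and visio/pages/_rels/pages.xml.rels. The in-memory sample package is heavily reduced and hand-made, and `page_parts` and `build_sample` are made-up names.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

VISIO_NS = 'http://schemas.microsoft.com/office/visio/2012/main'
DOC_REL_NS = 'http://schemas.openxmlformats.org/officeDocument/2006/relationships'
PKG_REL_NS = 'http://schemas.openxmlformats.org/package/2006/relationships'


def page_parts(package):
    """Return {page name: path of its XML part} for an opened VSDX ZIP."""
    rels = ET.fromstring(package.read('visio/pages/_rels/pages.xml.rels'))
    targets = {
        rel.get('Id'): 'visio/pages/' + rel.get('Target')
        for rel in rels.findall(f'{{{PKG_REL_NS}}}Relationship')
    }
    index = ET.fromstring(package.read('visio/pages/pages.xml'))
    result = {}
    for page in index.findall(f'{{{VISIO_NS}}}Page'):
        rel = page.find(f'{{{VISIO_NS}}}Rel')
        result[page.get('Name')] = targets[rel.get(f'{{{DOC_REL_NS}}}id')]
    return result


def build_sample():
    """Build a reduced, hand-made package in memory (illustration only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, 'w') as z:
        z.writestr(
            'visio/pages/pages.xml',
            f"<Pages xmlns='{VISIO_NS}' xmlns:r='{DOC_REL_NS}'>"
            "<Page ID='0' Name='Schematic'><Rel r:id='rId1'/></Page></Pages>")
        z.writestr(
            'visio/pages/_rels/pages.xml.rels',
            f"<Relationships xmlns='{PKG_REL_NS}'>"
            "<Relationship Id='rId1' Target='page1.xml'/></Relationships>")
    buffer.seek(0)
    return buffer


with zipfile.ZipFile(build_sample()) as package:
    mapping = page_parts(package)
print(mapping)
```

For a real drawing, `zipfile.ZipFile("test.vsdx")` would take the place of the in-memory sample.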

Thomas Winkel

Ah, didn't know that.
Unfortunately, the interface is very different from Microsoft.Office.Interop:
https://learn.microsoft.com/en-us/office/open-xml/spreadsheet/how-to-create-a-spreadsheet-document-by-providing-a-file-name

Interesting topic, but too many open construction sites ;D

Surrogate

#28
Quote from: Nikolay on December 11, 2024, 06:31:28 PM
> Even the connections. Luckily, Visio XML has a "Connects" section, below (did not expect to see that, I thought it's only ShapeSheet formulas).
Also you can get this via PowerQuery (read more in Croc's article)

PS: Thomas, can you describe how I can get JSON from an active Visio document?

Thomas Winkel

The JSON export is only supported by the NuGet package.
Parsing this JSON is only supported by the Python package.

To export:
  • Create a Visio VSTO Addin
  • Install the NuGet package
  • Create a button or something else that can trigger the following code

Geradeaus.VisioJsonExport.ExportHandler exportHandler = new Geradeaus.VisioJsonExport.ExportHandler(Globals.ThisAddIn.Application.ActiveDocument);
exportHandler.Parse();
exportHandler.Export(@"C:\Temp\VisioExport.json");

I attach the JSON export of the schematic from post #10.
Then you can test this with the Python code from post #21.
