Visio JSON Export

Started by Thomas Winkel, November 22, 2024, 03:17:29 PM

Thomas Winkel

Recently I was asked about the "machine readability" of Visio documents. A customer wants to process information from our Visio documents using Python:
read shape data, follow connections, etc.
For some reason COM Interop is out of the question for him. I showed him that Visio documents are zipped XML structures and suggested that he try that. However, I suspect that this will be quite difficult. (Does anyone here know anything about that?)

The easiest way would probably be to provide a JSON export with all the necessary information.
Such an export could be similar to the Visio object model (Document -> Pages -> Shapes -> Cells).
It should be as generic as possible so that it can be used elsewhere.
Here is a first draft:
https://github.com/ThomasWinkel/VisioJsonExport/blob/master/VisioModel.cs
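
To make the Document -> Pages -> Shapes -> Cells nesting concrete from the Python side, here is a minimal sketch of what such a structure could look like. The class and field names (`Document`, `Page`, `Shape`, `cells`) are assumptions chosen for illustration; they are not taken from VisioModel.cs.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Shape:
    id: int
    name: str
    # Hypothetical flat map of cell name -> value, e.g. "Prop.PartNo" -> "K1".
    cells: dict[str, str] = field(default_factory=dict)


@dataclass
class Page:
    name: str
    shapes: list[Shape] = field(default_factory=list)


@dataclass
class Document:
    name: str
    pages: list[Page] = field(default_factory=list)


def to_json(document: Document) -> str:
    """Serialize the nested model; asdict() recurses into pages and shapes."""
    return json.dumps(asdict(document), indent=2)
```

A consumer would then only need `json.loads` and plain dictionary access, with no Visio installation involved.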

What do you think about this?
Should I continue, or is the XML approach better?

Yacine

This goes clearly towards AI. I struggled with the issue too.
My approach was focused more on analysing the drawing - geometry + data.

Going the XML path is the brute force approach. You'll be left with a huge tree of all kinds of objects with different levels of depth.
The question will then be how legible this structure will be for an AI.

Choosing between XML and JSON does not make much of a difference. They're quite interchangeable.

Nikolay

#2
Here is a sample file showing how I work with VSDX files directly:
https://github.com/nbelyh/VisioWebTools/blob/master/visiowebtools/SplitPagesService.cs

So technically it's possible, just no fun, unfortunately.

The ShapeSheet formulas are a big problem though: just to calculate X, Y, Width, and Height properly,
one would basically need to build a full-blown formula calculation engine.

In the modern world, MERMAID language is the way to go.
It's integrated with any tool or AI you can think of (wikis, github, azure, etc), and it's clear text.
Probably that's what they are after.

https://www.mermaidchart.com/

Thomas Winkel

How could AI help here?
By "XML approach" I mean unzipping the vsdx file and analysing the original XML documents.
I have never done that, and I have no idea about the structure or whether there are any libraries, examples, or documentation that would help.
For our use case I would only export some data like user fields, properties, connections, ...
Common ShapeSheet cells such as colors, positions, etc. are not required.
But, of course other use cases could also require this information.
Simply exporting everything could be very slow.
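
For reference, here is a rough sketch of what the "unzip and read the XML" route could look like in Python using only the standard library. It assumes the conventional part names `visio/pages/pageN.xml` (a robust reader would follow the package relationships instead) and only picks up shape data rows and connections that are stored locally on the page. Anything inherited from a master is not resolved here, and no formulas are evaluated. The function names `read_page` and `read_vsdx` are made up for this example.

```python
import re
import xml.etree.ElementTree as ET
import zipfile

# Namespace of the main Visio XML parts inside a .vsdx package.
NS = {"v": "http://schemas.microsoft.com/office/visio/2012/main"}
# Assumption: page parts follow the usual page1.xml, page2.xml, ... naming.
PAGE_PART = re.compile(r"visio/pages/page\d+\.xml")


def read_page(page_xml: bytes) -> dict:
    """Collect locally stored shape data rows and connections from one page part."""
    root = ET.fromstring(page_xml)
    shapes = []
    # ".//v:Shape" also finds sub-shapes nested inside groups.
    for shape in root.iterfind(".//v:Shape", NS):
        props = {}
        for row in shape.iterfind("v:Section[@N='Property']/v:Row", NS):
            value = row.find("v:Cell[@N='Value']", NS)
            props[row.get("N")] = value.get("V") if value is not None else None
        shapes.append({"id": shape.get("ID"), "name": shape.get("Name"), "props": props})
    connects = [
        {
            "from": c.get("FromSheet"),
            "from_cell": c.get("FromCell"),
            "to": c.get("ToSheet"),
            "to_cell": c.get("ToCell"),
        }
        for c in root.iterfind("v:Connects/v:Connect", NS)
    ]
    return {"shapes": shapes, "connects": connects}


def read_vsdx(source) -> dict:
    """Map each page part name to its shapes and connections.

    `source` is a path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as package:
        return {
            name: read_page(package.read(name))
            for name in package.namelist()
            if PAGE_PART.fullmatch(name)
        }
```

Calling `read_vsdx("drawing.vsdx")` (file name hypothetical) returns one entry per page part. The shape IDs in `connects` refer to the `id` values in `shapes` of the same page.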

Thomas Winkel

Thank you, Nikolay. I feared that.
I hadn't even considered that the XMLs only contain formulas but no results...
This makes this approach completely useless for me.
Thanks for the links, I'll look at that later this weekend.

Nikolay

#5
If you only consider top-level shapes, that could work, as the formulas are basically just values for those.

Parsing it in general is not THAT bad, but it could definitely help if something like `WordprocessingDocument` existed for Visio. It looks all manual as of now, with non-trivial namespaces.

Mermaid is a widely accepted approach in the community. Not sure about your situation, but if they need to draw some diagrams with AI, this is the way to go; forget about Visio.

Nikolay

#6
BTW, my SvgPublish can export JSON with that information (see an example below),
but it's obviously a Visio plug-in so it requires Visio to run (and uses Visio COM API)

Yacine

@Thomas, When I said AI, I meant that the result is most probably to be fed to an AI.
Exporting the whole xml isn't slow. But it would export too much data and clutter the AI.
I had a look at Nikolay's JSON file, it looks just perfect as it collects all the shapes and the connections. This is probably exactly what your client wants.

@Nikolay, I don't understand what you mean by Mermaid. It generates diagrams from data. How is that useful for reading Visio files?

Nikolay

#8
I mean mermaid as a language used to create diagrams, that is now embedded in most places (like GitHub for example)

It is not useful one bit for reading Visio files, of course ;D

I just thought maybe Visio can be replaced by it, but it depends on the project. If Visio diagrams are already there (?) and need to be analyzed by AI to get something out of them, then obviously some processing tool needs to be created. But in case the diagrams don't exist and are to be generated, then GPT for example can make mermaid diagrams easily by description.
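
To illustrate what "clear text" means here, the following sketch turns a list of shapes and connections into Mermaid flowchart text. The function name `to_mermaid` and the input shapes are hypothetical, and it assumes the shape IDs are already valid Mermaid node identifiers.

```python
def to_mermaid(shapes: dict[str, str], connections: list[tuple[str, str]]) -> str:
    """Render shape labels and directed connections as a Mermaid flowchart."""
    lines = ["flowchart LR"]
    for shape_id, label in shapes.items():
        # Quote labels so spaces and punctuation survive; Mermaid uses the
        # entity code #quot; for a double quote inside a quoted label.
        safe = label.replace('"', "#quot;")
        lines.append(f'    {shape_id}["{safe}"]')
    for source, target in connections:
        lines.append(f"    {source} --> {target}")
    return "\n".join(lines)


print(to_mermaid({"K1": "Relay K1", "AI1": "Analog input"}, [("K1", "AI1")]))
```

The printed text can be pasted into any Mermaid-aware renderer, such as a GitHub markdown file.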

This is the list of text file types supported by the current GPT platform, and Visio is not one of them:
File types supported: .c, .cs, .cpp, .doc, .docx, .html, .java, .json, .md, .pdf, .php, .pptx, .py, .rb, .tex, .txt, .css, .js, .sh, .ts

https://platform.openai.com/docs/assistants/tools/file-search/supported-files

Yacine

Yes, now we are talking.

Several weeks ago I actually tried to get an AI to understand my drawings, only to learn that this is not straightforward.

These AI companies concentrate on getting AIs to understand text.
Somewhat later they understood that plain text isn't enough. Users want to upload Word, Excel, PDF, and who knows how many more file formats to communicate with the AI. Visio is obviously not one of the primary formats.

This prioritization is interesting to observe insofar as, in my opinion, the AIs (GPT 4o in my case) are intelligent enough to handle our main needs but lack primitive exchange tools. Considering how much effort these companies invest in the "intelligence" of their systems, I cannot really understand why the much simpler task of converting one file format into another is so difficult. Wrong priorities in my opinion.

Thomas Winkel

Time for an example 8)

We create schematics like that:
[Attachment: example schematic]
We can generate several exports for our internal work: wire list, parts list, signals & configurations, ...
Now, a customer also needs some data for their internal tooling.
But they cannot yet clearly define what exactly they need because their tool is in an early stage.
It is clear that they need some shape data as well as the connections.
For the connections it's important to know the connection points (e.g. relay n.o. contact connects to the signal pin of the analog input).
That's easy with VBA / VSTO using COM.
But for some reason this is not an option.
Maybe they do not have the knowledge or their tool runs on Linux machines, I don't know and I don't care.

For the moment they will evaluate parsing the vsdx files.
But I also have the idea of defining a generic JSON export with all relevant data.
I guess this will be much easier for them.
Maybe this could also be interesting for other scenarios.

I attached the current state of the JSON for the example schematic above.

GitHub:
https://github.com/ThomasWinkel/VisioJsonExport

NuGet Package:
https://www.nuget.org/packages/Geradeaus.VisioJsonExport

Thomas Winkel

I just realized that this is a bit confusing... :o

My company will not support the customer with their tool development because we are afraid that this will end in a never-ending story.
And the customer will not do any Visio programming.
So I suggested parsing the vsdx files directly.
The generic JSON export is my personal effort because I find the approach interesting.

In fact, all our exports I mentioned above could be created from the JSON without Visio. That's why I think this could be a powerful interface format for many use cases.
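
As a rough illustration of deriving an export from JSON alone, the sketch below builds a wire list by pairing the two glued ends of each connector. The JSON layout (`shapes`, `connects`, `connector`, `from_cell`, `to_shape`, `to_point`) is a hypothetical schema invented for this example and does not reflect the actual VisioJsonExport output. The only Visio convention relied on is that a connector's glued ends are identified by its `BeginX` and `EndX` cells.

```python
import json
from collections import defaultdict


def wire_list(export_json: str) -> list[dict]:
    """Pair the begin and end glue records of each connector into one wire entry."""
    data = json.loads(export_json)
    names = {s["id"]: s["name"] for s in data["shapes"]}
    ends = defaultdict(dict)
    for c in data["connects"]:
        # "BeginX" / "EndX" tell which end of the connector is glued.
        end = "begin" if c["from_cell"].startswith("Begin") else "end"
        ends[c["connector"]][end] = (names[c["to_shape"]], c["to_point"])
    wires = []
    for connector, glued in ends.items():
        # Skip connectors that are glued at only one end.
        if "begin" in glued and "end" in glued:
            wires.append(
                {"wire": names[connector], "from": glued["begin"], "to": glued["end"]}
            )
    return wires
```

Each resulting entry names the wire plus the shape and connection point at both ends, which is the kind of information a wire list needs.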

But that's just the political background for a better understanding:
Export using COM would be OK, but the customer doesn't do it, my company doesn't want it, and privately I only do what piques my interest ;D

Nikolay

#12
I would try to figure out why they are against the Visio API, if they are using Visio anyway (to create/edit those diagrams)?
Looks really odd. Could be some misunderstanding?

Thomas Winkel

They do not use Visio.
We use Visio for the documentation of our systems (see schematic above).

Regarding your JSON:
What does sid / cid mean?
You do not export user cells or shape properties, right?

wapperdude

Are they driving CNC machines? According to Google, the file formats are:
Quote: CNC machines use a variety of file formats, including:
G-code: A text file format that contains instructions for controlling the CNC machine's movements. G-code files are written in a standardized language called geometric code (G-code). The most common file extension for G-code is *.nc, but other extensions include *.cnc, *.ngc, *.gcode, or *.tap.
STL: A 3D file format that uses triangular mesh geometry to represent a model. However, STL files are not suitable for CNC machining because they can't represent advanced geometric features.
DXF: A 2D file format that's often used for laser cutting.
DWG: A proprietary CAD format owned by AutoCAD that can be used in both 2D and 3D.
STEP: A 3D file format with the extension .stp or .step.
IGES: An interchange file format with the extension .igs or .iges. IGES files are compatible with almost all CAD software.

JSON or Excel might be intermediate formats that they can read and use...just a thought. 
Visio 2019 Pro
