Automating task in Python

Started by srkrocks31, July 02, 2020, 05:24:02 AM


srkrocks31

Hello,

I need help automating a task in a .vsd Visio file using the Python win32com library.

I have a .vsd file and I want to delete all the strikethrough text in it and also convert all the purple text to black.

If anyone can help, I would be really grateful.

Thanks!

Yacine

#1
@srkrocks31,
You should have started a new topic. Your question has nothing to do with the initial one.

Anyway, I worked it out.

The difficulty was to identify the sections of text formatted differently. It took me quite a while to find an example. https://www.office-forums.com/threads/getting-shapesheets-character-section-info-in-vb.1616220/
The trick is to iterate over the whole text, then use the RunEnd and RunBegin functions to get the boundaries of the respective text element.
The rest is a simple matter of shapesheet manipulation.

You asked for Python, so I worked out a JupyterLab notebook. I hope you can handle it.
Yacine

Paul Herber

[Moderator] I've split this off as a new topic.
Electronic and Electrical engineering, business and software stencils for Visio -

https://www.paulherber.co.uk/

wapperdude

#3
Any reason for choosing Python over VBA?  This is doable in the latter, too.  See http://visguy.com/vgforum/index.php?topic=9067.msg40123#msg40123
Visio 2019 Pro

Yacine

#4
Hello Wayne,

The reasons to do it in Python:
- It's fun
- The prototyping is much faster and more elegant
- Because I can. :D

But that is not the interesting part of this topic.

Almost all the solutions available out there handle the CREATION of formats in a shape's text. They describe how to select a part of the text depending on its content or position (your example as well).

None of them answer the question of how to MODIFY already formatted text.

The information you get ad hoc with VBA is:
- the raw text without formats
- the number of differently formatted sections and how they are formatted
but you don't get the position of these sections.

The shapesheet shows you their begin and length, but there is no way to access them - at least I could not find one.

And whilst the modification of a format could be done by modifying a row based on a certain criterion, deleting a part of the text with this method is impossible (due to the way these sections are modeled: start/length).

So you really need to get access to these positions.
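
To make the start/length point concrete, here is a minimal sketch in plain Python (no Visio needed). It assumes the per-section character counts were available as a simple list; `run_offsets` and its input are hypothetical names for illustration, not part of the Visio API.

```python
from itertools import accumulate


def run_offsets(lengths):
    """Turn per-section character counts into (begin, end) offsets, end exclusive."""
    ends = list(accumulate(lengths))
    begins = [0] + ends[:-1]
    return list(zip(begins, ends))


# Two sections of 25 and 7 characters
print(run_offsets([25, 7]))
```

Every begin depends on the lengths of all sections before it, so removing one section moves all the later ones. That is why editing a single row is not enough to delete text.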

I found only one example demonstrating how to do it: https://www.office-forums.com/threads/getting-shapesheets-character-section-info-in-vb.1616220/

My python interpretation:
You'll notice that while I needed to define certain variables that I wouldn't have to in VBA, the loop itself is much smaller than its VBA counterpart. That's really only my personal perception  ;) .

def get_char_rows(shp):
   
    if shp is None:
        return
   
    char_props = [['Font', 0],
                    ['Color', 1],
                    ['Style', 2],
                    ['Case', 3],
                    ['Pos', 4],
                    ['FontScale', 5],
                    ['Size', 7],
                    ['DblUnderline', 8],
                    ['Overline', 9],
                    ['Strikethru', 10],
                    ['DoubleStrikethrough', 13],
                    ['Letterspace', 16],
                    ['ColorTrans', 17], ]
   
    visSectionCharacter = 3
   
    visCharacterColor = 1
    visCharacterStrikethru = 10
   
    chars = shp.Characters
    num_chars = chars.CharCount
   
    i = 0
    L = []
    while i < num_chars:  # visit every run, including a final one-character run
        chars.Begin = i
        chars.End = i+1

        int_run_end = chars.RunEnd(1)
        int_run_begin = chars.RunBegin(1)

        chars.End = int_run_end
        chars.Begin = int_run_begin

        i = int_run_end

        dict_ = {}
        dict_['row'] = chars.CharPropsRow(1)
        dict_['char_count'] = chars.CharCount
        dict_['begin'] = int_run_begin
        dict_['end'] = int_run_end
       
        for item in char_props:
            formula = shp.CellsSRC(visSectionCharacter, dict_['row'], item[1]).FormulaU
            res = shp.CellsSRC(visSectionCharacter, dict_['row'], item[1]).ResultStr("")
            dict_[item[0]] = [formula, res]
        dict_['text'] = chars.Text
        L.append(dict_)
    return L


Trying to put it in words:
Starting with the first letter in the text, you ask for the begin and the end of the current section (shapesheet row) by using the functions RunBegin and RunEnd. Then you move your index to the end of this section (RunEnd is exclusive, so it is also the first character of the next section) and repeat until the end of the entire text.
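
The walk described above can be tried without Visio. The following is only a sketch: `FakeCharacters` is a hypothetical stand-in that imitates the `Begin`, `End`, `RunBegin` and `RunEnd` members of the real Characters object, and `walk_runs` is an illustrative name for the bare loop.

```python
from bisect import bisect_right


class FakeCharacters:
    """Hypothetical stand-in for Visio's Characters object; no COM involved."""

    def __init__(self, run_starts, total):
        self.run_starts = run_starts  # sorted start offset of every format run
        self.total = total            # length of the whole text
        self.Begin, self.End = 0, total

    def RunBegin(self, run_type):
        # Start of the run that contains position Begin
        return self.run_starts[bisect_right(self.run_starts, self.Begin) - 1]

    def RunEnd(self, run_type):
        # Exclusive end of that run: start of the next one, or the text end
        nxt = bisect_right(self.run_starts, self.Begin)
        return self.run_starts[nxt] if nxt < len(self.run_starts) else self.total


def walk_runs(chars, total):
    """Collect the (begin, end) boundaries of every run, front to back."""
    runs, i = [], 0
    while i < total:
        chars.Begin, chars.End = i, i + 1   # one-character probe
        begin, end = chars.RunBegin(1), chars.RunEnd(1)
        runs.append((begin, end))
        i = end                             # jump to the start of the next run
    return runs


print(walk_runs(FakeCharacters([0, 25, 32], 40), 40))
```

The loop touches each run exactly once, because the index jumps from one run end to the next instead of advancing character by character.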

... and this is the result the function returns. Note how simply Python allows extracting every format aspect as well as the text chunks. The list of dictionaries also allows for very elegant access to these data.
If L is the generated list, the complete row #3 would be L[2] (zero based).
The formula of the font cell would be L[2]["Font"][0]
and the ResultStr("") would be L[2]["Font"][1] ... etc.

[{'row': 0,
  'char_count': 25,
  'begin': 0,
  'end': 25,
  'Font': ['THEMEVAL()', '4'],
  'Color': ['THEMEVAL()', '0'],
  'Style': ['THEMEVAL()', '0'],
  'Case': ['0', '0'],
  'Pos': ['0', '0'],
  'FontScale': ['100%', '100,0000%'],
  'Size': ['10 pt', '10,0000 pt'],
  'DblUnderline': ['FALSE', 'FALSE'],
  'Overline': ['FALSE', 'FALSE'],
  'Strikethru': ['FALSE', 'FALSE'],
  'DoubleStrikethrough': ['FALSE', 'FALSE'],
  'Letterspace': ['0 pt', '0,0000 pt'],
  'ColorTrans': ['0%', '0,0000%'],
  'text': 'After an acquaintance of '},
{'row': 1,
  'char_count': 7,
  'begin': 25,
  'end': 32,
  'Font': ['THEMEVAL()', '4'],
  'Color': ['THEMEVAL()', '0'],
  'Style': ['THEMEVAL()', '0'],
  'Case': ['0', '0'],
  'Pos': ['0', '0'],
  'FontScale': ['100%', '100,0000%'],
  'Size': ['10 pt', '10,0000 pt'],
  'DblUnderline': ['FALSE', 'FALSE'],
  'Overline': ['FALSE', 'FALSE'],
  'Strikethru': ['TRUE', 'TRUE'],
  'DoubleStrikethrough': ['FALSE', 'FALSE'],
  'Letterspace': ['0 pt', '0,0000 pt'],
  'ColorTrans': ['0%', '0,0000%'],
  'text': 'nearly '},...
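
As a small illustration of working with this structure, the sketch below filters the struck-through rows. The `rows` list is a trimmed copy of the two rows shown above, keeping only the keys used here.

```python
# Trimmed copy of the two rows shown above (only the keys used here)
rows = [
    {'row': 0, 'begin': 0, 'end': 25, 'Strikethru': ['FALSE', 'FALSE'],
     'text': 'After an acquaintance of '},
    {'row': 1, 'begin': 25, 'end': 32, 'Strikethru': ['TRUE', 'TRUE'],
     'text': 'nearly '},
]

# Each value is [FormulaU, ResultStr("")]; index 1 is the evaluated result
struck = [r for r in rows if r['Strikethru'][1] == 'TRUE']

for r in struck:
    print(r['row'], r['begin'], r['end'], repr(r['text']))
```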



As for the operations to apply, you can either do them in the loop itself, or first store the results in a suitable structure and do the manipulations afterwards (my preferred way). In VBA you would probably choose an array, in Python I opted for a list of dictionaries.
Get the list of rows:
shp = vWin.Selection[0]
L = get_char_rows(shp)
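
The snippet above uses `vWin` without showing where it comes from. A minimal sketch of one way to get the selected shape, assuming pywin32 is installed and Visio is already running with a shape selected. The `get_active` parameter is a hypothetical hook, added only so the function can be exercised without Visio.

```python
def get_selected_shape(get_active=None):
    """Return the first selected shape of the active Visio window."""
    if get_active is None:
        # pywin32 is Windows-only, so import it lazily
        from win32com.client import GetActiveObject as get_active
    visio = get_active("Visio.Application")   # attach to the running instance
    v_win = visio.ActiveWindow
    return v_win.Selection.Item(1)            # Item() on Visio collections is 1-based
```

With Visio open, `shp = get_selected_shape()` would then take the place of the first line above.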


Modify the color:
Note that I used the ResultStr instead of the formula to overcome themed formulas (THEMEGUARD(RGB(...))).
for row in L:
    if row['Color'][1] == 'RGB(112; 48; 160)':
        shp.CellsSRC(3, row['row'], 1).FormulaU = '0'
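
One caveat: the comparison string `'RGB(112; 48; 160)'` uses `;` as separator, which probably reflects the regional settings of the machine (the sample output above also shows decimal commas). On another system the same colour may come back as `'RGB(112, 48, 160)'`. A hedged sketch of a more tolerant check follows; `parse_rgb` and `PURPLE` are illustrative names, not Visio API.

```python
import re

# Accepts "RGB(r; g; b)" as well as "RGB(r, g, b)", with optional spaces
_RGB = re.compile(r'RGB\(\s*(\d+)\s*[;,]\s*(\d+)\s*[;,]\s*(\d+)\s*\)', re.IGNORECASE)

PURPLE = (112, 48, 160)


def parse_rgb(result_str):
    """Return (r, g, b) for an RGB result string, or None for anything else."""
    m = _RGB.fullmatch(result_str.strip())
    return tuple(int(g) for g in m.groups()) if m else None


print(parse_rgb('RGB(112; 48; 160)') == PURPLE)
print(parse_rgb('RGB(112, 48, 160)') == PURPLE)
print(parse_rgb('0'))
```

The test in the loop would then read `if parse_rgb(row['Color'][1]) == PURPLE:`.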

       


Deleting chunks of text is not complicated, but keep in mind that deleting a chunk shifts the positions of all the character sections that follow it. So the deletion must be done backwards.

chars = shp.Characters
for row in reversed(L):
    if row['Strikethru'][1] == 'TRUE':
        chars.Begin = row['begin']
        chars.End = row['end']
        chars.Cut()  # call the method; note that Cut also puts the removed text on the clipboard
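
Why backwards? The effect is easy to reproduce on a plain Python string. This is only an illustration with made-up text and offsets; `delete_ranges` is a hypothetical helper and does not touch Visio.

```python
def delete_ranges(text, ranges, backwards=True):
    """Remove (begin, end) slices from text, end exclusive."""
    for begin, end in sorted(ranges, reverse=backwards):
        text = text[:begin] + text[end:]
    return text


text = "aaXXbbYYYcc"
ranges = [(2, 4), (6, 9)]   # the "XX" and "YYY" chunks

print(delete_ranges(text, ranges))                    # offsets stay valid
print(delete_ranges(text, ranges, backwards=False))   # second range has shifted
```

Going from the last chunk to the first, each deletion only moves text that has already been handled, so the stored begin/end values remain correct.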


So much for my explanations; I hope they are reasonably understandable.

... and not to forget: the same result can be obtained with plain VBA, it just happened that a Python solution was requested.
Yacine

wapperdude

@Yacine:  Phew!  Long reply.  Not read it in detail...still working on 1st cup of coffee.  Going to be a busy day.

Anyway, I disagree with VBA's lack of formatting capabilities.  Did you check out the link I provided?

Elegance???  Perhaps an issue.  Perhaps not.

For me, one language vs another is just a new pain.  Each has good/bad points.  I know VBA because had no better option.  It meets my needs.  So, no motivation to learn new.  We've had this discussion before.  ☺

I was wondering if the OP generally used Python,  or just for this case? 

Now, I'm going to finish my coffee and snuggle up with my pet T-rex.  🤔😯😬
Visio 2019 Pro

vojo

my 2 cents

VBA has evolved to the point that it is pretty much the same as Python (number of keystrokes to do something aside).

I see 2 big differences
1)  to do something in VBA has way way way way way too many keystrokes
     VBA is like  y=this.that.this.that.this.that.this.that.this.that.this.that.x
2)  there seem to be inconsistencies around using "set" vs "=" vs "dim"
     I guess I don't know well when to use which.

Similarities
- global and local variables
- inclusion of other "modules" to prevent 1 big monolithic file
- OO   VBA with "me"   python with "self"
- functions and functional controls...if then, case, etc...VBA has "case" but there are python modules to do same

what would be cool is a python module to wrap Visio VBA API...something like
class visio_API(object):
    def create_visio_session(self, *args):    # do all that win32 stuff
        ...
    def create_square(self, width, height, X, Y):
        ...
    def set_geo_row(self, x, y, geo_num, shape):
        ...
    def get_geo_row(self, x, y, geo_num, shape):
        ...
    # etc.

I am sure the syntax below needs work...but the idea is to make it easier for python <=> visio interfacing
VVV = visio_API()
sq1=VVV.create_square(20mm,30mm,100mm,100mm)
sq2=VVV.create_square(30mm,20mm,140mm, 140mm)
set_geo_row(1,1,1,sq1) = 25mm, 35mm
etc
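
For what it's worth, here is a minimal sketch of how one such wrapper method might look. Everything in it is an assumption for illustration: `VisioWrapper` is a hypothetical class, X and Y are treated as the lower-left corner, and lengths are plain millimetre numbers converted to inches for Visio's `Page.DrawRectangle(x1, y1, x2, y2)`. `FakePage` is a stand-in so the sketch runs without Visio.

```python
MM_PER_INCH = 25.4  # Visio's internal drawing unit is the inch


class VisioWrapper:
    """Hypothetical thin wrapper around an object that offers DrawRectangle."""

    def __init__(self, page):
        self.page = page

    def create_square(self, width_mm, height_mm, x_mm, y_mm):
        # Assumption: (x_mm, y_mm) is the lower-left corner of the rectangle
        x1, y1 = x_mm / MM_PER_INCH, y_mm / MM_PER_INCH
        x2 = (x_mm + width_mm) / MM_PER_INCH
        y2 = (y_mm + height_mm) / MM_PER_INCH
        return self.page.DrawRectangle(x1, y1, x2, y2)


class FakePage:
    """Stand-in that records calls instead of drawing anything."""

    def __init__(self):
        self.calls = []

    def DrawRectangle(self, x1, y1, x2, y2):
        self.calls.append((x1, y1, x2, y2))
        return len(self.calls)  # pretend shape handle


page = FakePage()
api = VisioWrapper(page)
sq1 = api.create_square(20, 30, 100, 100)
sq2 = api.create_square(30, 20, 140, 140)
print(page.calls)
```

With a real Visio session, `page` would be something like the application's `ActivePage` instead of `FakePage`.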
