lintujuh

Tool to Check Consistency of Tags


Hi!

 

It has been discussed in this forum (e.g. Group behaviour) that there should be an option to "force" the tags onto all grouped or linked images. As the process is manual today, it's prone to errors. This started to annoy me, and as I had some free time, I decided to refresh my coding skills and wrote a small Python program that traverses the Daminion catalog and checks for inconsistencies. This is (and will probably stay) a "nerd version": you need to be familiar with the command line and with installing software.

 

Install Python 3 (earlier versions should work as well; I have been running on 3.3) and install the Psycopg2 package into Python. Now you can run the attached code from the command line (or from a Python IDE like PyCharm). Rename the file first from DamScan.txt to DamScan.py.

C:> Python DamScan.py [options]

usage: DamScan.py [-h] [-c DBNAME] [-s SERVER] [-p PORT] [-u USER] [-l] [-v]
                 [--version]
optional arguments:
 -h, --help            show this help message and exit
 -c DBNAME, --catalog DBNAME
                       Daminion catalog name [NetCatalog]
 -s SERVER, --server SERVER
                       Postgres server [localhost]
 -p PORT, --port PORT  Postgres server port [5432]
 -u USER, --user USER  Postgres user/password [postgres/postgres]
 -v, --verbose         verbose output
 --version             Display version information and exit.

The options should be self-evident, and the defaults match the default Daminion configuration.

 

Currently the code checks pairwise linked items (not yet grouped items) and reports any differences. The program reports the differing files; for single-value tags (e.g. Place) it reports both values, and for multi-value tags (e.g. Keywords) it reports the values that are missing from the first file. The output is tab-delimited, so you can paste it into Excel for further processing. The program is read-only; it doesn't change the database contents.

 

Currently the following tags are checked:

  • Place (single)
  • GPS (single)
  • Event (single)
  • Keywords (multi)
  • Categories (multi)
  • People (multi)

An example output:

ImageA	Dir	ImageB	Tag	ValueA/Missing A		ValueB
IMG_8090.jpg	<>	IMG_8090.CR2	GPS	'44.1448N 3.09918E 256.0m'	<>	'44.1448N 3.09918E 0.0m'
IMG_4115.jpg	>	IMG_4115.CR2	Keywords	'Reflections'
IMG_1806.tif	<	IMG_1806-09.tif	Categories	'Other\Panorama'

and an interpretation

  • IMG_8090 the GPS co-ordinates (altitude) differ between JPG and CR2
  • IMG_4115 the CR2 image has keyword 'Reflections' that is missing from the JPG
  • IMG_1806.tif is missing category 'Other\Panorama' that exists in IMG_1806-09.tif

The symbols < and > just show whether the relation between the images is "linked to" or "linked from". In both cases (even though the notation can be misleading) ImageB contains tag values that do not exist in ImageA.
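The reporting rules described above can be sketched roughly as follows. This is only an illustration and is not taken from DamScan.py: the function name compare_items, the dictionary layout of the tags and the item names in the example are all assumptions made for this sketch.

```python
# Hypothetical sketch of the reporting rules; not the code from DamScan.py.
SINGLE_VALUE_TAGS = ("Place", "GPS", "Event")
MULTI_VALUE_TAGS = ("Keywords", "Categories", "People")


def compare_items(name_a, tags_a, name_b, tags_b, direction="<"):
    """Return report rows (lists of columns) for two linked items.

    tags_a and tags_b map a tag name to a string (single-value tags)
    or to a set of strings (multi-value tags).
    """
    rows = []
    for tag in SINGLE_VALUE_TAGS:
        value_a, value_b = tags_a.get(tag), tags_b.get(tag)
        if value_a != value_b:
            # Single-value tags: report both values.
            rows.append([name_a, "<>", name_b, tag,
                         f"'{value_a}'", "<>", f"'{value_b}'"])
    for tag in MULTI_VALUE_TAGS:
        missing = set(tags_b.get(tag, ())) - set(tags_a.get(tag, ()))
        for value in sorted(missing):
            # Multi-value tags: report what ImageA lacks compared to ImageB.
            rows.append([name_a, direction, name_b, tag, f"'{value}'"])
    return rows


if __name__ == "__main__":
    # Made-up items for illustration only.
    jpg = {"Place": "Millau", "Keywords": {"Bridge"}}
    raw = {"Place": "Millau", "Keywords": {"Bridge", "Reflections"}}
    for row in compare_items("a.jpg", jpg, "a.cr2", raw, ">"):
        print("\t".join(row))
```

The rows are joined with tabs so that they have the same shape as the example output above.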

 

I'm thinking of the following improvements:

  • an option to do the analysis based on stacking instead of linking
  • an option to define which tags are analyzed (now all are analyzed)
  • option to list the full path and/or the Daminion ID
  • write better installation instructions

If you need some other tags to be analyzed or have improvement ideas or problems, drop a note. (GUI is not in my plans. :pardon: )

 

-Juha

 

The normal disclaimer applies: use it at your own risk, there is no warranty, etc. Also, I don't take any responsibility if your lover leaves you because you are just fixing the tagging in your catalog. :gamer2:

DamScan.txt


Hi!

 

Here are slightly more detailed installation instructions.

 

After you have downloaded the Python package, right-click it and select "Run as administrator". In the installation dialog select the customized installation. In the customized configuration, tick the option to include Python in the PATH and select installation for all users. Other options can be left at their defaults.

 

After installation start a command window (you may need to do this also as an admin, because the Postgres support package will be installed in the Program Files directory).

C:> python -m pip install -U pip setuptools
C:> python -m pip install psycopg2

Close the elevated command window and start a normal command window. Now you can run my tool with the command (include the path where you downloaded my Python module if you are not in the same folder):

python \DIR\DamScan.py [options]

 

I have attached a new version that supports groups. Invoke the program with the option -g (or --group) to switch from the default checking based on links to checking based on groups (stacks). I also changed the separator in hierarchical stacks to '|' for better readability.

 

The checking is currently limited to images and RAWs (as defined in Daminion Media Format).

 

-Juha

DamScan.txt


Hi Wilfried,

 

You were correct.

 

It was very straightforward to add support for the Sqlite database (= standalone): only the call to open the database is different. For the standalone version, use the options

-l, --sqlite            use sqlite database (standalone) instead of server
-c DB, --catalog DB     relative pathname of the local catalog (with .dmc)

Example command line, assuming you have the Python code in your home directory and the Daminion catalog in Pictures:

C:\Users\user> python DamScan.py -l -c Pictures\DaminionCatalog.dmc
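As an illustration of "only the call to open the database is different", here is a minimal sketch of how the two cases could be opened. It is an assumption about the general approach, not the code from DamScan.py: the function open_catalog and its parameter names are made up, and opening the .dmc file read-only through an SQLite URI is a choice made for this sketch.

```python
import sqlite3
from pathlib import Path


def open_catalog(catalog, use_sqlite=False, server="localhost", port=5432,
                 user="postgres/postgres"):
    """Return a DB-API connection to a catalog (hypothetical helper)."""
    if use_sqlite:
        # Standalone catalog: a .dmc file, opened read-only via an SQLite URI.
        uri = Path(catalog).resolve().as_uri() + "?mode=ro"
        return sqlite3.connect(uri, uri=True)
    # Server catalog: psycopg2 is imported lazily, so standalone use
    # does not require the package to be installed.
    import psycopg2
    username, _, password = user.partition("/")
    return psycopg2.connect(dbname=catalog, host=server, port=port,
                            user=username, password=password)
```

Both branches return a DB-API connection, so the code that runs the queries does not need to know which kind of catalog it is reading.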

The new version is attached.

 

-Juha

 

PS. I had only very limited test data for the local catalog.

DamScan.txt

As a reference, I have 20.500 items in my active catalog and I got roughly 1.600 messages

Just for a quick estimate: how much time did you need to scan those 20,500 items? After a few small hurdles I got it to work, and I am currently waiting for 156,069 elements to be scanned ...

 

It seems to me that if you do not have any linked items and forget to specify -g, you will get these error messages:

 

C:\Users\User>python DamScan.py -l -c Pictures\DaminionCatalogAlle.dmc
ImageA  Dir 	ImageB  Tag 	ValueA/Missing A                ValueB
Traceback (most recent call last):
 File "DamScan.py", line 392, in <module>
   main()
 File "DamScan.py", line 386, in main
   catalog.ScanCatalog()
 File "DamScan.py", line 320, in ScanCatalog
   while self.NextImage():
 File "DamScan.py", line 312, in NextImage
   self.__image = DamImage(self, row[0])
 File "DamScan.py", line 70, in __init__
   self.ImageName = row[0]
TypeError: 'NoneType' object is not subscriptable

 

The first and so far only mismatch is this:

 

C:\Users\User>python DamScan.py -g -l -c Pictures\DaminionCatalogAlle.dmc
ImageA  Dir 	ImageB  Tag 	ValueA/Missing A                ValueB
030_27.jpg      <   	120803_8608_WBL_A55.JPG Categories      'Urlaub'

While the finding is correct for "030_27.jpg", the image "120803_8608_WBL_A55.JPG" is not a member of any group. However, I cannot rule out that it was in the past. One thought: while most of my file names are unique (though pairs with the same name can exist in different folders), some older ones, such as 030_27.jpg, are not. Could this possibly confuse your program?

 

While I was writing this, the scan ended, but I am not sure it really scanned the entire database, since the result is this:

C:\Users\User>python DamScan.py -g -l -c Pictures\DaminionCatalogAlle.dmc
ImageA  Dir 	ImageB  Tag 	ValueA/Missing A                ValueB
030_27.jpg      <   	120803_8608_WBL_A55.JPG Categories      'Urlaub'
121229_0839_WBL_A55.JPG <   	160901_0769_WBL_A77.JPG Keywords        'Via Loreto'
121229_0839_WBL_A55.JPG <   	160901_0769_WBL_A77.JPG Categories      'Kurioses'
Traceback (most recent call last):
 File "DamScan.py", line 392, in <module>
   main()
 File "DamScan.py", line 386, in main
   catalog.ScanCatalog()
 File "DamScan.py", line 324, in ScanCatalog
   FromList = self.__image.LinkedFrom()
 File "DamScan.py", line 170, in LinkedFrom
   return self.__bottomItems()
 File "DamScan.py", line 157, in __bottomItems
   img = DamImage(self.__db, r[0])
 File "DamScan.py", line 70, in __init__
   self.ImageName = row[0]
TypeError: 'NoneType' object is not subscriptable

 

Similar to the first result above, ImageA shows a correct mismatch, but ImageB in the same line is not related to it.

 

The third finding in this example shows the same pair of file names as the second, but the category 'Kurioses' does not appear in either of them and apparently belongs to a completely different pair.

 

My intention is to find all grouped items (not only those with mismatching tags), and I am hoping for a small modification of the code to do that.

 

In any case, thanks a lot for your effort, Juha.


Hi Wilfried and thank you for your comments,

 

I need to look into why the program terminates abnormally in your cases even with the -g option specified. I have sent you a PM to debug the issues.

 

The Daminion database has some ghost entries from deleted files, but those entries should be flagged as deleted, and my program ignores them.

 

With the -v/--verbose option the program prints all items, but unfortunately it prints a line for each tag type plus some other information, so the output is too cluttered. I will take a look and see how I can print only the hierarchy without comparing the tags.

 

-Juha


Thanks Juha, my response to the PM is on the way ...

With the -v/--verbose option the program prints all items, but unfortunately it prints a line for each tag type plus some other information, so the output is too cluttered. I will take a look and see how I can print only the hierarchy without comparing the tags.

 

I also tried the verbose option and found that it crashes at item 49530. Possibly a counter overflow?

 

48385 (49529, '120716_7968_WBL_A55.JPG', 0)
48386 (49530, '120716_7969_WBL_A55.JPG', 0)
Traceback (most recent call last):
 File "DamScan.py", line 392, in <module>
   main()
 File "DamScan.py", line 386, in main
   catalog.ScanCatalog()
 File "DamScan.py", line 324, in ScanCatalog
   FromList = self.__image.LinkedFrom()
 File "DamScan.py", line 170, in LinkedFrom
   return self.__bottomItems()
 File "DamScan.py", line 157, in __bottomItems
   img = DamImage(self.__db, r[0])
 File "DamScan.py", line 70, in __init__
   self.ImageName = row[0]
TypeError: 'NoneType' object is not subscriptable
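For readers hitting the same traceback: the message means that row was None when line 70 tried row[0]. With Python database cursors, fetchone() returns None when a query yields no row, so one plausible cause (an assumption, since the query itself is not shown in this thread) is a lookup for an item id that has no matching entry. A minimal sketch of such a lookup with a guard, using made-up table and column names:

```python
import sqlite3


def fetch_image_name(conn, item_id):
    """Return the file name for item_id, or None if no row matches."""
    # "items", "id" and "filename" are hypothetical names for this sketch;
    # they are not the real Daminion schema.
    row = conn.execute(
        "SELECT filename FROM items WHERE id = ?", (item_id,)).fetchone()
    if row is None:
        # Indexing None here would raise
        # "TypeError: 'NoneType' object is not subscriptable".
        return None
    return row[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, filename TEXT)")
conn.execute("INSERT INTO items VALUES (1, 'a.jpg')")
print(fetch_image_name(conn, 1))  # existing item
print(fetch_image_name(conn, 2))  # no such item, handled without a crash
```

A caller can then skip or report the dangling reference instead of terminating the whole scan.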


Hi!

 

It looks more like a memory problem, because I got up to 76559 items before the crash. I'll take a look at this.

 

-Juha


It is certainly something specific to each computer, but I should have plenty of memory (8 GB on Windows 10 Pro 64-bit), which is never completely used. I was watching the memory usage of Python and it never went beyond 11 MB, if I remember correctly.

 

Is there any option to limit memory usage? Some environment variable or something similar?

... With -v/--verbose option will print all items, but unfortunately it prints a line for each tag ....

 

Juha, I suggest the following change at line 311 to make the verbose option more useful:

 

print("\r", self.__counter, row, end="")

 

Even though each iteration is still printed, each line overwrites the previous one. The two additional parameters are "\r" => carriage return and end="" => no line feed at the end of the line. That way you see only the running count and file name, without the screen filling up and scrolling.
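A small self-contained sketch of this overwrite technique is shown below. The helper name show_progress and the sample rows are assumptions for illustration; only the print call itself comes from the suggestion above.

```python
import sys
import time


def show_progress(counter, row, stream=sys.stdout):
    """Rewrite the current console line with the running count and row."""
    # "\r" returns the cursor to the start of the line and end="" suppresses
    # the newline, so the next call overwrites this output.
    print("\r", counter, row, end="", file=stream, flush=True)


if __name__ == "__main__":
    # Made-up rows in the (id, filename, flag) shape seen in the verbose output.
    rows = [(1, "first.jpg", 0), (2, "second.jpg", 0), (3, "third.jpg", 0)]
    for counter, row in enumerate(rows, start=1):
        show_progress(counter, row)
        time.sleep(0.2)
    print()  # finish the line once the loop is done
```

One caveat: if a line is shorter than the one before it, the tail of the longer line stays visible, so padding the output to a fixed width may be worthwhile.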


Thank you, Wilfried, for helping to debug the program. This is now a release that I can call a beta release.

 

The updated options are

usage: DamScan.py [-h] [-c DBNAME] [-s SERVER] [-p PORT] [-u USER] [-g] [-l]
                 [-f] [-i] [-v] [--version]

Search inconcistent tags from a Daminion database.

optional arguments:
 -h, --help            show this help message and exit
 -c DBNAME, --catalog DBNAME
                       Daminion catalog name [NetCatalog]
 -s SERVER, --server SERVER
                       Postgres server [localhost]
 -p PORT, --port PORT  Postgres server port [5432]
 -u USER, --user USER  Postgres user/password [postgres/postgres]
 -g, --group           Use groups/stacks instead of image links
 -l, --sqlite          Use Sqlite (= standalone) instead of Postgresql (=server)
 -f, --fullpath        Print full directory path and not just file name
 -i, --id              Print database id after the filename
 -v, --verbose         verbose output
 --version             Display version information and exit.

If you are using the standalone version, don't forget to add .dmc to the catalog name. Use format -c=... if you are not in the same folder as the catalog.

python PycharmProjects\DamScan\DamScan.py -v -l -g -c="Pictures\test - Copy.dmc"
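For those curious how such a command line is typically handled in Python, the option list above can be rebuilt with the standard argparse module. This is a reconstruction from the help text only, not the actual source of DamScan.py; the function name build_parser, the dest name dbname and the version string are assumptions.

```python
import argparse


def build_parser():
    """Rebuild the documented options (sketch derived from the help text)."""
    parser = argparse.ArgumentParser(
        prog="DamScan.py",
        description="Search inconsistent tags from a Daminion database.")
    parser.add_argument("-c", "--catalog", dest="dbname", metavar="DBNAME",
                        default="NetCatalog",
                        help="Daminion catalog name [NetCatalog]")
    parser.add_argument("-s", "--server", default="localhost",
                        help="Postgres server [localhost]")
    parser.add_argument("-p", "--port", type=int, default=5432,
                        help="Postgres server port [5432]")
    parser.add_argument("-u", "--user", default="postgres/postgres",
                        help="Postgres user/password [postgres/postgres]")
    parser.add_argument("-g", "--group", action="store_true",
                        help="Use groups/stacks instead of image links")
    parser.add_argument("-l", "--sqlite", action="store_true",
                        help="Use Sqlite (= standalone) instead of Postgresql")
    parser.add_argument("-f", "--fullpath", action="store_true",
                        help="Print full directory path and not just file name")
    parser.add_argument("-i", "--id", action="store_true",
                        help="Print database id after the filename")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="verbose output")
    parser.add_argument("--version", action="version",
                        version="%(prog)s (sketch)",
                        help="Display version information and exit.")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["-v", "-l", "-g", "-c=Pictures\\test - Copy.dmc"])
    print(args.dbname, args.sqlite, args.group)
```

With this setup argparse accepts the -c=... form as well as -c followed by a separate value.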

If you have questions or comments, write to the forum or send a PM.

-Juha

DamScan.txt


Daminion allows you to link or group associated items together, but there are no built-in tools for checking the consistency of the metadata for the linked or grouped items. DamScan.py solves this problem and reports inconsistencies in metadata for Daminion server and standalone catalogs.

 

Many thanks to Wilfried and Uwe for testing my program and commenting on the documentation. Here is what I can call the first official version of the program. You need to rename DamScan.txt to DamScan.py after downloading. See the detailed instructions and options in the manual. The program is also available from GitHub.

 

-Juha

DamScan.txt

DamScan.pdf


Hi!

 

When importing the output file into Excel, you have to select "File origin: 65001 : Unicode (UTF-8)" at Step 1 of the import wizard. This will import accented letters and other diacritics correctly.
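If you would rather process the saved output in Python than in Excel, the same two facts apply: the file is UTF-8 and tab-delimited. A minimal sketch follows, assuming the output was saved to a file; the name report.txt, the function read_report and the dictionary keys are made up for this example.

```python
import csv


def read_report(stream):
    """Yield one dict per report line from tab-delimited DamScan output."""
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    next(reader, None)  # skip the header line
    for fields in reader:
        if len(fields) < 5:
            continue  # not a complete report line
        yield {
            "image_a": fields[0],
            "dir": fields[1],
            "image_b": fields[2],
            "tag": fields[3],
            "values": fields[4:],  # one value, or ValueA, '<>' and ValueB
        }


if __name__ == "__main__":
    # report.txt is a placeholder name for wherever the output was saved.
    with open("report.txt", encoding="utf-8", newline="") as handle:
        for entry in read_report(handle):
            print(entry["image_a"], entry["tag"], entry["values"])
```

Opening the file with encoding="utf-8" plays the same role as choosing 65001 in the Excel import wizard.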

 

There is also an updated version on GitHub. It doesn't contain any new features, just bug fixes for a few exceptional cases.

 

-Juha


Hi!

 

A new release for those who have been using the tool and don't want "valid" differences between grouped or linked items to be reported.

 

Quote from the documentation

The option -a specifies a configuration file that contains acknowledged differences between linked or grouped media items. The differences listed in this file are excluded from the output.

 

The new version and the updated documentation are available from GitHub.

 

-Juha


Hi!

 

A new version of the tool is available on GitHub. It is now possible to save the parameters in an INI file, and collections can now also be compared. There are also some bug fixes and performance improvements.

 

-Juha


Juha,

I tested your DamCompare.py (downloaded from GitHub) with the following command:

python DamCompare.py -l -c1 oldcat.dmc -c2 newcat.dmc

I had created the new catalog beforehand. In the result, all files were reported as missing.

What did I do wrong?

DSCF2665.jpg    <>    –    ERROR: file missing


Hi Egon and sorry for the delay.

The error message says that the program finds DSCF2665.jpg in oldcat but doesn't find it in newcat. You can try adding the flag -f (print full path) to your command line to see if that gives you a hint. The program matches images in the catalogs by the full directory path and filename.
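To illustrate what matching by full path means, here is a minimal sketch with made-up paths. It is not the code from DamCompare.py, and the function name missing_in_new is an assumption. The comparison here is an exact string match, which may differ from what the tool actually does (for example regarding upper and lower case).

```python
def missing_in_new(old_paths, new_paths):
    """Return full paths that are in the old catalog but not in the new one."""
    new_set = set(new_paths)
    return [path for path in old_paths if path not in new_set]


if __name__ == "__main__":
    # Hypothetical catalog contents for illustration.
    old = [r"D:\Pictures\2004\DSCF2665.jpg", r"D:\Pictures\2004\DSCF2666.jpg"]
    new = [r"D:\Pictures\2004\DSCF2666.jpg"]
    for path in missing_in_new(old, new):
        print(path, "<>", "ERROR: file missing", sep="\t")
```

With an empty second catalog every path from the first one ends up in the result, which matches the symptom of all files being reported as missing.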

Ask more, if this doesn't help.

-Juha


Hello Juha,

thank you for your answer.

I added the -f flag. The result was the same (except that the path was printed before the file name).
All pictures are still reported as not available in the new catalog.

I understood that your script imported the pictures into the new, empty catalog. Is that correct?

 

Egon

12 minutes ago, Egon said:

I understood that your script imported the pictures into the new, empty catalog. Is that correct?

No, this is not correct, Egon. You need to create an empty catalog (correct), but you also should import all pictures there before running the script.


Thank you Wilfried,

Exactly as Wilfried said, the tool just compares two existing catalogs with each other, it doesn't import any images.

 

Juha


Thank you Wilfried and Juha,

I have exported all the files of a catalog to csv, created a new catalog and imported the csv file.
Then I executed the following command:

python DamCompare.py -f -l -c1 "C:\Users\wir\Documents\Daminion Catalogs\DaminionCatalog_ganz neu.dmc" -c2 "C:\Users\wir\Documents\Daminion Catalogs\CatNeu.dmc" -o out .txt

There were many errors related to the import.
But one error has, in my view, nothing to do with the import.
For many files the creation date is reported as different,
although it is identical in the old and the new catalog.

Error message:
D:\eigene Bilder\2004\Ausflüge\Wien\DSCN0020.jpg    <>    D:\eigene Bilder\2004\Ausflüge\Wien\DSCN0020.jpg    Creation Time

csv file:
FilePath;Item Id;Rating;Label;People;Place;Event;Categories;Keywords;Project;Client;Authors;Copyrights;Scene;Subject Code;Intellectual Genre;Creation Datetime;...
...
D:\eigene Bilder\2004\Ausflüge\Wien\DSCN0020.jpg;1100;;;;Österreich\Wien\\Wien;;Autos\C, Autos\C\Chrysler, Autos\C\Chrysler\Crossfire, Autos;;Chrysler Crossfire;;;;;;;18.12.2009 17:48:43;...
...

What is the real difference?

 

Greetings

Egon

DamCompare.jpg

