lintujuh Posted July 19, 2017

Hi!

It has been discussed in this forum (e.g. Group behaviour) that there should be an option to "force" tags onto all grouped or linked images. Because the process is manual today, it is prone to errors. This started to annoy me, and as I had some free time, I decided to refresh my coding skills and wrote a small Python program that traverses the Daminion catalog and checks for inconsistencies.

This is (and will probably stay) a "nerd version": you need to be familiar with the command line and with installing software. Install Python 3 (an earlier version should work as well; I have been running on 3.3) and install the Psycopg2 package into Python. Now you can run the attached code from the command line (or from a Python IDE such as PyCharm). First rename the file from DamScan.txt to DamScan.py.

    C:> Python DamScan.py [options]

    usage: DamScan.py [-h] [-c DBNAME] [-s SERVER] [-p PORT] [-u USER] [-l] [-v] [--version]

    optional arguments:
      -h, --help                   show this help message and exit
      -c DBNAME, --catalog DBNAME  Daminion catalog name [NetCatalog]
      -s SERVER, --server SERVER   Postgres server [localhost]
      -p PORT, --port PORT         Postgres server port [5432]
      -u USER, --user USER         Postgres user/password [postgres/postgres]
      -v, --verbose                verbose output
      --version                    Display version information and exit.

The options should be self-evident, and the defaults match the default Daminion configuration. Currently the code checks pairwise linked items (not yet grouped ones) and reports any differences. The program reports the differing files; for single-value tags (e.g. Place) it reports both values, and for multi-value tags (e.g. Keywords) it reports the values that are missing from the first file. The output is tab-delimited, so you can paste it into Excel for further processing. The program is read-only; it doesn't change the database contents.
Currently the following tags are checked:

- Place (single)
- GPS (single)
- Event (single)
- Keywords (multi)
- Categories (multi)
- People (multi)

An example output:

    ImageA        Dir  ImageB           Tag         ValueA/Missing A            ValueB
    IMG_8090.jpg  <>   IMG_8090.CR2     GPS         '44.1448N 3.09918E 256.0m'  <> '44.1448N 3.09918E 0.0m'
    IMG_4115.jpg  >    IMG_4115.CR2     Keywords    'Reflections'
    IMG_1806.tif  <    IMG_1806-09.tif  Categories  'Other\Panorama'

and an interpretation:

- IMG_8090: the GPS co-ordinates (altitude) differ between the JPG and the CR2
- IMG_4115: the CR2 image has the keyword 'Reflections', which is missing from the JPG
- IMG_1806.tif is missing the category 'Other\Panorama', which exists in IMG_1806-09.tif

The symbols < and > only show whether the relation between the images is "linked to" or "linked from". In both cases (even though the notation can be misleading) ImageB contains tag values that do not exist in ImageA.

I'm thinking of the following improvements:

- an option to do the analysis based on stacking instead of linking
- an option to define which tags are analyzed (now all are analyzed)
- an option to list the full path and/or the Daminion ID
- better installation instructions

If you need some other tags to be analyzed, or have improvement ideas or problems, drop a note. (A GUI is not in my plans.)

-Juha

The normal disclaimer applies: use it at your own risk, there is no warranty etc. Also, I don't take any responsibility if your lover leaves you because you are busy fixing the tagging in your catalog.

Attachment: DamScan.txt
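The comparison rule described in the post above (both values for single-value tags, values missing from the first file for multi-value tags) can be sketched roughly as follows. This is not the DamScan source: the function name compare_tags, the dictionary layout and the 'Lake' keyword are assumptions made purely for illustration.

```python
def compare_tags(tags_a, tags_b, single_tags, multi_tags):
    """Return (tag, value_a_or_missing, value_b) tuples for differing tags.

    tags_a and tags_b are hypothetical dicts: a single-value tag maps to one
    value, a multi-value tag maps to a set of values.
    """
    diffs = []
    for tag in single_tags:
        a, b = tags_a.get(tag), tags_b.get(tag)
        if a != b:
            # Single-value tags: report both values.
            diffs.append((tag, a, b))
    for tag in multi_tags:
        # Multi-value tags: report what B has and A lacks.
        missing = tags_b.get(tag, set()) - tags_a.get(tag, set())
        for value in sorted(missing):
            diffs.append((tag, value, None))
    return diffs


# Sample data: the GPS strings come from the example output, 'Lake' is made up.
jpg = {"GPS": "44.1448N 3.09918E 256.0m", "Keywords": {"Lake"}}
cr2 = {"GPS": "44.1448N 3.09918E 0.0m", "Keywords": {"Lake", "Reflections"}}

result = compare_tags(jpg, cr2,
                      single_tags=["Place", "GPS", "Event"],
                      multi_tags=["Keywords", "Categories", "People"])
for tag, a, b in result:
    # Tab-delimited, so the lines can be pasted into a spreadsheet.
    print("\t".join(str(x) for x in ("IMG.jpg", "IMG.CR2", tag, a, b)))
```

A set difference keeps the multi-value check independent of the order in which keywords are stored.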
lintujuh Posted July 19, 2017

As a reference, I have 20.500 items in my active catalog and I got roughly 1.600 messages.

-Juha
lintujuh Posted July 20, 2017

Hi!

Here are slightly more detailed installation instructions.

After you have downloaded the Python package, right-click the package and select "Run as administrator". In the installation dialog select Customized installation. In the customized configuration, tick the option to include Python in the PATH and select installation for all users. Other options can be left at their defaults.

After installation, start a command window (you may need to do this as an admin as well, because the Postgres support package will be installed in the Program Files directory).

    C:> python -m pip install -U pip setuptools
    C:> python -m pip install psycopg2

Close the elevated command window and start a normal command window. Now you can run my tool with the following command (include the path where you downloaded my Python module if you are not in the same folder):

    python \DIR\DamScan.py [options]

I have attached a new version that supports groups. Invoke the program with the option -g (or --group) to switch from the default checking based on links to checking based on groups (stacks). I also changed the separator in hierarchical stacks to '|' for better readability. The checking is currently limited to images and RAWs (as defined in Daminion Media Format).

-Juha

Attachment: DamScan.txt
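After the two pip commands in the post above, a quick way to confirm that the Postgres support package is visible to the interpreter is to ask Python whether it can locate the module. The helper name missing_modules is an assumption for this sketch; it is not part of DamScan.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the names from `names` that Python cannot locate."""
    return [name for name in names
            if importlib.util.find_spec(name) is None]


print("Python", sys.version.split()[0])
missing = missing_modules(["psycopg2"])
print("Missing modules:", ", ".join(missing) if missing else "none")
```

If psycopg2 is listed as missing, the pip command was most likely run with a different Python installation than the one on the PATH.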
WilfriedB Posted July 22, 2017

Great idea, Juha! I assume it works only for the server version of Daminion, since the standalone version uses a different database, correct?
lintujuh Posted July 22, 2017

Hi Wilfried,

You were correct. It was very straightforward to add support for the Sqlite database (= standalone); only the call to open the database is different. For the standalone version use these options:

    -l, --sqlite          use sqlite database (standalone) instead of server
    -c DB, --catalog DB   relative pathname of the local catalog (with .dmc)

Example command line, assuming you have the Python code in your home directory and the Daminion catalog in Pictures:

    C:\Users\user> python DamScan.py -l -c Pictures\DaminionCatalog.dmc

The new version is attached.

-Juha

PS. I had only very limited test data for the local catalog.

Attachment: DamScan.txt
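The remark that only the call to open the database differs can be illustrated with a small sketch. The function open_catalog and its parameter names are assumptions, not taken from DamScan; the default values mirror the help text quoted earlier in the thread. Both sqlite3 and psycopg2 follow the Python DB-API, so cursors behave alike once the connection is open.

```python
import sqlite3


def open_catalog(catalog, use_sqlite=False, server="localhost", port=5432,
                 user="postgres", password="postgres"):
    """Return a DB-API connection to a standalone or a server catalog."""
    if use_sqlite:
        # Standalone catalog: a single .dmc file.
        # Note: sqlite3.connect creates the file if it does not exist.
        return sqlite3.connect(catalog)
    # Imported lazily so that standalone use does not require psycopg2.
    import psycopg2
    return psycopg2.connect(dbname=catalog, host=server, port=port,
                            user=user, password=password)


# Demonstration with an in-memory SQLite database instead of a real catalog.
conn = open_catalog(":memory:", use_sqlite=True)
cur = conn.cursor()
cur.execute("SELECT 1")
print(cur.fetchone())
```

One caveat with this approach: the two drivers use different placeholder styles for query parameters (`?` in sqlite3, `%s` in psycopg2), so parameterized queries need care when the same code serves both.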
WilfriedB Posted July 22, 2017

Thanks a lot, Juha. I will give it a try when time allows.
WilfriedB Posted July 23, 2017

lintujuh said: "As a reference, I have 20.500 items in my active catalog and I got roughly 1.600 messages"

Just for a quick estimate: how much time did you need to scan those 20,500 items? After some small hurdles I got it to work, and it is currently running with 156,069 elements to be scanned ...

It seems to me that if you do not have any linked items and forget to specify -g, you will get these error messages:

    C:\Users\User>python DamScan.py -l -c Pictures\DaminionCatalogAlle.dmc
    ImageA  Dir  ImageB  Tag  ValueA/Missing A  ValueB
    Traceback (most recent call last):
      File "DamScan.py", line 392, in <module>
        main()
      File "DamScan.py", line 386, in main
        catalog.ScanCatalog()
      File "DamScan.py", line 320, in ScanCatalog
        while self.NextImage():
      File "DamScan.py", line 312, in NextImage
        self.__image = DamImage(self, row[0])
      File "DamScan.py", line 70, in __init__
        self.ImageName = row[0]
    TypeError: 'NoneType' object is not subscriptable

The first and so far only mismatch is this:

    C:\Users\User>python DamScan.py -g -l -c Pictures\DaminionCatalogAlle.dmc
    ImageA      Dir  ImageB                   Tag         ValueA/Missing A  ValueB
    030_27.jpg  <    120803_8608_WBL_A55.JPG  Categories  'Urlaub'

While the finding is correct for "030_27.jpg", the image "120803_8608_WBL_A55.JPG" is not a member of any group. However, I cannot rule out that it was in the past.

One thought: while most of my file names are unique (though pairs with the same name can exist in different folders), some older ones, such as 030_27.jpg, are not. Could this possibly confuse your program?
While I was writing this, the scan ended, but I am not sure it really scanned the entire database, since the result is this:

    C:\Users\User>python DamScan.py -g -l -c Pictures\DaminionCatalogAlle.dmc
    ImageA                   Dir  ImageB                   Tag         ValueA/Missing A  ValueB
    030_27.jpg               <    120803_8608_WBL_A55.JPG  Categories  'Urlaub'
    121229_0839_WBL_A55.JPG  <    160901_0769_WBL_A77.JPG  Keywords    'Via Loreto'
    121229_0839_WBL_A55.JPG  <    160901_0769_WBL_A77.JPG  Categories  'Kurioses'
    Traceback (most recent call last):
      File "DamScan.py", line 392, in <module>
        main()
      File "DamScan.py", line 386, in main
        catalog.ScanCatalog()
      File "DamScan.py", line 324, in ScanCatalog
        FromList = self.__image.LinkedFrom()
      File "DamScan.py", line 170, in LinkedFrom
        return self.__bottomItems()
      File "DamScan.py", line 157, in __bottomItems
        img = DamImage(self.__db, r[0])
      File "DamScan.py", line 70, in __init__
        self.ImageName = row[0]
    TypeError: 'NoneType' object is not subscriptable

Similar to the first result above, ImageA shows a correct mismatch, but ImageB in the same line is not related to it. The third finding in this example shows the same pair of file names as the second, but Categories 'Kurioses' does not appear in either of them and apparently belongs to a completely different pair.

My intention is to find all grouped items (not only those with mismatching tags), and I am hoping for a small modification of the code to do that.

In any case, thanks a lot for your effort, Juha.
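The TypeError in the tracebacks above is what Python raises when code indexes the result of a DB-API fetchone() call that returned None, which happens when a query matches no row. Whether that is the actual cause here is not confirmed in the thread. A minimal sketch of the usual guard, with a hypothetical table items and column filename that are not the real Daminion schema:

```python
import sqlite3


def image_name(conn, item_id):
    """Return the file name for item_id, or None if no row matches."""
    cur = conn.cursor()
    cur.execute("SELECT filename FROM items WHERE id = ?", (item_id,))
    row = cur.fetchone()  # None when the query matches nothing
    if row is None:
        # Without this check, row[0] raises
        # TypeError: 'NoneType' object is not subscriptable
        return None
    return row[0]


# Hypothetical in-memory table standing in for a catalog.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, filename TEXT)")
conn.execute("INSERT INTO items VALUES (1, '030_27.jpg')")

print(image_name(conn, 1))  # existing id
print(image_name(conn, 2))  # id with no row
```

The caller then decides whether a missing row should be skipped, logged or treated as an error.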
lintujuh Posted July 23, 2017

Hi Wilfried, and thank you for your comments.

I need to look into your cases to see why the program terminates abnormally even with the -g option specified. I have sent you a PM to debug the issues. The Daminion database has some ghost entries from deleted files, but those entries should be flagged as deleted, and my program ignores them.

The -v/--verbose option will print all items, but unfortunately it prints a line for each tag type plus some other information, so the output is too cluttered. I will take a look and see how I can print only the hierarchy without comparing the tags.

-Juha
WilfriedB Posted July 23, 2017

Thanks Juha, the response to your PM is on the way ...

lintujuh said: "With -v/--verbose option will print all items, but unfortunately it prints a line for each tag type plus some other information, so the output is too cluttered. I will take a look and see, how I can only print the hierarchy without comparing the tags."

I also tried the verbose option and found that it crashes at item 49530. Possibly a counter overflow?

    48385 (49529, '120716_7968_WBL_A55.JPG', 0)
    48386 (49530, '120716_7969_WBL_A55.JPG', 0)
    Traceback (most recent call last):
      File "DamScan.py", line 392, in <module>
        main()
      File "DamScan.py", line 386, in main
        catalog.ScanCatalog()
      File "DamScan.py", line 324, in ScanCatalog
        FromList = self.__image.LinkedFrom()
      File "DamScan.py", line 170, in LinkedFrom
        return self.__bottomItems()
      File "DamScan.py", line 157, in __bottomItems
        img = DamImage(self.__db, r[0])
      File "DamScan.py", line 70, in __init__
        self.ImageName = row[0]
    TypeError: 'NoneType' object is not subscriptable
lintujuh Posted July 23, 2017

Hi! It looks more like a memory problem, because I got up to 76559 items before the crash. I'll take a look at this.

-Juha
WilfriedB Posted July 23, 2017

It is certainly something specific to each computer, but I should have plenty of memory (8 GB on Windows 10 Pro 64-bit), which is never completely used. I was watching the memory usage of Python, and it never went beyond 11 MB, if I remember correctly. Is there any option to limit memory usage? Some environment variable or something similar?
WilfriedB Posted July 24, 2017

lintujuh said: "... With -v/--verbose option will print all items, but unfortunately it prints a line for each tag ...."

Juha, I suggest the following change at line 311 to make the verbose option more useful:

    print("\r", self.__counter, row, end="")

Even though each iteration is still printed, each line overwrites the previous one. The two additional parameters are "\r" => carriage return and end="" => no line feed at the end of the line. That way you see only the running count and file name, without the screen filling up and scrolling.
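A self-contained version of the suggestion above, for trying it outside DamScan. The function show_progress and its out parameter are assumptions added for this sketch; the sample rows are the ones from the verbose output quoted earlier. flush=True is added because output without a newline may otherwise stay in the buffer.

```python
import sys


def show_progress(rows, out=sys.stdout):
    """Print a running counter and row on one line that overwrites itself."""
    for counter, row in enumerate(rows, start=1):
        # "\r" returns to the start of the line; end="" suppresses the newline.
        print("\r", counter, row, end="", file=out, flush=True)
    # Finish with a newline so later output starts on a fresh line.
    print(file=out)


show_progress([
    (49529, "120716_7968_WBL_A55.JPG", 0),
    (49530, "120716_7969_WBL_A55.JPG", 0),
])
```

One limitation of this technique: if a line is shorter than the one before it, the tail of the longer line remains visible unless the output is padded with spaces.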
lintujuh Posted July 25, 2017

Thank you, Wilfried, for helping to debug the program. This is now what I can call a beta release. The updated options are:

    usage: DamScan.py [-h] [-c DBNAME] [-s SERVER] [-p PORT] [-u USER] [-g] [-l] [-f] [-i] [-v] [--version]

    Search inconcistent tags from a Daminion database.

    optional arguments:
      -h, --help                   show this help message and exit
      -c DBNAME, --catalog DBNAME  Daminion catalog name [NetCatalog]
      -s SERVER, --server SERVER   Postgres server [localhost]
      -p PORT, --port PORT         Postgres server port [5432]
      -u USER, --user USER         Postgres user/password [postgres/postgres]
      -g, --group                  Use groups/stacks instead of image links
      -l, --sqlite                 Use Sqlite (= standalone) instead of Postgresql (=server)
      -f, --fullpath               Print full directory path and not just file name
      -i, --id                     Print database id after the filename
      -v, --verbose                verbose output
      --version                    Display version information and exit.

If you are using the standalone version, don't forget to add .dmc to the catalog name. Use the format -c=... if you are not in the same folder as the catalog.

    python PycharmProjects\DamScan\DamScan.py -v -l -g -c="Pictures\test - Copy.dmc"

If you have questions or comments, write to the forum or send a PM.

-Juha

Attachment: DamScan.txt
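For readers who want to see how a command line like the one above can be declared, here is a sketch using Python's standard argparse module. It is reconstructed from the help text only; the dest names, the version string and the function build_parser are assumptions and may differ from the real DamScan source.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the option list shown in the help text."""
    p = argparse.ArgumentParser(prog="DamScan.py")
    p.add_argument("-c", "--catalog", dest="dbname", metavar="DBNAME",
                   default="NetCatalog", help="Daminion catalog name")
    p.add_argument("-s", "--server", default="localhost",
                   help="Postgres server")
    p.add_argument("-p", "--port", type=int, default=5432,
                   help="Postgres server port")
    p.add_argument("-u", "--user", default="postgres/postgres",
                   help="Postgres user/password")
    # Flags without a value become booleans via store_true.
    p.add_argument("-g", "--group", action="store_true",
                   help="Use groups/stacks instead of image links")
    p.add_argument("-l", "--sqlite", action="store_true",
                   help="Use Sqlite instead of Postgresql")
    p.add_argument("-f", "--fullpath", action="store_true",
                   help="Print full directory path")
    p.add_argument("-i", "--id", action="store_true",
                   help="Print database id after the filename")
    p.add_argument("-v", "--verbose", action="store_true",
                   help="verbose output")
    # Placeholder version string, not the real one.
    p.add_argument("--version", action="version",
                   version="%(prog)s (sketch)")
    return p


args = build_parser().parse_args(
    ["-v", "-l", "-g", "-c", "Pictures\\test - Copy.dmc"])
print(args.dbname, args.sqlite, args.group, args.verbose)
```

Passing the catalog path as its own list element, as done here, corresponds to quoting it on the command line so that the spaces in the name survive.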
lintujuh Posted July 30, 2017

Daminion also allows you to link or group associated items together, but there are no built-in tools for checking the consistency of the metadata of the linked or grouped items. DamScan.py solves this problem and reports inconsistencies in metadata for Daminion server and standalone catalogs.

Many thanks to Wilfried and Uwe for testing my program and commenting on the documentation.

Here is what I can call the first official version of the program. You need to rename DamScan.txt to DamScan.py after downloading. See the detailed instructions and options in the manual. The program is also available from GitHub.

-Juha

Attachments: DamScan.txt, DamScan.pdf
lintujuh Posted August 2, 2017

Hi!

When importing the output file into Excel, you have to select, in Step 1 of the import wizard, File origin: 65001 : Unicode (UTF-8). This will import accented letters and diacritics correctly.

There is also an updated version on GitHub. It doesn't contain any new features, just bug fixes for a few exceptional cases.

-Juha
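The same two properties of the output (tab-delimited, UTF-8) matter when the report is read with a script instead of Excel. A minimal sketch with Python's csv module follows; the function read_report and the sample row are made up for illustration, and a real file would be opened with open(path, encoding="utf-8", newline="").

```python
import csv
import io


def read_report(stream):
    """Parse tab-delimited report lines into lists of column values."""
    # QUOTE_NONE keeps the single quotes around tag values untouched.
    return list(csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE))


# Made-up report content, encoded as UTF-8 bytes like a file on disk.
text = ("ImageA\tDir\tImageB\tTag\tValueA/Missing A\tValueB\n"
        "a.jpg\t<\tb.jpg\tCategories\t'Ausflüge'\n")
raw = text.encode("utf-8")

# Decoding explicitly as UTF-8 keeps accented letters intact.
stream = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8", newline="")
rows = read_report(stream)
for row in rows:
    print(row)
```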
lintujuh Posted August 10, 2017

Hi!

A new release for those who have been using the tool and who don't want "valid" differences between grouped or linked items to be reported. Quote from the documentation:

"The option -a specifies a configuration file that contains acknowledged differences between linked or grouped media items. The differences listed in this file are excluded from the output."

The new version and the updated documentation are available from GitHub.

-Juha
lintujuh Posted September 8, 2017

Hi!

A new version of the tool is available on GitHub. It is now possible to save the parameters in an INI file. Collections can now also be compared, and there are some bug fixes and performance improvements.

-Juha
lintujuh Posted September 27, 2017

Hi,

Added support for "Title", "Description" and "Comments". The latest version and documentation are available on GitHub.

-Juha
lintujuh Posted October 25, 2017

A new version is available on GitHub. In this version it is possible to specify the distance within which geolocations are regarded as equal. See also this thread.

-Juha
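Treating two geolocations as equal when they lie within a given distance can be sketched with the haversine great-circle formula. This is a generic illustration, not the tool's implementation: the function names, the tolerance values and the choice of haversine are assumptions, and altitude (which differed in the GPS example earlier in the thread) is ignored here.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def same_location(p, q, tolerance_m):
    """True if points p and q (lat, lon) are within tolerance_m metres."""
    return haversine_m(p[0], p[1], q[0], q[1]) <= tolerance_m


a = (44.1448, 3.09918)   # coordinates from the example output
b = (44.1458, 3.09918)   # made-up point 0.001 degrees further north
print(round(haversine_m(*a, *b)), "m apart")
print(same_location(a, b, tolerance_m=50))
print(same_location(a, b, tolerance_m=200))
```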
Egon Posted November 14, 2019

Juha, I tested your DamCompare.py (downloaded from GitHub) with the following command:

    python DamCompare.py -l -c1 oldcat.dmc -c2 newcat.dmc

I had created the new catalog beforehand. In the result, all files were reported as missing. What did I do wrong?

    DSCF2665.jpg <> – ERROR: file missing
lintujuh Posted November 17, 2019

Hi Egon, and sorry for the delay.

The error message says that the program finds DSCF2665.jpg in oldcat but doesn't find it in newcat. You can try adding the flag -f (print full path) to your command line to see whether that gives you any hint. The program matches images between the catalogs by full directory path and filename. Ask again if this doesn't help.

-Juha
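Matching by full directory path and filename means that an image counts as missing whenever its exact path does not occur in the other catalog. A minimal sketch of that idea follows; the function missing_in_new and the sample paths are hypothetical, and the thread does not say whether the real comparison is case-sensitive.

```python
def missing_in_new(old_paths, new_paths):
    """Return the paths from the old catalog that are absent from the new one."""
    new_set = set(new_paths)  # set lookup keeps the check fast
    return [path for path in old_paths if path not in new_set]


# Hypothetical paths: the same file name, but stored under different folders.
old = [r"D:\Pictures\2019\DSCF2665.jpg", r"D:\Pictures\2019\DSCF2666.jpg"]
new = [r"E:\Import\DSCF2665.jpg", r"D:\Pictures\2019\DSCF2666.jpg"]

for path in missing_in_new(old, new):
    print(path, "<> ERROR: file missing")
```

With exact matching like this, an empty new catalog or one whose files sit under a different root makes every image appear as missing.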
Egon Posted November 17, 2019

Hello Juha,

thank you for your answer. I added the -f flag. The result was the same (except that the path was printed before the file name): all pictures are reported as not available in the new catalog. I understood that your script imports the pictures into the new, empty catalog. Is that correct?

Egon
WilfriedB Posted November 17, 2019

12 minutes ago, Egon said: "I understood that your script imported the pictures into the new, empty catalog. Is that correct?"

No, this is not correct, Egon. You need to create an empty catalog (correct), but you should also import all pictures into it before running the script.
lintujuh Posted November 17, 2019

Thank you, Wilfried.

Exactly as Wilfried said, the tool just compares two existing catalogs with each other; it doesn't import any images.

Juha
Egon Posted November 18, 2019

Thank you Wilfried and Juha,

I exported all the files of a catalog to CSV, created a new catalog and imported the CSV file. Then I executed the following command:

    python DamCompare.py -f -l -c1 "C:\Users\wir\Documents\Daminion Catalogs\DaminionCatalog_ganz neu.dmc" -c2 "C:\Users\wir\Documents\Daminion Catalogs\CatNeu.dmc" -o out .txt

There were many errors related to the import. But one error, in my view, has nothing to do with the import: for many files the creation date is reported as different, although it is identical in the old and the new catalog.

Error message:

    D:\eigene Bilder\2004\Ausflüge\Wien\DSCN0020.jpg <> D:\eigene Bilder\2004\Ausflüge\Wien\DSCN0020.jpg Creation Time

CSV file:

    FilePath;Item Id;Rating;Label;People;Place;Event;Categories;Keywords;Project;Client;Authors;Copyrights;Scene;Subject Code;Intellectual Genre;Creation Datetime;...
    ...
    D:\eigene Bilder\2004\Ausflüge\Wien\DSCN0020.jpg;1100;;;;Österreich\Wien\\Wien;;Autos\C, Autos\C\Chrysler, Autos\C\Chrysler\Crossfire, Autos;;Chrysler Crossfire;;;;;;;18.12.2009 17:48:43;...
    ...

What is the real difference?

Greetings
Egon