
Flagging Images with Non-Compliant Metadata


Paul Barrett

I have had an idea which I would like to run past you all for a sanity check before I post it as a feature request.

I frequently check the Saved Search "Unsynced" counter to make sure it's 0.

 

But I need more. I don't know whether this applies to others, but I spend a lot of time checking and rechecking the 'compliance' of my image catalogs to make sure that published images appear in the correct place in my published library and with the correct metadata.

 

I suspect that for each of us there is a set of rules that can be applied to each catalog. Mine would be something like this for my photo catalog (I would need a different set for my documents library, hence why I say per catalog):

  • Cropped
  • Cleaned
  • Correct Creation Date Time
  • Minimum keyword tags:
    • Who
    • When
    • Where
  • Geotagged
  • Located in correct date stamped folder

I was wondering whether there would be any merit in having some form of system-based compliance checking per image: a mini checklist you could open and tick off to see what has yet to be done on an image. Some of the boxes could be checked by the system, e.g. is the item geotagged, have the minimum keyword standards been met, and is the date stamp compatible with the folder name. Others would be checked manually, either individually or for a selected group of images.

 

You could look at this as a set of mandatory fields to be completed. But I don't want to be in a position where I cannot save an image's properties because of a missing mandatory field, because I can't always determine all of the mandatory information in one go.

 

Some configuration would be necessary:

 

Enable compliance rules checkbox

 

If checked:

  • List all the available properties and tags (there are lots, so display them in expandable groups) and allow the user to mark which ones they want to include.
  • For tags, it would need to operate at subtag level and allow multiple subtags to be selected.
  • Free text fields would allow checking of non-system issues such as cropping and cleaning, and let the user define the responses, e.g. a check box or a Yes / No / n/a dropdown.
  • The user should be able to change the order in which the selected fields are presented, to match their preferred sequence.
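A minimal Python sketch of how such a per-catalog checklist could be evaluated, assuming a hypothetical `Item` record (the field names are illustrative, not Daminion's data model). Rules are held in the user's preferred order; a system rule is computed (here, whether GPS coordinates are present), while a manual rule is satisfied by a "Yes" or "n/a" response. The result is only a list of outstanding items, so nothing ever stops the properties from being saved.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Item:
    # Hypothetical record for one image; not Daminion's data model.
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    # Manual responses keyed by rule name, e.g. {"Cropped": "Yes"}.
    manual: dict[str, str] = field(default_factory=dict)


def is_geotagged(item: Item) -> bool:
    """System rule: both GPS coordinates are present."""
    return item.latitude is not None and item.longitude is not None


def manual_rule(name: str) -> Callable[[Item], bool]:
    """Manual rule: satisfied by "Yes" or "n/a"; "No" or no answer is outstanding."""
    def check(item: Item) -> bool:
        return item.manual.get(name) in ("Yes", "n/a")
    return check


# The user's preferred display order (hypothetical configuration).
RULES = [
    ("Cropped", manual_rule("Cropped")),
    ("Cleaned", manual_rule("Cleaned")),
    ("Geotagged", is_geotagged),
]


def outstanding(item: Item) -> list[str]:
    """Names of unmet rules, in display order. Flags only; never blocks saving."""
    return [name for name, check in RULES if not check(item)]


photo = Item(latitude=51.5, longitude=-0.12, manual={"Cropped": "Yes", "Cleaned": "No"})
print(outstanding(photo))  # ['Cleaned']
```

An empty result would correspond to a "compliant" image; anything else is what the checklist would still show as open.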

 

Once that has been set up it could be presented as:

 

1. A dockable Compliance Panel:

  1. Contains all the selected fields in their preferred order
  2. With a traffic light symbol indicating green when all complete, red when not.
  3. When multiple images are selected the individual values would be suppressed (unless they were identical) but could be edited and applied to the batch, just like Properties can be today.
  4. Where no image in a selected group has a value for an individual tag, a red light would appear against that value. When one or more images have that value entered, the light is orange. If all are complete, the light is green.

 

Regardless of the preferred order, values can be entered in any sequence.

 

2. A traffic light system on the thumbnail.

Red when not all complete, green when complete.

 

3. A new Saved Search "Non-compliant" which would enable us to find and correct images quickly.
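The traffic-light rule for a selection could look like this minimal Python sketch. `field_light` is a hypothetical name; it takes one field's values across the selected images and treats `None` or an empty string as "not entered".

```python
from enum import Enum
from typing import Optional, Sequence


class Light(Enum):
    RED = "red"
    ORANGE = "orange"
    GREEN = "green"


def field_light(values: Sequence[Optional[str]]) -> Light:
    """Light for one field across the selected images.

    RED if no image has a value, GREEN if every image has one, ORANGE otherwise.
    An empty selection is treated as RED.
    """
    entered = sum(1 for v in values if v)  # None and "" both count as missing
    if entered == 0:
        return Light.RED
    if entered == len(values):
        return Light.GREEN
    return Light.ORANGE


print(field_light(["Paul", None, "Uwe"]))  # Light.ORANGE
```

For a single image, as on a thumbnail, the list has one element, so the light can only be red or green.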

So, what do you think? Is there any merit in this?

Regards

- Paul


Hello Paul,

If I don't have time to enter all the metadata I need to assign, I use a TagPreset to assign a keyword "to_do" to the items.

In your case it can be the following hierarchy:

[Image: the suggested to-do tag hierarchy]

As long as the counts are > 0, you still have something to do.

This can be a first, easy way to get an overview of the "to do list".

If you have assigned the final tags, you can delete the "temporary" to-do tag/subtag.

Regards, Uwe



Nice idea. I have been playing around with it to see how it might work.

 

I have to assign a custom tag instead of a keyword because all keywords get read and displayed by Photo Station. But that's OK; a custom tag works just as well as a keyword.

 

But the fundamental issue with it is exactly the same as with the metadata fields themselves, which is that they all depend, for their accuracy, on a fallible weak point - the operator - me. I try my best to make sure that all metadata is present and correct but I have added 2,000 images in the last 3 weeks. Each one has 9 data points to check so that's 18,000 actions. Did I remember to do them all? Unlikely.

 

So, I need the system to apply a set of logical rules and prompt me to do the things that I omitted. It can do it at the speed of electrons and consistently, unlike this senior citizen.

 

Plus I need some protection against system-generated issues. In the process of testing out your idea I came across a whole cluster of images with a year of 1. Not 1901 or 2001 but 1. Look:

 

[Screenshot: images showing a creation year of 1]

I thought it might be a display issue, but the same date is in the metadata, although here it is shown as 0001:

 

[Screenshot: the image metadata showing the year as 0001]

Now, I KNOW I edited the dates on those images from the date scanned, because I scanned them in the last few weeks. And I also know I didn't enter a year of 0001. You can't: I just tried and couldn't find a way to do it. If you enter 0001, the system corrects it to 2001. I have seen this before; there's a quirk in the app that reacts badly to something I do when entering or saving dates. But if, as others have said, the date handling has been unchanged since forever, I need some validation. My folder-name-to-image-date rule would capture this.
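The folder-name rule could be sketched in Python as follows. This is a minimal sketch assuming a hypothetical layout in which date-stamped folder names start with a four-digit year (e.g. `1965` or `1965-07 Holiday`); `folder_year` and `date_matches_folder` are illustrative names, not Daminion features. `PureWindowsPath` is used so that both `\` and `/` separators are understood on any OS. Pass the folder path, not the file name. A creation year of 1 can never match such a folder, so it would be flagged.

```python
import re
from datetime import datetime
from pathlib import PureWindowsPath
from typing import Optional

# Assumed layout: a date-stamped folder name starts with a 4-digit year.
YEAR_PREFIX = re.compile(r"^(\d{4})")


def folder_year(folder: str) -> Optional[int]:
    """Year from the deepest folder name that starts with four digits, if any."""
    for part in reversed(PureWindowsPath(folder).parts):
        match = YEAR_PREFIX.match(part)
        if match:
            return int(match.group(1))
    return None


def date_matches_folder(folder: str, created: datetime) -> Optional[bool]:
    """True/False when the folder names a year; None when the rule does not apply."""
    year = folder_year(folder)
    if year is None:
        return None  # folder is not date-stamped, so no check is possible
    return created.year == year


print(date_matches_folder(r"Scans\1965\1965-07 Holiday", datetime(1965, 7, 4)))  # True
print(date_matches_folder(r"Scans\1965\1965-07 Holiday", datetime(1, 7, 4)))     # False
```

Returning `None` for folders without a year matches the idea that catalogs without date-based folders simply don't get this check.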

 

 

 

- Paul


Hello Paul,

I understand what you want to do. This was one of the biggest parts of my daily work: convincing customers that the results of the reporting depend on the quality of the data.

In your case you want to have:

  • logical checks: e.g. does this keyword tag fit with another tag, or with what is in the photo, etc.?
  • "physical" checks: did I enter a tag?

This means there has to be a network of "rules" and "conditions" in the background. This network of customizations and programs has to be developed, so who should do it?

I guess, given the focus of Daminion and its potential user group, it will remain our manual work to guarantee the quality of our data.

Regards, Uwe


Hi Paul,

 

When Uwe mentioned programming, that gave me an idea. You cannot program the Daminion backend, but you could program Excel.

 

Select an item and choose Item > Export > Export to CSV... from the menu. In the export dialog you can select which tags you want to export and whether to export only the selected image(s) or your full catalog. Once you have the CSV, open it in Excel and use conditional formatting to highlight discrepancies, or define more complex formulas in additional columns that show True or False depending on whether your rules are met.
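If Excel formulas are unappealing, the same True/False columns could be produced with a short Python script over the exported CSV. This is a minimal sketch: the column names in `REQUIRED_COLUMNS` are hypothetical placeholders, so check them against the header row of your own export. For a real file, open it with `open(path, newline="")` and pass the file object as `src`.

```python
import csv
import io

# Hypothetical column names; use the headers from your own CSV export.
REQUIRED_COLUMNS = ["Creation Date", "Keywords"]


def add_rule_columns(src, dst) -> int:
    """Copy a CSV export from src to dst, appending one True/False column
    per required field. Returns the number of rows with any empty field."""
    reader = csv.DictReader(src)
    flag_names = [f"{col} present" for col in REQUIRED_COLUMNS]
    writer = csv.DictWriter(dst, fieldnames=list(reader.fieldnames or []) + flag_names)
    writer.writeheader()
    failing = 0
    for row in reader:
        flags = {
            f"{col} present": bool((row.get(col) or "").strip())
            for col in REQUIRED_COLUMNS
        }
        if not all(flags.values()):
            failing += 1
        writer.writerow({**row, **flags})
    return failing


export = io.StringIO("File,Creation Date,Keywords\na.jpg,2017-03-01,Who|Paul\nb.jpg,,\n")
report = io.StringIO()
print(add_rule_columns(export, report))  # 1
```

The output CSV can still be opened in Excel and filtered on the `False` values.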

 

-Juha


Hi

 

You also have these available:

  • Catalog > Analyze ...

 

Unfortunately this is no use to me at all - see image:

 

[Screenshot: Catalog > Analyze output]

  • Unassigned in the Tag categories

 

Unfortunately this one is not very useful either. All my categorisation has to be done using keywords, and as soon as you add one keyword to an image the unassigned count drops by one, even if you still have other tags to add to the same image. The unassigned count is only useful in a one-category, one-tag, one-image situation, which in my case it never is.

 

  • Saved search > No GPS Coordinates

 

Now that one I DO use

 

These are not as detailed as Uwe's approach, but they give you an indication.

 

-Juha

 

I think you have made the point very well though, Juha. Data integrity and compliance depend on tools that are spread around the system, are not always helpful, and require the user to remember to keep checking them. Surely there has to be a better way?

 

- Paul


Nice one, Juha. Yes, I could, but I haven't programmed in Excel for 4 years now and I feel SO much better. The migraines have stopped. I'm sleeping better......

 

- Paul

 

 


Hi Paul,

 

I didn't remember that you could only use Keywords and not the other tag categories.

 

It also makes the Excel programming much more difficult. In the CSV all the keywords are comma-separated in a single column. Working with that will certainly give you headaches and sleepless nights.
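In a script, the single comma-separated keyword column is easier to handle than in Excel. A minimal Python sketch, assuming (hypothetically) that each exported keyword carries its hierarchy as a path such as `Who|Paul Barrett`; the real export format and separator may differ, and a keyword that itself contains a comma would be split incorrectly. `parse_keywords` is an illustrative name.

```python
from collections import defaultdict


def parse_keywords(cell: str, sep: str = "|") -> dict[str, list[str]]:
    """Split one comma-separated keyword cell into {parent tag: [sub-tags]}.

    Assumes each keyword is exported as a path like "Who|Paul Barrett";
    keywords without the separator are collected under the key "".
    """
    grouped: dict[str, list[str]] = defaultdict(list)
    for raw in cell.split(","):
        keyword = raw.strip()
        if not keyword:
            continue  # skip empty cells and stray commas
        parent, found, rest = keyword.partition(sep)
        if found:
            grouped[parent.strip()].append(rest.strip())
        else:
            grouped[""].append(keyword)
    return dict(grouped)


print(parse_keywords("Who|Paul Barrett, Where|London, Who|Uwe, holiday"))
# {'Who': ['Paul Barrett', 'Uwe'], 'Where': ['London'], '': ['holiday']}
```

Counting `len(parse_keywords(cell).get("Who", []))` for each row then gives the per-branch numbers that are awkward to get with spreadsheet formulas.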

 

-Juha


I would think that, given Daminion's current focus, this is exactly the sort of thing that would appeal to their customers. The last thing I wanted as an employer was poor data quality. The next last thing I wanted was people spending a lot of valuable time to achieve such a poor level of data quality, and then having to spend even more time finding and cleaning it up.

 

So a vendor that said 'We can help you fix that' would have been welcomed with open arms. What a USP that would be.

 

I don't think the rules are that complex. Instead of having mandatory fields that stop you saving a record until you have completed them (which would result in an even worse case: junk data being entered), the system recognises which fields have 'Required' tags against them and uses that to flag them up. So if I mark the keyword sub-tag 'People' as required, then until at least one subtag beneath People is entered the image is non-compliant. And so on for all the other tags I have marked.

 

And to anticipate the next question, a selection of 'None' would be acceptable if there were no people in the shot (although I'd prefer that value was not written to the metadata - it's a catalog only value)
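The 'required sub-tag' rule, including the catalog-only 'None' value, could be sketched in Python as below. It assumes keywords are already grouped by parent tag (for example `{"People": ["None"]}`); `missing_required` and `metadata_keywords` are hypothetical helper names. 'None' satisfies the rule like any other sub-tag and is dropped when building what would be written to the file metadata.

```python
NONE_TAG = "None"  # catalog-only answer meaning "checked, nothing applies"


def missing_required(keywords: dict[str, list[str]], required: list[str]) -> list[str]:
    """Required parent tags with no sub-tag yet; "None" counts as an answer."""
    return [parent for parent in required if not keywords.get(parent)]


def metadata_keywords(keywords: dict[str, list[str]]) -> dict[str, list[str]]:
    """Keywords to write to the file metadata, with the catalog-only "None" removed."""
    cleaned = {parent: [t for t in tags if t != NONE_TAG] for parent, tags in keywords.items()}
    return {parent: tags for parent, tags in cleaned.items() if tags}


image = {"People": ["None"], "Where": ["London"]}
print(missing_required(image, ["People", "When", "Where"]))  # ['When']
print(metadata_keywords(image))  # {'Where': ['London']}
```

Because each required tag is checked on its own, there are no co-dependencies between tags, which keeps the rule set simple.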

 

The tags don't have co-dependencies in my view of the solution. That really would add complication.

 

The most difficult one is comparing Creation Datetime to folder names and I, for one, would be happy to restructure my folders if it would help the cause. If your folder structure isn't date related then you just don't have access to that feature.

 

- Paul


Hello Paul,

maybe you have to go step by step and check all of the options/combinations currently available in the Advanced Search.

To check DateTime, this may be an option to get a draft output showing where there is a mismatch between the folder and the DateTimeCreation.

[Screenshot: Advanced Search comparing a folder with a creation date range]

Regards, Uwe


Nice idea. That would certainly work in theory. But each year requires three arguments and I have to handle 113 years, so that's 339 arguments in total. Will the query engine handle that, or do I need to run 113 separate queries? The latter would make me lose the will to live! :)

 

That still leaves all the other stuff to be checked manually and that is still open to human error.

 

I will go and play with the search generator.

 

- Paul

 

 

 


I am playing with the search generator.

 


Help! Please tell me there is an easier way to enter a date starting 1st January 1904 than scrolling through the calendar date picker?

 

I tried overwriting the date value but it wouldn't let me.

 

It's taken me a minute just to scroll back to 2005 (12 years) so that's another eight minutes to go. 9 minutes to set the before date and another 9 to do the after date. 18 minutes. So, nearly 20 minutes to set up each query. That will reduce as the dates get nearer but across 113 years at an average of 10 minutes that's 19 hours work!! I don't think so.

 

And I can see there is no way of appending a new base folder value so I will need 113 separate queries.

 

Also the query results will scroll way off screen and be invisible, unless there's a way of suppressing results which are zero?
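Outside Daminion, the 113 per-year searches could collapse into one pass over the catalog that reports only folders with problems. A minimal Python sketch, assuming you can obtain `(folder, creation datetime)` pairs, for example from a CSV export, and that each date-stamped folder name starts with a four-digit year; both are assumptions, and `mismatches_by_folder` is a hypothetical name.

```python
import re
from collections import Counter
from datetime import datetime
from pathlib import PureWindowsPath
from typing import Iterable, Optional

YEAR_PREFIX = re.compile(r"^(\d{4})")  # assumed: folder names start with YYYY


def folder_year(folder: str) -> Optional[int]:
    """Year from the deepest folder name that starts with four digits, if any."""
    for part in reversed(PureWindowsPath(folder).parts):
        match = YEAR_PREFIX.match(part)
        if match:
            return int(match.group(1))
    return None


def mismatches_by_folder(items: Iterable[tuple[str, datetime]]) -> dict[str, int]:
    """Count, per folder, the items whose creation year differs from the folder year.

    One pass replaces one search per year, and folders without mismatches
    never appear, so there are no zero rows to scroll past.
    """
    counts: Counter = Counter()
    for folder, created in items:
        year = folder_year(folder)
        if year is not None and created.year != year:
            counts[folder] += 1
    return dict(sorted(counts.items()))


catalog = [
    ("Scans/1904", datetime(1904, 1, 1)),
    ("Scans/1904", datetime(1, 1, 1)),
    ("Scans/1965/1965-07", datetime(1966, 7, 4)),
    ("Documents", datetime(2017, 3, 1)),
]
print(mismatches_by_folder(catalog))  # {'Scans/1904': 1, 'Scans/1965/1965-07': 1}
```

Folders whose names carry no year are skipped rather than reported, so the result stays short.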

 

Man this is hard. Fun, but hard. :)


 

Paul


Hello Paul,

Double-click the header of the calendar.

[Screenshot: the calendar header in the date picker]

Regards, Uwe

 

Yay - but how the **** did you know to do that?

 

I thought I had found a cool way too. If you click on the year value, you can use the up and down cursor keys to scroll through the years at high speed. Woo Hoo!

 

But, when I tried to do the same thing on the year value in the after argument it would not respond.

 

I have sent Daria a screen recording of the issue.

 

- Paul

 

 


Hello Paul,

I hope you don't get the migraine back now and can still sleep very well when you try the following:

Start pgAdmin (it's part of the PostgreSQL installation), connect to your database, e.g. "netcatalog", enter the password "postgres" and select Tools > Query Tool from the menu.

Now you can copy the query statement below into the empty window. Press F5 and wait for the result.

This query checks whether the first 4 digits of the file name (YYYYMMDD_HHMMSS; in my case the year) are not equal to the year in the CreationDateTime.

This is just a first draft and the result of my first steps with SQL statements and PostgreSQL today.

Good luck - waiting for your answer.

Regards, Uwe

 

-- List items whose file name (YYYYMMDD_HHMMSS) starts with a different year
-- than the year of their creation date/time.
SELECT mediaitems.id, mediaitems.filename, mediaitems.creationdatetime, files.relativepath
FROM mediaitems INNER JOIN files ON mediaitems.id = files.id
WHERE substring(mediaitems.filename FROM 1 FOR 4) <> to_char(mediaitems.creationdatetime, 'YYYY');


Hi Uwe

 

You are THE MAN!

 

This looks fantastic. I am busy on some other stuff for a couple of days but I am definitely trying this out when I get some time to spare.

 

Thanks so much.

 

- Paul

 

 


Hello,

some improvements:

  • Because deleted/removed media items remain in the catalog database (marked as deleted) until the catalog is optimized, these records have to be excluded.
  • The output is sorted by folder and then by file name.

 

Regards, Uwe

 

SELECT mediaitems.id, mediaitems.filename, mediaitems.creationdatetime, files.relativepath
FROM mediaitems INNER JOIN files ON mediaitems.id = files.id
WHERE substring(mediaitems.filename FROM 1 FOR 4) <> to_char(mediaitems.creationdatetime, 'YYYY')
  -- exclude items that are only marked as deleted until the catalog is optimized
  AND mediaitems.deleted = false
ORDER BY files.relativepath ASC, mediaitems.filename ASC;
