(Q):
Can the test dataset contain new labels for any of the poly, point, or line feature types? That is, labels which appear in neither training nor validation.
(A):
The style and types of features in the training and validation sets are representative of the evaluation set. New labels will be provided for the evaluation maps.
(Q):
What is a _pt_poly.tif type? Are these two separate labels or just one?
(A):
The final character string after the last underscore indicates the feature type (_pt, _line, or _poly).
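A minimal sketch of applying that naming convention in code, assuming the type is always the segment between the last underscore and the extension (the filenames below are hypothetical examples, not actual challenge files):

```python
# Derive the feature type (pt, line, or poly) from a label filename,
# assuming the convention above: type = segment after the last underscore.
def feature_type(filename):
    stem = filename.rsplit(".", 1)[0]   # drop the .tif extension
    return stem.rsplit("_", 1)[-1]      # segment after the last underscore

print(feature_type("CO_example_thrust_fault_line.tif"))  # → line
print(feature_type("AK_example_cross_pt.tif"))           # → pt
print(feature_type("map_Qal_pt_poly.tif"))               # → poly (one label, of type poly)
```

Note that `..._pt_poly.tif` is therefore a single poly label whose feature name happens to end in `pt`, not two labels.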
(Q):
For the point prediction output, should the output value be 1 or 255?
(A):
For any feature, the output should be a binary raster (black and white image), where only the pixels that contain the feature being extracted are encoded as “1” for feature present and all other pixels are encoded as “0” for feature absent. Match the format provided in the training data, which is a single-band 8-bit unsigned .tif with values of 0 or 1.
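A minimal sketch of producing a mask in that format, assuming your model emits per-pixel scores in [0, 1] (the `scores` array and the 0.5 threshold are illustrative, not part of the challenge specification):

```python
import numpy as np

# Hypothetical per-pixel prediction scores in [0, 1].
scores = np.array([[0.9, 0.2],
                   [0.4, 0.7]])

# Threshold to a single-band 8-bit unsigned mask with values 0 or 1 (not 255).
mask = (scores >= 0.5).astype(np.uint8)

print(mask.dtype)            # → uint8
print(mask.tolist())         # → [[1, 0], [0, 1]]
# Writing the .tif itself could be done with, e.g., Pillow:
#   Image.fromarray(mask).save("prediction_pt.tif")
```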
(Q):
Maps ending in "mosaic" have 5 point features for detection -- can we assume that the same 5 features will always be in the same order (i.e., _pt1 = "cross shape", _pt2 = "box shape", etc.)?
(A):
All the training maps ending in mosaic have the same legend and the five features are labeled similarly. You should not rely on this relationship holding true across other maps, with or without “mosaic” in the filename.
(Q):
Can you explain the binary rasters for the overlapping features? It was explained in the Map_feature_Extraction_Challange_Details.pdf, but I am still not sure I understand it correctly.
(A):
In general, the solution to this challenge attempts to mimic what a human would do when digitizing the map. For example, when a bedrock polygon is obscured by a water feature, such as a lake or reservoir, the bedrock feature is assumed to be continuous beneath the water. In this case, the color of the water (blue) may not match the legend color for the bedrock polygon. Similarly, a thin surficial deposit of stream sediment (e.g., Qal) may obscure a bedrock polygon.
(Q):
Will submissions be inspected for differences in format, or for debugging of silly mistakes?
(A):
No feedback on formatting or debugging of submissions will be provided. Please use the validation rounds to discover mistakes and to verify that your output will be evaluated correctly.
(Q):
The legends that will be in the validation and test, are they all included in the training set?
(A):
No. Each map should be thought of as containing a unique set of features that pertain only to that map. A feature identified in the legend of one map may not correspond to the same feature on another map, so color matching should only be done within a single map.
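An illustrative sketch (not the official baseline) of per-map color matching: each pixel is assigned to the nearest legend swatch color, using only the legend of that same map. The legend entries and pixel values below are made-up examples.

```python
import numpy as np

def nearest_legend_label(pixels, legend_colors):
    """Assign each RGB pixel to the nearest legend color of THIS map.

    pixels: (N, 3) array of RGB values.
    legend_colors: dict mapping label -> (r, g, b) swatch color.
    """
    labels = list(legend_colors)
    colors = np.array([legend_colors[l] for l in labels], dtype=float)  # (K, 3)
    # Squared Euclidean distance in RGB space, pixels (N,1,3) vs swatches (1,K,3).
    d = ((pixels[:, None, :].astype(float) - colors[None, :, :]) ** 2).sum(-1)
    return [labels[i] for i in d.argmin(axis=1)]

# Hypothetical legend of one map; a different map would get its own dict.
legend = {"Qal": (230, 230, 150), "water": (120, 180, 230)}
px = np.array([[228, 229, 148], [118, 182, 231]], dtype=np.uint8)
print(nearest_legend_label(px, legend))  # → ['Qal', 'water']
```

Because the lookup dict is built per map, the same color can legitimately mean different features on different maps.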
(Q):
Is there a reference for how it is usually done?
(A):
I am not aware of an existing solution to this problem aside from the baseline provided. I encourage you to look at the literature or at approaches used in other fields, such as medical imaging or remote sensing. There might be existing models that could be productive here.
(Q):
Are there universal symbols?
(A):
No, but there are similarities. For the most part, the maps are USGS products. The USGS tries to use consistent symbology across products, but over 100+ years, there has been evolution in that symbology. Some look the same, but there are exceptions.
(Q):
Do you have to use open CV?
(A):
No, you are not required to use any specific software package or library. The validation script uses OpenCV, however, so you will need OpenCV installed to run that script yourself. If you prefer, you can create a separate environment for the validation script so it does not interfere with your own.