Channel: Active questions tagged header - Stack Overflow

Issue with renaming/selecting columns in pyspark


I have an Excel file that I'm reading into Databricks using PySpark. The data has extra columns at the end that I do not want included, so I keep only the first 21 columns with the following code:

    data_object = spark.read.format("com.crealytics.spark.excel") \
        .option("header", "true") \
        .option("inferSchema", "true") \
        .option("dataAddress", f"'1. ITEM'!A1") \
        .load("path/to/file")
    data_object = data_object.select(data_object.columns[0:21])

It then errors on the last line with the following:

AnalysisException: Column '`ITEM NUMBER

The entirety of the first column header is as follows:

'ITEM NUMBER\nMandatory Field\nFor Formula Calc. Only'

So, it appears that the line break is causing an issue, but if I attempt to perform a replace on all of the \n in the header row, I get the same error as above.

The ultimate goal is to rename the column headers to match the database using withColumnRenamed, which does work. I also tried removing the extra columns after the rename (rather than right after reading the file, as in the code above), but one of the extra columns has the same name as another column in the dataframe, so that fails with an ambiguity error instead.
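For reference, here is a minimal sketch of one thing that might be worth trying: rename every column by position before selecting, so that Spark never has to resolve the original header strings. The helper name clean_headers is hypothetical, and the cleaning rule (collapse anything that is not a letter or digit into an underscore, then suffix repeats) is an assumption, not something from the original code. It targets the line breaks, the dot in "Calc." (Spark can read an unquoted dot in a column name as nested field access), and the duplicate name. The commented lines at the end assume the data_object dataframe from the code above and are untested against this file.

```python
import re


def clean_headers(columns):
    """Return one sanitised, unique name per input column, in order."""
    used = set()
    cleaned = []
    for position, name in enumerate(columns):
        # Replace every run of non-alphanumeric characters (line breaks,
        # spaces, dots) with a single underscore; fall back to a positional
        # name if nothing is left.
        base = re.sub(r"[^0-9A-Za-z]+", "_", str(name)).strip("_") or f"col_{position}"
        # Append a numeric suffix until the name is unique, so duplicate
        # headers can no longer be ambiguous.
        candidate, suffix = base, 1
        while candidate in used:
            candidate = f"{base}_{suffix}"
            suffix += 1
        used.add(candidate)
        cleaned.append(candidate)
    return cleaned


# Possible use with the dataframe from the question (assumed, untested here):
# data_object = data_object.toDF(*clean_headers(data_object.columns))
# data_object = data_object.select(data_object.columns[0:21])
```

Because toDF assigns names purely by position, the later withColumnRenamed calls would then need to work from the cleaned names rather than the original headers.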

